Arab World English Journal (AWEJ), Volume 14, Number 4, December 2023, Pp. 150-168


Oman Royal Speeches Corpus: Compilation and Analysis 

Aladdin Al Zahran¹, Rafik Jamoussi, Mahmood Zayid Suwaid Albakri, Anwaar Abduallah Salim Al-Maqbali, Iman Mohammed Ahmed Albuloshi, Ghadeer Rashid Alghefeili, Ebtihal Juma Nasser Albadri, Hiba Khalid Muslem Almandhari, Noof Mohammed Alharrasi
¹Corresponding Author:
Translation Program, Faculty of Language Studies
Sohar University, Oman


Received: 08/15/2023    Accepted: 11/03/2023    Published: 12/15/2023


For many years, researchers have directed their attention primarily toward developing written corpora; as a result, spoken corpora have remained rare compared to written ones. The laborious transcription and annotation tasks involved make creating and maintaining spoken corpora a challenging endeavor. This project aims to build a transcribed corpus of Oman Royal Speeches and make it available online through a custom-made concordance tool. The study also aims to test the corpus on fundamental corpus-based lexical, stylistic, and discourse-analytical analyses. Compiling the Oman Royal Speeches Corpus is meant to fill a gap by contributing to the development of Arabic spoken-language corpora and to make available a research tool that can facilitate corpus-based research, uses, and applications in various areas of investigation. The corpus was built in five stages: data capture, data processing, concordance tool development, testing and evaluation, and online deployment. With 98,511 tokens, the resulting corpus is a searchable archive of Royal Speeches with a built-in online concordance tool that supports multiple search types and displays query results in Keyword-in-Context (KWIC) format. The corpus has been tested for various corpus-analytic uses and has yielded significant findings in these areas. It therefore has the potential to serve as a reliable and authentic record and source of information for researchers and specialists in various fields, as well as a research tool for various applications and analyses of language-related topics.
Keywords: Arabic corpora, Arabic political discourse, corpus analysis, corpus building, corpus linguistics, linguistic and discourse analysis, sentiment analysis, transcribed spoken corpus
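The abstract describes a concordance tool that supports several search types and displays results in Keyword-in-Context (KWIC) format, but it does not describe how the tool is implemented. The following is therefore only a minimal Python sketch of how a KWIC search over tokenized Arabic text might work. Everything in it is an assumption for illustration and not a feature of the actual tool: the diacritic-stripping normalization, the `\w+` tokenizer, the three search modes (`exact`, `prefix`, `regex`), the context window, and the names `tokenize`, `kwic`, and `KwicLine`.

```python
import re
from dataclasses import dataclass

# Assumed normalization: remove Arabic short-vowel diacritics (U+064B-U+0652)
# and tatweel (U+0640) so vocalized and unvocalized spellings match.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Assumed tokenizer: a token is any run of Unicode word characters.
_TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Strip diacritics, then split the text into word tokens."""
    return _TOKEN.findall(_DIACRITICS.sub("", text))


@dataclass
class KwicLine:
    left: str     # context tokens before the hit
    keyword: str  # the matching token
    right: str    # context tokens after the hit


def kwic(tokens: list[str], query: str, mode: str = "exact",
         window: int = 5) -> list[KwicLine]:
    """Return one KWIC line per token that matches `query`.

    Hypothetical search modes:
      "exact"  - token equals the (normalized) query
      "prefix" - token starts with the (normalized) query
      "regex"  - token fully matches the query as a regular expression
    """
    plain = re.escape(_DIACRITICS.sub("", query))
    patterns = {
        "exact": plain,
        "prefix": plain + r"\w*",
        "regex": query,
    }
    if mode not in patterns:
        raise ValueError(f"unknown search mode: {mode!r}")
    pattern = re.compile(patterns[mode])

    hits = []
    for i, token in enumerate(tokens):
        if pattern.fullmatch(token):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append(KwicLine(left, token, right))
    return hits


if __name__ == "__main__":
    # Invented sample sentence, not taken from the corpus.
    sample = "إن عُمان بلد السلام وعمان تسعى إلى التنمية"
    toks = tokenize(sample)
    for mode, query in [("exact", "عمان"), ("regex", "و?عمان")]:
        for line in kwic(toks, query, mode=mode, window=3):
            print(f"{mode:6} | {line.left} [{line.keyword}] {line.right}")
```

In this sketch, an exact search for عمان does not find وعمان, where the conjunction و- is attached to the word; this is one plausible reason an Arabic concordancer would also offer prefix or pattern searches. How the actual tool handles such clitics is not stated here.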

Cite as: Al Zahran, A., Jamoussi, R., Albakri, M. Z., Al-Maqbali, A. A., Albuloshi, I. M., Alghefeili, G. R., Albadri, E. J., Almandhari, H. K., & Alharrasi, N. M. (2023). Oman Royal Speeches Corpus: Compilation and Analysis. Arab World English Journal, 14(4), 150-168.


References

Abdul Latif, E. (2017). Arabic political discourse. In E. Benmamoun & R. Bassiouney (eds.), The Routledge Handbook of Arabic Linguistics (pp. 518-530). London and New York: Routledge.

Abercrombie, G., & Batista-Navarro, R. (2020). Sentiment and position-taking analysis of parliamentary debates: a systematic literature review. Journal of Computational Social Science, 3(1), 245-270.

Abu Farha, I., & Magdy, W. (2019). Mazajak: An online Arabic sentiment analyser. In Proceedings of the Fourth Arabic Natural Language Processing Workshop (pp. 192–198), Florence, Italy, August 1, 2019. Association for Computational Linguistics. Retrieved from

Abuzayed, A., & Al-Khalifa, H. (2021). BERT for Arabic topic modeling: An experimental study on BERTopic technique. Procedia Computer Science, 189, 191-194.

Ahrens, K. (2015). Corpus of Political Speeches. Hong Kong Baptist University Library. Retrieved from:

Alduhaim, A. (2019). A comparative study of political discourse features in English and Arabic. International Journal of English Linguistics, 9(6), 148-159.

Alhaj, F., Al-Haj, A., Sharieh, A., & Jabri, R. (2022). Improving Arabic cognitive distortion classification in Twitter using BERTopic. International Journal of Advanced Computer Science and Applications, 13(1), 854-860.

Alrabiah, M. (2013). KSUCCA: King Saud University Corpus of Classical Arabic. Sketch Engine. Retrieved from:

Al-Sharoufi, H. (2006). Critical discourse analysis of political editorials in some Arabic newspapers. 11th Proceedings of the Pan Pacific Association of Applied Linguistics, 8-27.

Al-Sowaidi, B., Banda, F., & Mansour, A. (2017). Doing politics in the recent Arab uprisings: towards a political discourse analysis of the Arab Spring slogans. Journal of Asian and African Studies, 52(5), 621-645.

Anthony, L. (2023). AntConc (Version 4.2.4) [Computer Software]. Tokyo, Japan: Waseda University. Available from

Arifianto, M. L. (2021). Utilizing the Quranic Arabic Corpus as a supplementary teaching and learning material for Arabic syntax: An overview of a web-based Arabic linguistics corpus. International Seminar on Language, Education, and Culture, KnE Social Sciences, 403–412. DOI 10.18502/kss.v5i3.8563

Baker, P. (Ed.). (2012). Contemporary Corpus Linguistics. Continuum.

Bassiouney, R. (2020). Arabic Sociolinguistics: Topics in Diglossia, Gender, Identity, and Politics (2nd ed.). Edinburgh: Edinburgh University Press.

Bibliotheca Alexandrina. (2013). International corpus of Arabic. Bibliotheca Alexandrina. Retrieved from:

Bing, L. (2012). Sentiment analysis and opinion mining (synthesis lectures on human language technologies). Chicago: University of Illinois.

Bonelli, E. T. (2010). Theoretical overview of the evolution of corpus linguistics. In A. O’Keeffe & M. J. McCarthy (eds.), The Routledge Handbook of Corpus Linguistics (pp. 14-27). London and New York: Routledge.

Bordoloi, M., & Biswas, S. K. (2023). Sentiment analysis: A survey on design framework, applications and future scopes. Artificial Intelligence Review, 56, 12505–12560.

Bowker, L., & Pearson, J. (2002). Working with Specialized Language: A Practical Guide to Using Corpora. London and New York: Routledge.

Boyd-Graber, J., Hu, Y., & Mimno, D. (2017). Applications of topic models. Foundations and Trends® in Information Retrieval, 11(2-3), 143-296.

Caetano, J. A., Lima, H. S., Santos, M. F., & Marques-Neto, H. T. (2018). Using sentiment analysis to define Twitter political users’ classes and their homophily during the 2016 American presidential election. Journal of Internet Services and Applications, 9(1), 1-15.

Crosthwaite, P., Ningrum, S., & Schweinberger, M. (2022). Research trends in corpus linguistics: A bibliometric analysis of two decades of Scopus-indexed corpus linguistics research in arts and humanities. International Journal of Corpus Linguistics, 28(3), 344-377.

Curran, B., Higham, K., Ortiz, E., & Vasques Filho, D. (2018). Look who’s talking: Two-mode networks as representations of a topic model of New Zealand parliamentary speeches. PLoS ONE, 13(6), e0199072.

Davies, M. (2008). The corpus of contemporary American English (COCA): 600 million words, 1990-present. Retrieved from:

Dipper, S. (2008). Theory-driven and corpus-driven computational linguistics, and the use of corpora. In A. Lüdeling & M. Kytö (eds.), Corpus Linguistics: An International Handbook (Vol. 1, pp. 68–96). Berlin: de Gruyter.

Dukes, K. (2017). The Quranic Arabic Corpus. Retrieved from:

Ebrahimi, M., Yazdavar, A. H., & Sheth, A. (2017). Challenges of sentiment analysis for dynamic events. IEEE Intelligent Systems, 32(5), 70-75.

Egbert, J., Biber, D., & Gray, B. (2022). Designing and Evaluating Language Corpora: A Practical Framework for Corpus Representativeness. Cambridge: Cambridge University Press.

El-Haj, M. (2013). KALIMAT: A multipurpose Arabic corpus. Retrieved from:

El-Khair, I. A. (2016). 1.5 billion words Arabic corpus. arXiv preprint arXiv:1611.04033.

European Parliament Interpretation Corpus (EPIC) (Version 1). (2014). ELRA. Retrieved from:

Golfetto, M. A., Osti, L., & Chakrani, B. (2021). Which Arabic and why? Policies, politics, and teaching. Lingue Culture Mediazioni, 8(2), 5-12.

Haselmayer, M., & Jenny, M. (2017). Sentiment analysis of political communication: Combining a dictionary approach with crowdcoding. Quality & Quantity, 51, 2623-2646.

Jagannathan, M., Roy, D., & Delhi, V. S. K. (2022). Application of NLP-based topic modeling to analyse unstructured text data in annual reports of construction contracting companies. CSI Transactions on ICT, 10(2), 97-106.

King Abdulaziz City for Science and Technology. (2020). King Abdulaziz City for Science and Technology (KACST) Arabic Corpus. Retrieved from:

Knight, D., & Adolphs, S. (2022). Building a spoken corpus: what are the basics? In A. O’Keeffe & M. J. McCarthy (eds.), The Routledge Handbook of Corpus Linguistics (2nd ed., pp. 21-34). London and New York: Routledge.

Koester, A. (2022). Building small specialised corpora. In A. O’Keeffe & M. J. McCarthy (eds.), The Routledge Handbook of Corpus Linguistics (2nd ed., pp. 48-61). London and New York: Routledge.

Lindstedt, N. C. (2019). Structural topic modeling for social scientists: A brief case study with social movement studies literature, 2005–2017. Social Currents, 6(4), 307-318.

Maalej, Z. A. (2012). The ‘Jasmine Revolt’ has made the ‘Arab Spring’: A critical discourse analysis of the last three political speeches of the ousted president of Tunisia. Discourse & Society, 23(6), 679-700.

Malmkjaer, K. (1998). Love thy neighbour: Will parallel corpora endear linguists to translators? Meta, 43(4), 534–541.

Mansour, M. A. (2013). The absence of Arabic corpus linguistics: A call for creating an Arabic national corpus. International Journal of Humanities and Social Science, 3(12), 81-90.

Matalon, Y., Magdaci, O., Almozlino, A., & Yamin, D. (2021). Using sentiment analysis to predict opinion inversion in Tweets of political communication. Scientific Reports, 11(1), 1-9.

Mautner, G. (2022). What can a corpus tell us about discourse? In A. O’Keeffe & M. J. McCarthy (eds.), The Routledge Handbook of Corpus Linguistics (2nd ed., pp. 250-262). London and New York: Routledge.

May, C. C. (2022). Topic modeling in theory and practice (Unpublished doctoral dissertation). Johns Hopkins University, Baltimore, Maryland.

McCarthy, M., & O’Keeffe, A. (2010). Historical perspective: What are corpora and how have they evolved? In A. O’Keeffe & M. J. McCarthy (eds.), The Routledge Handbook of Corpus Linguistics (pp. 3-13). London and New York: Routledge.

McEnery, T., & Hardie, A. (2012). Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.

McEnery, T., Hardie, A., & Younis, N. (Eds.). (2019). Arabic Corpus Linguistics. Edinburgh: Edinburgh University Press.

O’Keeffe, A., & McCarthy, M. J. (Eds.). (2022a). The Routledge Handbook of Corpus Linguistics (2nd ed.). London and New York: Routledge.

O’Keeffe, A., & McCarthy, M. J. (2022b). ‘Of what is past, or passing, or to come’: corpus linguistics, changes and challenges. In A. O’Keeffe & M. J. McCarthy (eds.), The Routledge Handbook of Corpus Linguistics (2nd ed., pp. 1-9). London and New York: Routledge.

Reppen, R. (2022). Building a corpus: What are the key considerations? In A. O’Keeffe & M. J. McCarthy (eds.), The Routledge Handbook of Corpus Linguistics (2nd ed., pp. 13-20). London and New York: Routledge.

Rouhana, T. (2023). Critical discourse analysis guided topic modeling: the case of Al-Jazeera Arabic. Information, Communication & Society, 26(5), 904-922.

The British National Corpus, version 3, BNC XML Ed. (2007). Distributed by Bodleian Libraries, University of Oxford, on behalf of the BNC Consortium. Retrieved from

Xiao, R. (2010). Corpus creation. In N. Indurkhya & F. Damerau (eds.), The Handbook of Natural Language Processing (2nd ed., pp. 147-165). New York: Chapman and Hall/CRC.

Zong, C., Xia, R., & Zhang, J. (2021). Text Data Mining. Singapore: Springer.

Zygomatic. (2023). Retrieved January 26, 2023, from




Aladdin Al Zahran serves as an Assistant Professor of Interpreting & Translation Studies at Sohar University in Oman. His primary research focus lies in corpus-based interpreting and translation studies. Currently, Dr. Al Zahran is managing two projects: one on using semantic leads to improve corpus building and terminology extraction, and another centered on creating a multimodal corpus of simultaneously interpreted, translated, and original speeches. ORCID:

Rafik Jamoussi is an Associate Professor of Translation Studies at Sohar University, Oman. He has taught courses addressing terminology management and technology in translation as well as courses on literary and legal translation. His research interests include translator training, translation technology, and corpus linguistics. ORCID: