Arab World English Journal (AWEJ) Volume 11, Number 1, March 2020, Pp. 195-211


The Detective and Sensation Fiction of Wilkie Collins:
A Computational Lexical-Semantic Analysis

Abdulfattah Omar
Department of English, College of Science and Humanities
Prince Sattam Bin Abdulaziz University
Al-Kharj 11942, Kingdom of Saudi Arabia
Department of English, Faculty of Arts, Port Said University

Theme and genre classifications in the works of Wilkie Collins (1824-89) have been extensively investigated using different literary approaches, usually based on textual content and biographical considerations. Critics generally place Collins’ works under two main headings: detective fiction and sensation fiction. Such analyses have been generated by what is referred to as the ‘philological method’; that is, by an individual critic’s reading of the relevant material and their intuitive abstraction of generalizations from that reading. A problem with this approach is that it rests on individual judgment, so it is not objective and is therefore unreliable. The research question responds to the subjectivity of previous genre classifications of Collins’ novels and to the lack of agreement among literary critics and researchers about them. I therefore ask whether an objective and conceptually useful reading of the themes and subjects of Collins’ prose fiction can be developed. To this end, computational lexical semantics is proposed as a way of addressing the problem of thematic classification. Vector space clustering (VSC) was used to capture the lexical-semantic features of his novels and to link them explicitly to the relevant themes and genres. It is suggested that this method makes an objective, replicable, and reliable genre classification of Collins’ novels possible. The results of this study can serve as a basis for future studies and criticism of Wilkie Collins’ fiction.

Keywords: computational lexical-semantics; detective fiction; genre classification; sensation fiction; theme analysis; vector space clustering (VSC); Wilkie Collins
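
The general idea behind vector space clustering can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not specify the term weighting, the normalization, or the clustering algorithm, so the choices below (raw term frequencies, unit-length normalization, cosine similarity, average-linkage agglomerative clustering) are assumptions made for illustration only. The four short texts, the stop-word list, and all function names are invented placeholders, not material from Collins’ novels.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "and", "to", "about"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_vectors(docs):
    """Map each document name to a unit-length term-frequency vector."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    vectors = {}
    for name, c in counts.items():
        raw = [c[term] for term in vocab]
        # Dividing by the vector length reduces the effect of document length.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        vectors[name] = [x / norm for x in raw]
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; the vectors are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def cluster(vectors, k):
    """Average-linkage agglomerative clustering until k clusters remain."""
    clusters = [[name] for name in sorted(vectors)]

    def linkage(a, b):
        sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Merge the two clusters whose members are most similar on average.
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Invented toy texts standing in for full novels.
docs = {
    "text_a": "the detective examined the clue and questioned the suspect about the theft",
    "text_b": "a detective followed the suspect and found a clue to the theft",
    "text_c": "the secret marriage and the hidden inheritance shocked the family",
    "text_d": "a secret inheritance and a shocking marriage divided the family",
}

vocab, vectors = build_vectors(docs)
groups = cluster(vectors, 2)
print(groups)
```

In this toy setting the two texts that share detective-related vocabulary fall into one cluster and the two that share sensation-related vocabulary into the other. The point of the sketch is that the grouping follows from word distributions alone and can be reproduced by anyone running the same steps on the same texts.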

Cite as: Omar, A. (2020). The Detective and Sensation Fiction of Wilkie Collins: A Computational Lexical-Semantic Analysis. Arab World English Journal, 11(1), 195-211.



Abdulfattah Omar is an Associate Professor of linguistics at Prince Sattam Bin Abdulaziz
University. He finished his PhD in linguistics at Newcastle University in 2010. His research
interests include computational linguistics, digital humanities, and literary computing.