Linguistics and the Human Sciences, Vol 6, No 1-3 (2010)

Annotating thematic features in English and Spanish: A contrastive corpus-based study

Jorge Arús, Julia Lavid, Lara Moratón
Issued Date: 13 Dec 2012


This paper presents the preliminary results of an empirical study designed to test contrastive features of the category of Theme in English and Spanish through corpus analysis and manual annotation. Taking as its theoretical basis the more general features of the model of thematisation proposed in Lavid, Arús and Zamorano (2010), the paper describes the steps of the methodology: the selection of the corpus used as a ‘training suite’, the design of the annotation scheme, and a discussion of the results of the two annotation experiments carried out so far to test the reproducibility of the annotation scheme. The work reported here is expected to have a theoretical impact on the area of contrastive corpus studies and to serve as the basis for the (semi-)automatic annotation of thematic features in larger bilingual corpora.


DOI: 10.1558/lhs.v6i1-3.173


Arnaiz, A. R. (1997) An overview of the main word order characteristics of Romance. In A. Siewierska (ed.) Constituent Order in the Languages of Europe, 47‒73. Berlin: Mouton de Gruyter.
Arús, J. (2010) On Theme in English and Spanish: A comparative study. In E. Swain (ed.) Thresholds and Potentialities of Systemic Functional Linguistics: Multilingual, Multimodal and Other Specialised Discourses, 23‒48. Trieste: EUT.
Arús, J. (2007) On the aboutness of Theme. In M. Losada, P. Ron, S. Hernández and J. Casanova (eds) Proceedings of the 30th International AEDEAN Conference (CD-ROM).
Berry, M. (1989) Thematic options and success in writing. In C. Butler, R. Cardwell and J. Cardwell (eds) Language and Literature: Theory and Practice. A Tribute to Walter Grauberg, 62‒80. Nottingham: University of Nottingham.
Fawcett, R. (2007) The many types of ‘Theme’ in English: their semantic systems and their functional syntax. Retrieved on 10 June 2010 from
Halliday, M. A. K. and Matthiessen, C. M. I. M. (2004) An Introduction to Functional Grammar. London: Arnold.
Hausser, R. (2001) Foundations of Computational Linguistics. Berlin: Springer.
Krippendorff, K. (2007) Computing Krippendorff’s Alpha-Reliability. Retrieved on 21 March 2010 from
Lavid, J. (2010) Contrasting choices in clause-initial position in English and Spanish: A corpus-based analysis. In E. Swain (ed.) Thresholds and Potentialities of Systemic Functional Linguistics: Multilingual, Multimodal and Other Specialised Discourses, 49‒68. Trieste: EUT.
Lavid, J. (2000a) Contextual constraints on thematisation in written discourse: an empirical study. In P. Bonzon, M. Cavalcanti and R. Nossum (eds) Formal Aspects of Context, 37‒47. Dordrecht/Boston/London: Kluwer Academic Publishers.
Lavid, J. (2000b) Text types, chaining strategies and Theme in a multilingual corpus: A cross-linguistic comparison for text generation. In J. Bregazzi, A. Downing, D. López and J. Neff (eds) Estudios de Filología Inglesa: Homenaje a Jack White, 107‒121. Madrid: Editorial Complutense.
Lavid, J. (1998) The relevance of corpus-based research for contrastive linguistics and computational studies: Thematisation as an example. In M. T. Turell and E. Vallduví (eds) IV i V Jornades de corpus lingüistics (1996‒1997): els corpus en la recerca semàntica i pragmàtica, 117‒140. Barcelona: Publicaciones del Instituto Universitario de Lingüística Aplicada, Universidad Pompeu Fabra.
Lavid, J., Arús, J. and Zamorano, J. R. (2010a) Systemic-Functional Grammar of Spanish: A Contrastive Account with English. London: Continuum.
Lavid, J., Arús, J. and Moratón, L. (2010b) Signalling genre through Theme: The case of news reports and commentaries. In L-M. Ho-Dac (ed.) Proceedings of the 8th MAD: Signalling Text Organisation, 82‒92. Moissac (France): University of Toulouse. Available at
Leech, G. (1997) Introducing corpus annotation. In R. Garside, G. Leech and A. McEnery (eds) Corpus Annotation: Linguistic Information from Computer Text Corpora, 1‒19. London: Longman.
Matthiessen, C. M. I. M. (1995) Lexicogrammatical Cartography: English Systems. Tokyo: International Language Science Publishers.
Matthiessen, C. M. I. M. (2006) Frequency profiles of some basic grammar systems. In G. Thompson and S. Hunston (eds) System and Corpus: Exploring Connections, 103‒142. London: Equinox.
McCabe, A. and Alonso, I. (2001) Theme, transitivity and cognitive representation in Spanish and English written texts. In CLAC 7/2001. Retrieved on 10 February 2009 from
O’Donnell, M. (2010) UAM Corpus Tool. Available at
Ravelli, L. J. (1995) A dynamic perspective: implications for metafunctional interaction and an understanding of Theme. In R. Hasan and P. H. Fries (eds) On Subject and Theme, 187‒234. Amsterdam and Philadelphia, PA: Benjamins.
Rose, D. (2001) Some variation in Theme across languages. Functions of Language 8 (1): 109‒145.
Taboada, M. (1995) Theme Markedness in English and Spanish: A Systemic-Functional Approach. Retrieved on 24 September 2010 from
University of Pittsburgh (2010) UCAT Coding Tool. Available at http://cat.ucsur.pi

