Sociolinguistic Studies, Estudios de Sociolingüística 4.1 2003

The Corpus of Galician/Spanish Bilingual Speech of the University of Vigo: Codes tagging and automatic annotation

Xoán Paulo Rodríguez-Yáñez, Hakan Casares-Berg
Issued Date: 19 Mar 2007


Firstly, we briefly describe the research project behind the Corpus of Galician/Spanish Bilingual Speech (Corpus de Fala Bilingüe Galego/Castelán, abbreviated as CoFaBil), currently being compiled at the University of Vigo. This ethnographic-conversational corpus was recorded in a wide range of informal, spontaneous communicative situations and then transcribed in detail following the conventions normally applied in conversation analysis. Secondly, we explain the manual annotation of the corpus. The CHAT annotation system used to tag it requires specifying the linguistic-communicative code to which each word belongs, and we discuss the problems that this word-by-word tagging raises. These problems involve phenomena characteristic of both bilingual conversation and languages in contact, with the added difficulty that the small interlinguistic distance between the varieties of Galician and Spanish calls for certain tagging values (presented in the text) that reflect the complex nature of the phenomena detected. Thirdly, we present the solutions devised for the automatic annotation of the corpus. The most important result is the computer application Anotador 1.0, which makes it possible to annotate a substantial part of the phenomena appearing in CoFaBil more quickly while avoiding the interpretative biases involved in human annotation. Because of its versatility, the tool may also be used to annotate corpora of bilingual speech for any pair of languages.
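The word-by-word code tagging described above can be illustrated with a minimal sketch. It is not the Anotador 1.0 implementation and does not use the CHAT or CoFaBil tag values: the tiny lexicons, the function names and the labels "gl", "es", "shared" and "unknown" are all assumptions made for illustration. It shows only why closely related languages force an extra tag value for words that belong to both codes.

```python
# Toy lexicons (hypothetical); a real annotator would need full word lists.
GALICIAN = {"eu", "teño", "moito", "casa"}
SPANISH = {"yo", "tengo", "mucho", "casa"}


def tag_word(word):
    """Return an illustrative language label for a single word."""
    w = word.lower()
    in_gl = w in GALICIAN
    in_es = w in SPANISH
    if in_gl and in_es:
        return "shared"   # the word exists in both languages
    if in_gl:
        return "gl"       # Galician only
    if in_es:
        return "es"       # Spanish only
    return "unknown"      # not in either toy lexicon


def tag_utterance(utterance):
    """Tag every whitespace-separated word of an utterance."""
    return [(w, tag_word(w)) for w in utterance.split()]


print(tag_utterance("eu tengo casa"))
```

In this sketch "casa" receives the "shared" label because it appears in both lexicons, which is the kind of case where a simple two-way Galician/Spanish tag is not enough.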


DOI: 10.1558/sols.v4i1.358


