What do we get from extracting collocations?

Linguistic analysis of automatically obtained Russian MWEs

Authors

  • Daria Kormacheva, University of Helsinki

DOI:

https://doi.org/10.1558/jrds.v1i2.26946

Keywords:

multiword expressions, corpus linguistics, automatic collocation extraction, semantic analysis, Russian language

Abstract

This paper applies linguistic analysis to the results of the automatic extraction of multiword expressions in order to understand whether they are reliable from the theoretical point of view. The nature of the extracted units is discussed and illustrated with examples of Russian prepositions: first classified according to I. Mel’čuk’s theory (1995) and then re-analysed using the notion of constructions. The corpus-driven approach reveals the shortcomings in the prevalent way of describing multiword expressions in terms of strict classes, and the present paper can be thought of as providing a theoretical basis for the development of a new approach to their description.

Author Biography

  • Daria Kormacheva, University of Helsinki

    Daria Kormacheva is a PhD student at the University of Helsinki, Finland. She is a member of the ‘Collocations, Colligations and Corpora’ research group, and her work aims to bridge the gap between linguistic theory and the computational methods used to describe linguistic phenomena. In her research, she applies traditional linguistic analysis to the results of automatic collocation extraction in order to understand whether they can be accurately described by a theoretical model.

References

Baker, P., Hardie, A. and McEnery, T. (2006). A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.

Calzolari, N., Fillmore, C. J., Grishman, R., Ide, N., Lenci, A., MacLeod, C. and Zampolli, A. (2002). Towards best practice for multiword expressions in computational lexicons. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Canary Islands – Spain. European Language Resources Association (ELRA).

Church, K. W. and Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics 16 (1): 22–29.

Church, K. W., Gale, W., Hanks, P. and Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (ed.) Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, 115–164. Hillsdale, NJ: Lawrence Erlbaum.

Čermák, F. (2001). Substance of idioms: Perennial problems, lack of data or theory? International Journal of Lexicography 14 (1): 1–20. http://dx.doi.org/10.1093/ijl/14.1.1

Daudaravicius, V. (2010). Automatic identification of lexical units. Informatica 34 (1).

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19 (1): 61–74.

Gries, S. Th. (2013). 50-something years of work on collocations: What is or should be next. International Journal of Corpus Linguistics 18 (1): 137–166. http://dx.doi.org/10.1075/ijcl.18.1.09gri

Fillmore, Ch. J., Kay, P. and O'Connor, M. C. (1988). Regularity and idiomaticity in grammatical constructions: The case of let alone. Language 64 (3): 501–538. http://dx.doi.org/10.2307/414531

Frank, S. L., Bod, R. and Christiansen, M. H. (2012). How hierarchical is language use? Proceedings of the Royal Society B: Biological Sciences 279 (1747): 4522–4531. http://dx.doi.org/10.1098/rspb.2012.1741

Iordanskaja, L. and Mel’čuk, I. (2007). Smysl i sochetaemost’ v slovare. Moskva: Jazyki slavjanskih kul’tur. [In Russian]

Jackendoff, R. (1997). The Architecture of the Language Faculty. No. 28. Cambridge, MA: MIT Press.

Kormacheva, D., Pivovarova, L. and Kopotev, M. (2014). Automatic collocation extraction and classification of automatically obtained bigrams. Workshop on Computational, Cognitive, and Linguistic Approaches to the Analysis of Complex Words and Collocations (CCLCC 2014): 27–33.

Levontina, I. (1995). Slovarnye stat’i predlogov DLJA i RADI: k probleme leksikograficheskoj interpretacii mnogoznachnosti u služebnyh slov. Teoreticheskaja lingvistika i leksikografija: opyty sistemnogo opisanija leksiki. [In Russian]

Manning, Ch. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

Mel’čuk, I. and Žolkovsky, A. (1984). Explanatory Combinatorial Dictionary of Modern Russian. Wiener Slawistischer Almanach. Inst. für Slawistik d. Univ. Wien.

Mel’čuk, I. (1995). Phrasemes in language and phraseology in linguistics. Idioms: Structural and Psychological Perspectives: 167–232.

Mel’čuk, I. (1998). Collocations and lexical functions. In A. P. Cowie (ed.) Phraseology. Theory, Analysis, and Applications, 23–53. Oxford: Clarendon Press.

Mel’čuk, I. (2006). Explanatory combinatorial dictionary. Open Problems in Linguistics and Lexicography: 225–355.

Moon, R. (1998). Fixed Expressions and Idioms in English: A Corpus-based Approach. Oxford: Clarendon Press.

Nunberg, G., Sag, I. A. and Wasow, T. (1994). Idioms. Language 70 (3): 491–538. http://dx.doi.org/10.1353/lan.1994.0007

Rogožnikova, R. (2003). Tolkovyj slovar’ sochetanij, ekvivalentnih slovu. Moskva: Astrel’. [In Russian]

Sag, I. A., Baldwin, T., Bond, F., Copestake, A. and Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. Computational Linguistics and Intelligent Text Processing, 1–15. Berlin and Heidelberg: Springer. http://dx.doi.org/10.1007/3-540-45715-1_1

Sinclair, J. (1991). Corpus, Concordance, Collocation. Vol. 1. Oxford: Oxford University Press.

Stubbs, M. (2001). Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell Publishers.

Swinney, D. A. and Cutler, A. (1979). The access and processing of idiomatic expressions. Journal of Verbal Learning and Verbal Behavior 18 (5): 523–534. http://dx.doi.org/10.1016/S0022-5371(79)90284-6

Published

2015-07-24

Section

Articles

How to Cite

Kormacheva, D. (2015). What do we get from extracting collocations? Linguistic analysis of automatically obtained Russian MWEs. Journal of Research Design and Statistics in Linguistics and Communication Science, 1(2), 169-189. https://doi.org/10.1558/jrds.v1i2.26946
