Possible measures of asymmetry and redundancy in collocations

Authors

  • Robert Nelson, University of Alabama

DOI:

https://doi.org/10.1558/jrds.v1i2.20304

Keywords:

collocation, information theory, asymmetry, corpus linguistics

Abstract

It has long been recognized that developing measures of the internal structure of collocations is an important goal (Sinclair, 1991). Recently, Gries (2013) presented a measure that captures the asymmetric nature of conditional probabilities in collocations. This paper contributes to that discussion by introducing measures of asymmetry and redundancy that may meet the needs of some researchers. Two asymmetry measures are described: the first captures only frequency asymmetry, while the second is an asymmetric version of the mutual information measure. A measure of semantic redundancy is also described. This measure takes a higher value when the fact that two words co-occur contains more information than the uncertainty introduced by the occurrence of the individual words.
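
To make the notion of asymmetry concrete, the following minimal sketch contrasts directional quantities with a symmetric one for a single word pair. It does not reproduce the measures proposed in the paper, which the abstract does not define. It assumes that the measure from Gries (2013) referred to above is ΔP, and it uses the standard pointwise mutual information as the symmetric baseline. The function name, parameter names, and counts are hypothetical and chosen only for illustration.

```python
import math


def directional_stats(f_xy, f_x, f_y, n):
    """Illustrative association statistics for a word pair (x, y).

    f_xy : co-occurrence count of the pair (hypothetical toy input)
    f_x  : count of x, f_y : count of y
    n    : total number of pair positions in the corpus
    """
    # Conditional probabilities differ by direction.
    p_y_given_x = f_xy / f_x
    p_x_given_y = f_xy / f_y

    # Delta P (assumed here to be the measure discussed by Gries, 2013):
    # P(outcome | cue) - P(outcome | cue absent).
    delta_p_y_given_x = p_y_given_x - (f_y - f_xy) / (n - f_x)
    delta_p_x_given_y = p_x_given_y - (f_x - f_xy) / (n - f_y)

    # Pointwise mutual information in bits: identical in both directions.
    pmi = math.log2((f_xy / n) / ((f_x / n) * (f_y / n)))

    return {
        "p_y_given_x": p_y_given_x,
        "p_x_given_y": p_x_given_y,
        "delta_p_y_given_x": delta_p_y_given_x,
        "delta_p_x_given_y": delta_p_x_given_y,
        "pmi": pmi,
    }


# Toy counts, invented for illustration only: a rare x that almost
# always precedes a frequent y.
stats = directional_stats(f_xy=80, f_x=100, f_y=4000, n=1_000_000)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With counts like these, x predicts y far better than y predicts x, yet the mutual information value is a single number that cannot show the difference. That is the kind of gap an asymmetric measure is meant to fill.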

Author Biography

  • Robert Nelson, University of Alabama

    Robert Nelson received his PhD in ESL and Linguistics from Purdue University in 2008. Before joining the faculty at UA in 2009, he was an Assistant Professor at Murray State University. His present research focuses on neural network models of second language and bilingual lexical memory and phonological perception. He is also interested in Bayesian models and methods.

References

Bird, S., Klein, E. and Loper, E. (2009). Natural Language Processing with Python. Sebastopol, CA: O’Reilly Media.

Bybee, J. L. (2010). Language, Usage and Cognition (Vol. 98). Cambridge: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511750526

Dirven, R. and Verspoor, M. (Eds) (2004). Cognitive Exploration of Language and Linguistics (Vol. 1). New York: John Benjamins Publishing. http://dx.doi.org/10.1075/clip.1

Ellis, N. C. (2006). Language acquisition as rational contingency learning. Applied Linguistics, 27 (1): 1–24. http://dx.doi.org/10.1093/applin/ami038

Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.) Corpora and Language Teaching, 13–32. New York: John Benjamins. http://dx.doi.org/10.1075/scl.33.04gra

Gries, S. T. (2010). Useful statistics for corpus linguistics. In Aquilino Sánchez and Moisés Almela (Eds) A Mosaic of Corpus Linguistics: Selected Approaches, 269–291. Frankfurt am Main: Peter Lang.

Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next. International Journal of Corpus Linguistics, 18 (1): 137–166. http://dx.doi.org/10.1075/ijcl.18.1.09gri

Justeson, J. S. and Katz, S. M. (1995). Technical terminology: Some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1 (1): 9–27. http://dx.doi.org/10.1017/S1351324900000048

Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22 (1): 79–86. http://dx.doi.org/10.1214/aoms/1177729694

Liu, D. (2013). Salience and construal in the use of synonymy: A study of two sets of near-synonymous nouns. Cognitive Linguistics, 24 (1): 67–113. http://dx.doi.org/10.1515/cog-2013-0003

Michelbacher, L., Evert, S. and Schütze, H. (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7 (2): 245–276. http://dx.doi.org/10.1515/cllt.2011.012

Ramscar, M., Dye, M. and McCauley, S. M. (2013). Error and expectation in language learning: The curious absence of mouses in adult speech. Language, 89 (4): 760–793. http://dx.doi.org/10.1353/lan.2013.0068

Renouf, A. and Banerjee, J. (2007). Lexical repulsion between sense-related pairs. International Journal of Corpus Linguistics, 12 (3): 415–444. http://dx.doi.org/10.1075/ijcl.12.3.05ren

Rescorla, R. A. and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy (Eds) Classical Conditioning II: Current Research and Theory, 64–99. New York: Appleton-Century-Crofts.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27 (3): 379–423. http://dx.doi.org/10.1002/j.1538-7305.1948.tb01338.x

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Spivey, M. J. and Richardson, D. C. (2008). Language embedded in the environment. In P. Robbins and M. Aydede (Eds) The Cambridge Handbook of Situated Cognition, 382–400. Cambridge: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511816826.020

Theil, H. (1970). On the estimation of relationships involving qualitative variables. American Journal of Sociology, 76 (1): 341–357. http://dx.doi.org/10.1086/224909

Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4 (1): 66–82. http://dx.doi.org/10.1147/rd.41.0066

Wolfram, S. (2014). Launching Mathematica 10 – with 700+ New Functions and a Crazy Amount of R&D. http://blog.wolfram.com/2014/07/09/launching-mathematica-10-with-700-new-functions-and-a-crazy-amount-of-rd

Published

2015-07-24

Issue

Section

Articles

How to Cite

Nelson, R. (2015). Possible measures of asymmetry and redundancy in collocations. Journal of Research Design and Statistics in Linguistics and Communication Science, 1(2), 191-212. https://doi.org/10.1558/jrds.v1i2.20304