Measuring and interpreting lexical dispersion in corpus linguistics

Authors

  • Brent Burch Northern Arizona University, Flagstaff, AZ, USA.
  • Jesse Egbert
  • Douglas Biber

DOI:

https://doi.org/10.1558/jrds.33066

Keywords:

Gries’ DPnorm, Juilland’s D, Word frequency lists

Abstract

The frequency of occurrence and the dispersion of a word are both measures of that word’s importance in a collection of texts, or corpus. In particular, lexical dispersion is a statistic in corpus linguistics that measures how evenly a word is distributed across the parts of a corpus. Dispersion can be measured in several ways, and the authors compare three approaches. Both formulaic and interpretative issues pertaining to dispersion are discussed in terms of the frequency of a word in the corpus parts and the variability of a word across the corpus. A simulation study and an application involving words from the British National Corpus indicate that the preferred index is the one constructed from the differences between every possible pair of the word’s frequencies in the parts of a corpus.
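The abstract names the measures but gives no formulas, so the following is only a minimal sketch of how three such indices are commonly formulated, not the authors' own implementation. It assumes Juilland's D is computed as one minus the coefficient of variation scaled by the square root of (n - 1), that Gries' DPnorm is DP divided by one minus the smallest relative part size, and that the pairwise index is one minus the mean absolute pairwise difference divided by twice the mean frequency. The function names are hypothetical, and `juilland_d` and `pairwise_index` assume equal-sized corpus parts.

```python
from itertools import combinations
from math import sqrt


def juilland_d(freqs):
    """Juilland's D (assumed form): 1 - CV / sqrt(n - 1); 1 = even, 0 = clumped."""
    n = len(freqs)
    mean = sum(freqs) / n
    if n < 2 or mean == 0:
        raise ValueError("need at least two parts and one occurrence")
    # Population standard deviation of the per-part frequencies.
    sd = sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    return 1 - (sd / mean) / sqrt(n - 1)


def gries_dp_norm(freqs, part_sizes):
    """Gries' DPnorm (assumed form): 0 = even, 1 = clumped (reverse of D)."""
    total_f, total_s = sum(freqs), sum(part_sizes)
    if len(freqs) < 2 or total_f == 0:
        raise ValueError("need at least two parts and one occurrence")
    observed = [f / total_f for f in freqs]       # share of the word per part
    expected = [s / total_s for s in part_sizes]  # share of the corpus per part
    dp = 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))
    return dp / (1 - min(expected))


def pairwise_index(freqs):
    """Index built from all pairwise frequency differences; 1 = even, 0 = clumped."""
    n = len(freqs)
    mean = sum(freqs) / n
    if n < 2 or mean == 0:
        raise ValueError("need at least two parts and one occurrence")
    n_pairs = n * (n - 1) / 2
    mean_abs_diff = sum(abs(a - b) for a, b in combinations(freqs, 2)) / n_pairs
    return 1 - mean_abs_diff / (2 * mean)


if __name__ == "__main__":
    even, clumped = [5, 5, 5, 5], [20, 0, 0, 0]
    sizes = [1000, 1000, 1000, 1000]  # made-up part sizes for illustration
    for label, freqs in (("even", even), ("clumped", clumped)):
        print(label, juilland_d(freqs), gries_dp_norm(freqs, sizes),
              pairwise_index(freqs))
```

Note that, under these assumed forms, DPnorm runs in the opposite direction from the other two indices: a perfectly even word scores 0 rather than 1.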

References

Biber, D., Reppen, R., Schnur, E., and Ghanem, R. (2016). On the (non) utility of Juilland’s D to measure lexical dispersion in large corpora. International Journal of Corpus Linguistics, 21 (4), 439–464.

Brezina, V. and Gablasova, D. (2015). Is there a core general vocabulary?: Introducing the new general service list. Applied Linguistics, 36 (1), 1–22. https://doi.org/10.1093/applin/amt018

Carroll, J. B. (1970). An alternative to Juilland’s usage coefficient for lexical frequencies. ETS Research Bulletin Series, 1970: i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x

Carroll, J. B. (1970). An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behavior, 3 (2), 61–65.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34 (2), 213–238. https://doi.org/10.2307/3587951

Davies, M. and Gardner, D. (2010). A Frequency Dictionary of Contemporary American English: Word Sketches, Collocates, and Thematic Lists. London: Routledge.

Gardner, D. and Davies, M. (2013). A new academic vocabulary list. Applied Linguistics Advanced Access: https://doi.org/10.1093/applin/amt015. First published online: 2 August 2013.

Gries, St. Th. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13 (4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri

Gries, St. Th. (2010). Dispersions and adjusted frequencies in corpora: Further explorations. In St. Th. Gries, S. Wulff, and M. Davies (Eds), Corpus Linguistic Applications: Current Studies, New Directions, 197–212. Amsterdam: Rodopi. https://doi.org/10.1163/9789042028012_014

Gries, St. Th. and Lijffijt, J. (2012). Correction to ‘Dispersions and adjusted frequencies in corpora’. International Journal of Corpus Linguistics, 17 (1), 147–149. https://doi.org/10.1075/ijcl.17.1.08lij

Juilland, A. G. and Chang-Rodriguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton & Co.

Juilland, A. G., Brodin, D. R. and Davidovitch, C. (1970). Frequency Dictionary of French Words. The Hague: Mouton de Gruyter.

Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written and Spoken English: Based on the British National Corpus. London: Longman.

Stuart, A. and Ord, K. (1994). Kendall’s Advanced Theory of Statistics, Volume 1: Distribution Theory, sixth edition. London: Arnold.

Wilcox, A. R. (1967). Indices of Qualitative Variation. Oak Ridge, TN: Oak Ridge National Laboratory, ORNL-TM-1919. http://web.ornl.gov/info/reports/1967/3445605133753.pdf

Wilcox, A. R. (1973). Indices of qualitative variation and political measurement. The Western Political Quarterly, 26 (2), 325–343. https://doi.org/10.2307/446831

Published

2017-10-30

Section

Articles

How to Cite

Burch, B., Egbert, J., & Biber, D. (2017). Measuring and interpreting lexical dispersion in corpus linguistics. Journal of Research Design and Statistics in Linguistics and Communication Science, 3(2), 189–216. https://doi.org/10.1558/jrds.33066