A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests

Authors

  • David Coniam

DOI:

https://doi.org/10.1558/cj.v14i2-4.15-33

Keywords:

Corpus, word frequency, language testing, word class tagging, computer test production

Abstract

This paper outlines how a multiple-choice vocabulary cloze test can be produced from a text. The process involves assigning word class tags to the text and then retrieving the frequency of each word in the text from an analyzed corpus. The system can create three types of test: one based on the "nth-word deletion" principle, one based on user-specified frequency ranges, and one based on a particular word class. Once the user has selected a mode, the system constructs each test item by matching the item's key with options of similar word class and word frequency. Analysis of tests produced by the system and administered to students indicates the potential of the computer-aided test system. The three test production modes are not, however, equally successful in producing "acceptable" test items: the nth-word deletion mode produces considerably fewer acceptable items than the two language-oriented modes, which are based on specified word frequency ranges and particular word classes. The paper concludes by discussing the extent to which good test material can realistically be produced by computer-aided systems, and the different computer tools that may be of use in the process.
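The three key-selection modes and the matching of options to keys described above can be sketched as follows. This is a minimal illustration and not the paper's system: the tag names (NOUN, VERB), the toy lexicon with its frequency counts, and all function names are hypothetical, and choosing the same-class words closest in frequency is only one plausible reading of "similar word class and word frequency".

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, word-class tag); the tag set is hypothetical

# Hypothetical toy lexicon: word -> (word-class tag, corpus frequency).
# A real system would read these values from a tagged, analyzed corpus.
LEXICON: Dict[str, Tuple[str, int]] = {
    "river": ("NOUN", 120),
    "valley": ("NOUN", 110),
    "bridge": ("NOUN", 130),
    "harbour": ("NOUN", 90),
    "idea": ("NOUN", 900),
    "crossed": ("VERB", 80),
    "climbed": ("VERB", 75),
    "painted": ("VERB", 85),
    "ignored": ("VERB", 70),
}


def nth_word_keys(tokens: List[Token], n: int) -> List[int]:
    """Mode 1: indices of every nth token, regardless of class or frequency."""
    return list(range(n - 1, len(tokens), n))


def frequency_range_keys(tokens: List[Token], low: int, high: int) -> List[int]:
    """Mode 2: indices of tokens whose corpus frequency lies in [low, high]."""
    return [
        i for i, (word, _) in enumerate(tokens)
        if word in LEXICON and low <= LEXICON[word][1] <= high
    ]


def word_class_keys(tokens: List[Token], tag: str) -> List[int]:
    """Mode 3: indices of tokens carrying the requested word-class tag."""
    return [i for i, (_, t) in enumerate(tokens) if t == tag]


def pick_distractors(key: str, tag: str, k: int = 3) -> List[str]:
    """Pick up to k same-class words whose frequency is closest to the key's."""
    if key not in LEXICON:
        return []  # no frequency data, so no distractors can be matched
    key_freq = LEXICON[key][1]
    candidates = [
        (abs(freq - key_freq), word)
        for word, (t, freq) in LEXICON.items()
        if t == tag and word != key
    ]
    return [word for _, word in sorted(candidates)[:k]]


def build_item(tokens: List[Token], index: int, k: int = 3) -> dict:
    """Blank out the key at `index` and attach alphabetically ordered options."""
    key, tag = tokens[index]
    stem = " ".join("_____" if i == index else w for i, (w, _) in enumerate(tokens))
    options = sorted([key] + pick_distractors(key, tag, k))
    return {"stem": stem, "key": key, "options": options}


tokens = [("the", "DET"), ("boat", "NOUN"), ("crossed", "VERB"),
          ("the", "DET"), ("river", "NOUN")]
item = build_item(tokens, word_class_keys(tokens, "NOUN")[-1])
print(item["stem"], item["options"])
```

In this sketch the nth-word mode selects positions blindly, so it can land on function words or on words with no frequency data, whereas the other two modes only ever select words that meet a stated linguistic criterion.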

Published

2013-01-14

Section

Articles

How to Cite

Coniam, D. (2013). A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal, 14(2-4), 15-33. https://doi.org/10.1558/cj.v14i2-4.15-33