Investigating the Promise of Learner Corpora

Methodological Issues

Authors

  • Nick Pendar
  • Carol A. Chapelle

DOI:

https://doi.org/10.1558/cj.v25i2.189-206

Keywords:

Learner Corpora, Automatic Classification, Lexical Analysis of Learner Language

Abstract

Researchers working with learner corpora promise quantitative results that would be of greater practical value in areas such as CALL than those of small-scale, qualitative studies. However, learner corpus research has not yet had an impact on teaching and assessment practice. Significant methodological issues need to be examined if learner corpus research is to provide convincing evidence about language development. This study explored the use of the International Corpus of Learner English (ICLE), focusing on methodological issues such as identifying variation in learners' levels and statistically analyzing large numbers of predictors consisting of lexical and quantitative text features. The results are promising for the lexical and quantitative variables and the machine learning procedures investigated in the study. They also point to the need for a larger corpus with more systematically sampled subcorpora across language groups, and for a clear classification of the texts by level of L2 development based on objective criteria.
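The abstract refers to predictors built from "lexical and quantitative text features" without listing them. As a hedged illustration only, the sketch below computes a few features of that general kind for a single text: token and type counts, type-token ratio, mean word length, mean sentence length, and the share of tokens found in a supplied word list. The function name `text_features`, the crude regex tokenizer, and the specific feature set are all assumptions for illustration, not the predictors or tools used in the study.

```python
import re


def text_features(text: str, word_list: frozenset = frozenset()) -> dict:
    """Compute a few simple lexical and quantitative features for one text.

    Hypothetical feature set for illustration only; it is not the
    predictor set used in the study.
    """
    # Lowercased word tokens: letter runs, optionally with one internal
    # apostrophe (e.g. "don't"). A deliberately crude tokenizer.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    # Sentence count approximated by runs of terminal punctuation;
    # at least 1 so a text without final punctuation still counts.
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    if n_tokens == 0:
        return {"tokens": 0, "types": 0, "type_token_ratio": 0.0,
                "mean_word_length": 0.0, "mean_sentence_length": 0.0,
                "word_list_coverage": 0.0}
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Type-token ratio: a simple lexical diversity index (sensitive to text length).
        "type_token_ratio": n_types / n_tokens,
        "mean_word_length": sum(len(t) for t in tokens) / n_tokens,
        "mean_sentence_length": n_tokens / n_sentences,
        # Share of tokens found in a supplied word list (e.g. a frequency-band
        # or academic list); which list to use is an input assumption.
        "word_list_coverage": sum(t in word_list for t in tokens) / n_tokens,
    }
```

For example, `text_features("The cat sat. The dog ran!", frozenset({"the", "cat"}))` yields 6 tokens, 5 types, a mean sentence length of 3.0, and a word-list coverage of 0.5. In a setup like the one the abstract describes, one such dictionary per essay could serve as a row in a predictor table for a classifier, with each essay's proficiency level as the label; because the type-token ratio falls as texts get longer, length-normalized diversity measures may be preferable when essay lengths vary widely.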

Published

2013-01-14

Section

Articles

How to Cite

Pendar, N., & Chapelle, C. A. (2013). Investigating the Promise of Learner Corpora: Methodological Issues. CALICO Journal, 25(2), 189-206. https://doi.org/10.1558/cj.v25i2.189-206