Using Statistical Techniques and Web Search to Correct ESL Errors

Authors

  • Michael Gamon
  • Claudia Leacock
  • Chris Brockett
  • William B. Dolan
  • Jianfeng Gao
  • Dmitriy Belenko
  • Alexandre Klementiev

DOI:

https://doi.org/10.1558/cj.v26i3.491-511

Keywords:

Computational Linguistics, Automatic Error Detection, Data-Driven Error Detection

Abstract

In this paper, we present a system for automatic correction of errors made by learners of English. The system has two novel aspects. First, machine-learned classifiers trained on large amounts of native data and a very large language model are combined to optimize the precision of suggested corrections. Second, the user can access real-life web examples of both their original formulation and the suggested correction. We discuss technical details of the system, including the choice of classifier, feature sets, and language model. We also present results from an evaluation of the system on a set of corpora. We perform an automatic evaluation on native English data and a detailed manual analysis of performance on three corpora of nonnative writing: the Chinese Learner English Corpus (CLEC) and two corpora of web and email writing.

Published

2013-01-14

Section

Articles

How to Cite

Gamon, M., Leacock, C., Brockett, C., Dolan, W. B., Gao, J., Belenko, D., & Klementiev, A. (2013). Using Statistical Techniques and Web Search to Correct ESL Errors. CALICO Journal, 26(3), 491-511. https://doi.org/10.1558/cj.v26i3.491-511
