Modifying Corpus Annotation to Support the Analysis of Learner Language

Authors

  • Markus Dickinson
  • Chong Min Lee

DOI:

https://doi.org/10.1558/cj.v26i3.545-561

Keywords:

Korean Postpositional Particles, Learner Language, Dependency Parsing, Treebank Conversion

Abstract

A crucial question in automatically analyzing learner language is which grammatical information is relevant and useful for learner feedback. Based on knowledge about how learner language varies in its grammatical properties, we propose a framework for reusing analyses found in corpus annotation and illustrate its applicability to Korean postpositional particles. Simple transformations of the corpus annotation allow one to quickly use state-of-the-art parsing methods.

Published

2013-01-14

Section

Articles

How to Cite

Dickinson, M., & Lee, C. M. (2013). Modifying Corpus Annotation to Support the Analysis of Learner Language. CALICO Journal, 26(3), 545-561. https://doi.org/10.1558/cj.v26i3.545-561
