Leveraging a Large Learner Corpus for Automatic Suggestion of Collocations for Learners of Japanese as a Second Language

Lis Pereira; Erlyn Manguilimotan; Yuji Matsumoto

doi:10.1558/cj.v33i3.26444

Authors

Lis Pereira Nara Institute of Science and Technology
Erlyn Manguilimotan Nara Institute of Science and Technology
Yuji Matsumoto Nara Institute of Science and Technology

DOI:

https://doi.org/10.1558/cj.v33i3.26444

Keywords:

Collocation, Japanese, language learning, automatic error collection

Abstract

One of the challenges of learning Japanese as a Second Language (JSL) is finding the appropriate word for a particular usage. To address this challenge, we developed a collocational aid designed to suggest more appropriate collocations in Japanese. In particular, we address the problem of generating and ranking noun and verb candidates for correcting potential collocation errors in learners’ texts. Given a noun-verb construction as input, our system generates possible noun or verb correction candidates based on noun and verb corrections extracted from a large Japanese learner corpus. We use this corpus to investigate learners’ tendency to make collocation errors, and to produce a smaller and more realistic set of candidates. After combining nouns or verbs with the generated candidates to form noun-verb pairs, the system uses the Weighted Dice coefficient as the association measure to filter out inappropriate noun-verb pairs and rank the proper collocations. We report a detailed evaluation and results on learner data. In addition, we show that our system outperforms existing approaches to collocation error correction by a statistically significant margin. Finally, we report a preliminary user study with JSL learners.
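
The candidate generation and ranking steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the Weighted Dice coefficient, so the code assumes a commonly used definition, log2 f(n,v) × 2 f(n,v) / (f(n) + f(v)), where f denotes corpus frequency. The function names, the threshold, the romanized example words, and all counts are hypothetical and chosen only for illustration; they are not taken from the authors' system or data.

```python
import math


def weighted_dice(pair_freq, noun_freq, verb_freq):
    """Weighted Dice score for a noun-verb pair (assumed definition):
    log2(f(n,v)) * 2*f(n,v) / (f(n) + f(v)). Unseen pairs score -inf."""
    if pair_freq <= 0 or noun_freq + verb_freq == 0:
        return float("-inf")
    return math.log2(pair_freq) * 2 * pair_freq / (noun_freq + verb_freq)


def rank_verb_candidates(noun, candidates, pair_counts, noun_counts,
                         verb_counts, threshold=0.0):
    """Pair the noun with each candidate verb, drop pairs whose score is
    not above the threshold, and sort the rest by descending score."""
    scored = []
    for verb in candidates:
        score = weighted_dice(
            pair_counts.get((noun, verb), 0),
            noun_counts.get(noun, 0),
            verb_counts.get(verb, 0),
        )
        if score > threshold:
            scored.append((verb, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical learner-corpus corrections: verbs that replaced "suru"
# in corrected learner sentences.
learner_corrections = {"suru": ["miru", "motsu", "yaru"]}

# Hypothetical frequencies from a native reference corpus.
pair_counts = {("yume", "miru"): 64, ("yume", "motsu"): 8, ("yume", "yaru"): 1}
noun_counts = {"yume": 200}
verb_counts = {"miru": 1000, "motsu": 600, "yaru": 400}

# A learner wrote the noun-verb pair ("yume", "suru"); suggest other verbs.
suggestions = rank_verb_candidates(
    "yume", learner_corrections["suru"], pair_counts, noun_counts, verb_counts
)
for verb, score in suggestions:
    print(f"yume + {verb}: {score:.2f}")
```

With these toy counts, the pair seen only once receives a score of zero (log2 1 = 0) and is filtered out, which shows how the logarithmic weight penalizes rare co-occurrences. Restricting the candidates to verbs observed in learner corrections is what keeps the candidate set small.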

Author Biographies

Lis Pereira, Nara Institute of Science and Technology

Lis Pereira completed her PhD at Nara Institute of Science and Technology in 2016, working on how to address content word choice errors in L2 Japanese.

Erlyn Manguilimotan, Nara Institute of Science and Technology

Erlyn Manguilimotan is a PhD candidate in the Graduate School of Information Science at Nara Institute of Science and Technology. She is working on part-of-speech and syntactic analysis of the Tagalog language.

Yuji Matsumoto, Nara Institute of Science and Technology

Yuji Matsumoto is currently a Professor of Information Science at the Nara Institute of Science and Technology. He received his MSc and PhD degrees in information science from Kyoto University in 1979 and 1989, respectively. He joined the Machine Inference Section of the Electrotechnical Laboratory in 1979. He has been an academic visitor at the Imperial College of Science and Technology, a deputy chief of the First Laboratory at ICOT, and an associate professor at Kyoto University. His main research interests are natural language understanding and machine learning.

