Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of WriteToLearn

Authors

  • Sha Liu, School of Foreign Languages, China West Normal University, China.
  • Antony John Kunnan, Nanyang Technological University, Singapore.

DOI:

https://doi.org/10.1558/cj.v33i1.26380

Keywords:

Accuracy of automated feedback, automated writing evaluation, Chinese undergraduate English majors, scoring ability, WriteToLearn

Abstract

This study investigated the application of WriteToLearn to Chinese undergraduate English majors’ essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university in Sichuan province who wrote 326 essays in response to two writing prompts. Each essay was marked by four trained human raters as well as by WriteToLearn. Many-facet Rasch measurement (MFRM) was used to calibrate WriteToLearn’s performance in scoring the whole set of essays against that of the four human raters, and the accuracy of WriteToLearn’s feedback on 60 randomly selected essays was compared with the feedback provided by the human raters. The two main findings related to scoring were that WriteToLearn was more consistent but markedly more stringent than the four human raters and that it failed to score seven of the essays. In terms of error feedback, WriteToLearn achieved an overall precision of 49% and an overall recall of 18.7%, well below the 90% precision that Burstein, Chodorow, and Leacock (2003) set as the minimum threshold for a reliable error-detection tool. Furthermore, it had difficulty identifying the errors Chinese undergraduate English majors made in the use of articles, prepositions, word choice, and expression.
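
As a gloss on the two analyses named above (ours, not the article’s): the MFRM calibration presumably follows the rating-scale formulation implemented in Facets (Linacre, 2013b), with examinee ability, prompt difficulty, and rater severity as facets; this facet structure is an assumption inferred from the design described, as are the symbols below. The precision and recall figures follow the standard definitions used in automated error-detection research:

\[ \log\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k \]
% B_n = ability of examinee n; D_i = difficulty of prompt i;
% C_j = severity of rater j (the four humans and WriteToLearn);
% F_k = difficulty of awarding score category k over category k-1.

\[ \mathrm{precision} = \frac{TP}{TP + FP} \qquad \mathrm{recall} = \frac{TP}{TP + FN} \]
% TP = errors correctly flagged; FP = flags on text that contains no error;
% FN = genuine errors the system failed to flag.

Read this way, a precision of 49% means that roughly half of WriteToLearn’s error flags pointed to genuine errors, and a recall of 18.7% means that it caught fewer than one in five of the errors the human raters identified.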

Author Biographies

  • Sha Liu, School of Foreign Languages, China West Normal University, China.
    Sha Liu is an assistant lecturer at the School of Foreign Languages at China West Normal University in the People’s Republic of China. She teaches English Essay Writing and the Integrated English Course to English majors. Her research focuses on second language writing assessment and the application of automated writing evaluation in classroom settings.
  • Antony John Kunnan, Nanyang Technological University, Singapore.
    Antony John Kunnan is Professor of English Language at Nanyang Technological University, Singapore. He has published widely in the area of language assessment, especially on validation, test bias, and language assessment policy. His recent publications include a four-volume edited collection of original chapters titled The Companion to Language Assessment (Wiley, 2014) and a four-volume edited collection of published papers titled Language Testing and Assessment (Routledge, 2015). He was the founding editor of Language Assessment Quarterly (2003–2013) and past president of the International Language Testing Association, and is the current president of the Asian Association for Language Assessment.

References

Aryadoust, V., & Liu, S. (2015). Predicting EFL writing ability from levels of mental representation measured by Coh-Metrix: A structural equation modeling study. Assessing Writing, 24, 35–58. http://dx.doi.org/10.1016/j.asw.2015.03.001

Attali, Y. (2013). Validity and reliability of automated essay scoring. In M. D. Shermis & J. Burstein (Eds), Handbook of automated essay evaluation: Current applications and new directions, 181–199. New York, NY: Routledge.

Attali, Y., & Burstein, J. (2005). Automated essay scoring with e-rater® V.2.0 (ETS research report number RR-04-45). Retrieved from http://www.ets.org/Media/Research/pdf/RR-04-45.pdf

Bridgeman, B. (2013). Human ratings and automated essay evaluation. In M. D. Shermis & J. Burstein (Eds), Handbook of automated essay evaluation: Current applications and new directions, 221–232. New York, NY: Routledge.

Burstein, J. (2003). The e-rater® scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds), Automated essay scoring: A cross-disciplinary perspective, 113–121. Mahwah, NJ: Lawrence Erlbaum Associates.

Burstein, J., Chodorow, M., & Leacock, C. (2003). Criterion online essay evaluation: An application for automated evaluation of student essays. Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico.

Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., & Harris, M. D. (1998, August). Automated scoring using a hybrid feature identification technique. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Montreal. Retrieved from http://www.ets.org/Media/Research/pdf/erater_acl98.pdf

Chen, C. F., & Cheng, W. Y. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12 (2), 94–112.

Dikli, S., & Bleyle, S. (2014). Automated Essay Scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing, 22, 1–17. http://dx.doi.org/10.1016/j.asw.2014.03.006

Ferris, D. R., Liu, H., Sinha, A., & Senna, M. (2013). Written corrective feedback for individual L2 writers. Journal of Second Language Writing, 22, 307–329. http://dx.doi.org/10.1016/j.jslw.2012.09.009

Foltz, P. W., Laham, D., & Landauer, T. K. (1999). The intelligent essay assessor: Applications to educational technology. Interactive Multimedia Electronic Journal of Computer-Enhanced Learning, 1 (2). Retrieved from http://imej.wfu.edu/articles/1999/2/04/printver.asp

Foltz, P. W., Lochbaum, K. E., & Rosenstein, M. R. (2011, April). Analysis of student ELA writing performance for a large scale implementation of formative assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, Louisiana.

Foltz, P. W., Streeter, L. A., Lochbaum, K. E., & Landauer, T. (2013). Implementation and application of the Intelligent Essay Assessor. In M. D. Shermis & J. Burstein (Eds), Handbook of automated essay evaluation: Current applications and new directions, 66–88. New York, NY: Routledge.

Galletta, D. F., Durcikova, A., Everard, A., & Jones, B. (2005). Does spell-checking software need a warning label? Communications of the ACM, 48 (7), 82–85. http://dx.doi.org/10.1145/1070838.1070841

Han, N., Chodorow, M., & Leacock, C. (2006). Detecting errors in English article usage by non-native speakers. Natural Language Engineering, 12 (2), 115–129. http://dx.doi.org/10.1017/S1351324906004190

Hoang, G. (2011). Validating My Access as an automated writing instructional tool for English language learners (Unpublished master's thesis). California State University, Los Angeles.

Koskey, K., & Shermis, M. D. (2013). Scaling and norming for automated essay scoring. In M. D. Shermis & J. Burstein (Eds), Handbook of automated essay evaluation: Current applications and new directions, 200–220. New York, NY: Routledge.

Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automatic essay assessment. Assessment in Education, 10, 295–308. http://dx.doi.org/10.1080/0969594032000148154

Leacock, C., Chodorow, M., Gamon, M., & Tetreault, J. (2010). Automated grammatical error detection for language learners. Synthesis Lectures on Human Language Technologies, 3, 1–34. http://dx.doi.org/10.2200/S00275ED1V01Y201006HLT009

Li, J., Link, S., & Hegelheimer, V. (2015). Rethinking the role of automated writing evaluation (AWE) feedback in ESL writing instruction. Journal of Second Language Writing, 27, 1–18. http://dx.doi.org/10.1016/j.jslw.2014.10.004

Li, Z., Link, S., Ma, H., Yang, H., & Hegelheimer, V. (2014). The role of automated writing evaluation holistic scores in the ESL classroom. System, 44, 66–78. http://dx.doi.org/10.1016/j.system.2014.02.007

Linacre, J. M. (2013a). A user's guide to Facets Rasch-model computer programs. Chicago, IL: Winsteps.com.

Linacre, J. M. (2013b). Facets Rasch measurement [computer program]. Chicago, IL: Winsteps.com.

McGee, T. (2006). Taking a spin on the Intelligent Essay Assessor. In P. F. Ericsson & R. H. Haswell (Eds), Machine scoring of student essays: Truth and consequences, 79–92. Logan, UT: Utah State University Press.

McNamara, D. S., Crossley, S. A., Roscoe, R. D., Allen, L. K., & Dai, J. (2015). A hierarchical classification approach to automated essay scoring. Assessing Writing, 23, 35–39. http://dx.doi.org/10.1016/j.asw.2014.09.002

Pearson Education Inc. (2010). Intelligent Essay Assessor (IEA) fact sheet. Retrieved from http://kt.pearsonassessments.com/download/IEA-FactSheet-20100401.pdf

Perelman, L. (2014). When ‘the state of the art’ is counting words. Assessing Writing, 21, 104–111. http://dx.doi.org/10.1016/j.asw.2014.05.001

Powers, D. E. (2000). Computing reader agreement for the GRE Writing Assessment (ETS research memorandum, RM-00-08). Princeton, NJ: Educational Testing Service.

Powers, D. E., Burstein, J. C., Chodorow, M., Fowles, M. E., & Kukich, K. (2002). Stumping e-rater: Challenging the validity of automated essay scoring. Computers in Human Behavior, 18 (2), 103–134. http://dx.doi.org/10.1016/S0747-5632(01)00052-8

Shermis, M. D. (2014). State-of-the-art automated essay scoring: Competition, results, and future directions from a United States demonstration. Assessing Writing, 20, 53–76. http://dx.doi.org/10.1016/j.asw.2013.04.001

Shermis, M. D., & Burstein, J. C. (2003). Introduction. In M. D. Shermis & J. Burstein (Eds), Automated essay scoring: A cross-disciplinary perspective, xiii–xvi. Mahwah, NJ: Lawrence Erlbaum Associates.

Stevenson, M., & Phakiti, A. (2014). The effects of computer-generated feedback on the quality of writing. Assessing Writing, 19, 51–65. http://dx.doi.org/10.1016/j.asw.2013.11.007

Tetreault, J., & Chodorow, M. (2008a, August). The ups and downs of preposition error detection in ESL writing. Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. http://dx.doi.org/10.3115/1599081.1599190

Tetreault, J., & Chodorow, M. (2008b, August). Native judgments of non-native usage: Experiments in preposition error detection. Proceedings of the Workshop on Human Judgments in Computational Linguistics at the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. http://dx.doi.org/10.3115/1611628.1611633

Vantage Learning. (2003a). Assessing the accuracy of IntelliMetric for scoring a district-wide writing assessment (RB-806). Newtown, PA: Vantage Learning.

Vantage Learning. (2003b). How does IntelliMetric score essay responses? (RB-929). Newtown, PA: Vantage Learning.

Vantage Learning. (2006). Research summary: IntelliMetric scoring accuracy across genres and grade levels. Retrieved from http://www.vantagelearning.com/docs/intellimetric/IM_ReseachSummary_InteliMetric_Accuracy_Across_Genre_and_Grade_Levels.pdf

Warschauer, M., & Grimes, D. (2008). Automated writing assessment in the classroom. Pedagogies: An International Journal, 3 (1), 22–26. http://dx.doi.org/10.1080/15544800701771580

Warschauer, M., & Ware, P. (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10 (2), 157–180. http://dx.doi.org/10.1191/1362168806lr190oa

Weigle, S. C. (2013a). English language learners and automated scoring of essays: Critical considerations. Assessing Writing, 18, 85–99. http://dx.doi.org/10.1016/j.asw.2012.10.006

Weigle, S. C. (2013b). English as a second language writing and automated essay evaluation. In M. D. Shermis & J. Burstein (Eds), Handbook of automated essay evaluation: Current applications and new directions, 36–54. New York, NY: Routledge.

Published

2016-01-30

Issue

Vol. 33 No. 1 (2016)

Section

Articles

How to Cite

Liu, S., & Kunnan, A. J. (2016). Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of WriteToLearn. CALICO Journal, 33(1), 71–91. https://doi.org/10.1558/cj.v33i1.26380
