CALICO Journal, Vol 33, No 1 (2016)

Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of WriteToLearn

Sha Liu, Antony John Kunnan
Issued Date: 30 Jan 2016

Abstract


This study investigated the application of WriteToLearn to Chinese undergraduate English majors’ essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university in Sichuan province, who wrote 326 essays in response to two writing prompts. Each essay was marked by four trained human raters as well as by WriteToLearn. Many-facet Rasch measurement (MFRM) was used to calibrate WriteToLearn’s rating performance on the whole set of essays against that of the four human raters. The accuracy of WriteToLearn’s feedback on 60 randomly selected essays was compared with the feedback provided by the human raters. The two main findings related to scoring were that WriteToLearn was more consistent but considerably more stringent than the four trained human raters and that it failed to score seven essays. In terms of error feedback, WriteToLearn achieved an overall precision of 49% and a recall of 18.7%, falling well below the 90% precision threshold that Burstein, Chodorow, and Leacock (2003) set for a reliable error-detection tool. Furthermore, it had difficulty identifying the errors these writers made in the use of articles, prepositions, word choice, and expression.


DOI: 10.1558/cj.v33i1.26380

References


Aryadoust, V., & Liu, S. (2015). Predicting EFL writing ability from levels of mental representation measured by Coh-Metrix: A structural equation modeling study. Assessing Writing, 24, 35–58. http://dx.doi.org/10.1016/j.asw.2015.03.001


Attali, Y. (2013). Validity and reliability of automated essay scoring. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 181–199). New York, NY: Routledge.


Attali, Y., & Burstein, J. (2005). Automated essay scoring with e-rater® V.2.0 (ETS research report number RR-04-45). Retrieved from http://www.ets.org/Media/Research/pdf/RR-04-45.pdf


Bridgeman, B. (2013). Human ratings and automated essay evaluation. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 221–232). New York, NY: Routledge.


Burstein, J. (2003). The e-rater® scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 113–121). Mahwah, NJ: Lawrence Erlbaum Associates.


Burstein, J., Chodorow, M., & Leacock, C. (2003). Criterion online essay evaluation: An application for automated evaluation of student essays. Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico.


Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., & Harris, M. D. (1998, August). Automated scoring using a hybrid feature identification technique. Proceedings of the Annual Meeting of the Association of Computational Linguistics, Montreal. Retrieved from http://www.ets.org/Media/Research/pdf/erater_acl98.pdf


Chen, C. F., & Cheng, W. Y. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12 (2), 94–112.


Dikli, S., & Bleyle, S. (2014). Automated Essay Scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing, 22, 1–17. http://dx.doi.org/10.1016/j.asw.2014.03.006


Ferris, D. R., Liu, H., Sinha, A., & Senna, M. (2013). Written corrective feedback for individual L2 Writers. Journal of Second Language Writing, 22, 307–329. http://dx.doi.org/10.1016/j.jslw.2012.09.009


Foltz, P. W., Laham, D., & Landauer, T. K. (1999). The intelligent essay assessor: Applications to educational technology. Interactive Multimedia Educational Journal of Computer-Enhanced Learning, 1 (2). Retrieved from http://imej.wfu.edu/articles/1999/2/04/printver.asp


Foltz, P. W., Lochbaum, K. E., & Rosenstein, M. R. (2011, April). Analysis of student ELA writing performance for a large scale implementation of formative assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, Louisiana.


Foltz, P. W., Streeter, L. A., Lochbaum, K. E., & Landauer, T. (2013). Implementation and application of the Intelligent Essay Assessor. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 66–88). New York, NY: Routledge.


Galletta, D. F., Durcikova, A., Everard, A., & Jones, B. (2005). Does spell-checking software need a warning label? Communications of the ACM, 48 (7), 82–85. http://dx.doi.org/10.1145/1070838.1070841


Han, N., Chodorow, M., & Leacock, C. (2006). Detecting errors in English article usage by non-native speakers. Natural Language Engineering, 12 (2), 115–129. http://dx.doi.org/10.1017/S1351324906004190


Hoang, G. (2011). Validating My Access as an automated writing instructional tool for English language learners (Unpublished master’s thesis). California State University, Los Angeles.


Koskey, K., & Shermis, M. D. (2013). Scaling and norming for automated essay scoring. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 200–220). New York, NY: Routledge.


Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automatic essay assessment. Assessment in Education, 10, 295–308. http://dx.doi.org/10.1080/0969594032000148154


Leacock, C., Chodorow, M., Gamon, M., & Tetreault, J. (2010). Automated grammatical error detection for language learners. Synthesis Lectures on Human Language Technologies, 3, 1–34. http://dx.doi.org/10.2200/S00275ED1V01Y201006HLT009


Li, J., Link, S., & Hegelheimer, V. (2015). Rethinking the role of automated writing evaluation (AWE) feedback in ESL writing instruction. Journal of Second Language Writing, 27, 1–18. http://dx.doi.org/10.1016/j.jslw.2014.10.004


Li, Z., Link, S., Ma, H., Yang, H., & Hegelheimer, V. (2014). The role of automated writing evaluation holistic scores in the ESL classroom. System, 44, 66–78. http://dx.doi.org/10.1016/j.system.2014.02.007


Linacre, J. M. (2013a). A user guide to Facets, Rasch-model computer programs. Chicago, IL: Winsteps.com.


Linacre, J. M. (2013b). Facets Rasch measurement [computer program]. Chicago, IL: Winsteps.com.


McGee, T. (2006). Taking a spin on the Intelligent Essay Assessor. In P. F. Ericsson & R. H. Haswell (Eds.), Machine scoring of student essays: Truth and consequences (pp. 79–92). Logan, UT: Utah State University Press.


McNamara, D. S., Crossley, S. A., Roscoe, R. D., Allen, L. K., & Dai, J. (2015). A hierarchical classification approach to automated essay scoring. Assessing Writing, 23, 35–39. http://dx.doi.org/10.1016/j.asw.2014.09.002


Pearson Education Inc. (2010). Intelligent Essay Assessor (IEA) fact sheet. Retrieved from http://kt.pearsonassessments.com/download/IEA-FactSheet-20100401.pdf


Perelman, L. (2014). When ‘the state of the art’ is counting words. Assessing Writing, 21, 104–111. http://dx.doi.org/10.1016/j.asw.2014.05.001


Powers, D. E. (2000). Computing reader agreement for the GRE Writing Assessment (ETS research memorandum, RM-00-08). Princeton, NJ: Educational Testing Service.


Powers, D. E., Burstein, J. C., Chodorow, M., Fowles, M. E., & Kukich, K. (2002). Stumping e-rater: Challenging the validity of automated essay scoring. Computers in Human Behavior, 18 (2), 103–134. http://dx.doi.org/10.1016/S0747-5632(01)00052-8


Shermis, M. D. (2014). State-of-the-art automated essay scoring: Competition, results, and future directions from a United States demonstration. Assessing Writing, 20, 53–76. http://dx.doi.org/10.1016/j.asw.2013.04.001


Shermis, M. D., & Burstein, J. C. (2003). Introduction. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. xiii–xvi). Mahwah, NJ: Lawrence Erlbaum Associates.


Stevenson, M., & Phakiti, A. (2014). The effects of computer-generated feedback on the quality of writing. Assessing Writing, 19, 51–65. http://dx.doi.org/10.1016/j.asw.2013.11.007


Tetreault, J., & Chodorow, M. (2008a, August). The ups and downs of preposition error detection in ESL writing. Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. http://dx.doi.org/10.3115/1599081.1599190


Tetreault, J., & Chodorow, M. (2008b, August). Native judgments of non-native usage: Experiments in preposition error detection. Proceedings of the Workshop on Human Judgments in Computational Linguistics at the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. http://dx.doi.org/10.3115/1611628.1611633


Vantage Learning. (2003a). Assessing the accuracy of IntelliMetric for scoring a district-wide writing assessment (RB-806). Newtown, PA: Vantage Learning.


Vantage Learning. (2003b). How does IntelliMetric score essay responses? (RB-929). Newtown, PA: Vantage Learning.


Vantage Learning. (2006). Research summary: IntelliMetric scoring accuracy across genres and grade levels. Retrieved from http://www.vantagelearning.com/docs/intellimetric/IM_ReseachSummary_InteliMetric_Accuracy_Across_Genre_and_Grade_Levels.pdf


Warschauer, M., & Grimes, D. (2008). Automated writing in the classroom. Pedagogies: An International Journal, 3 (1), 22–26. http://dx.doi.org/10.1080/15544800701771580


Warschauer, M., & Ware, P. (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10 (2), 157–180. http://dx.doi.org/10.1191/1362168806lr190oa


Weigle, S. C. (2013a). English language learners and automated scoring of essays: Critical considerations. Assessing Writing, 18, 85–99. http://dx.doi.org/10.1016/j.asw.2012.10.006


Weigle, S. C. (2013b). English as a second language writing and automated essay evaluation. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 36–54). New York, NY: Routledge.


