CALICO Journal, Vol 36, No 2 (2019)

Learners’ Feedback Regarding ASR-based Dictation Practice for Pronunciation Learning

Shannon McCrocklin
Issued Date: 17 Apr 2019


Although early ASR-based dictation programs were criticized for lack of accuracy and explicit feedback for L2 pronunciation practice, teachers and researchers have shown renewed interest. However, little is known about student reactions to ASR-based dictation practice. This qualitative study examines student perspectives, identifying advantages and challenges to working with dictation software and generating ideas for the ideal ASR dictation program. Advanced ESL participants (n=16) worked with Windows Speech Recognition in a three-week hybrid pronunciation workshop. The study identifies many themes, including advantages such as ease of use, usefulness for pronunciation learning due to feedback provided, and heightened awareness of pronunciation issues, but also disadvantages, such as frustrating levels of recognition, particularly in the first attempt, doubts about the program's transcription abilities, and lack of convenience. Participants reported that convenience and greater support in pronunciation practice would be important for an ideal program.


DOI: 10.1558/cj.34738


