Explicit Pronunciation Training Using Automatic Speech Recognition Technology

Authors

  • Jonathan Dalby
  • Diane Kewley-Port

DOI:

https://doi.org/10.1558/cj.v16i3.425-445

Keywords:

Pronunciation Training, Speech Recognition, Speech Training Aids, Evaluation, User Tests, Minimal Pairs

Abstract

A system is described, provisionally named Pronto, which uses automatic speech recognition (ASR) for training pronunciation of second languages in adult learners. The first version of Pronto was developed for native speakers of American English learning Spanish and for Mandarin Chinese speakers learning English. Pronto grows out of work in the Indiana Speech Training Aid (ISTRA) research program, which has demonstrated significant improvement in the pronunciation of hearing-impaired and normal-hearing but misarticulating children through the use of ASR-derived feedback. This feedback has also been shown to improve pronunciation in adult learners of a second language. Methods are described for developing training in Pronto, and results are presented from evaluating classes of speech recognizers for use in different aspects of pronunciation training.

Published

2013-01-14

Section

Articles

How to Cite

Dalby, J., & Kewley-Port, D. (2013). Explicit Pronunciation Training Using Automatic Speech Recognition Technology. CALICO Journal, 16(3), 425-445. https://doi.org/10.1558/cj.v16i3.425-445