Reporting practices of rater reliability in interpreting research

A mixed-methods review of 14 journals (2004–2014)

Authors

  • Chao Han, Southwest University

DOI:

https://doi.org/10.1558/jrds.29622

Keywords:

consensus, consistency, interpreting studies, mixed-methods, rater reliability

Abstract

This study addresses the reporting practices of rater reliability in interpreting research (IR), given that the use of raters as a method of measurement is commonplace in IR and that little is known about the extent to which, and how, rater reliability estimates (RREs) have been reported. Drawing upon 447 articles from 14 translation and interpreting journals (2004–2014), this mixed-methods study attempts to gain quantitative and qualitative insights into these reporting practices. Data analysis reveals that: 1) almost 90% of the articles that needed to report RREs failed to do so; and 2) several potential problems emerged from the articles that did report RREs: a lack of distinction between rater consensus and consistency, underreporting, misinterpretation and misuse of RREs, and a lack of justification for the use of rater-generated measurements in subsequent data analysis. These findings highlight an urgent need for increased author awareness of reporting appropriate RREs in IR.
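The distinction the abstract draws between rater consensus and rater consistency can be made concrete with a small illustration (not taken from the article; the scores and function names below are hypothetical): two raters who never award an identical score can still rank every examinee in exactly the same order, so an agreement index and a correlation index tell very different stories about the same ratings.

```python
# Hypothetical sketch: consensus (exact agreement) vs. consistency
# (correlation) for two raters scoring the same six performances.
from statistics import mean

def exact_agreement(a, b):
    """Consensus: proportion of cases where the raters give identical scores."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pearson_r(a, b):
    """Consistency: Pearson correlation between the two raters' scores."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a) ** 0.5
    ss_b = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (ss_a * ss_b)

rater_a = [3, 4, 5, 3, 4, 5]
rater_b = [4, 5, 6, 4, 5, 6]   # systematically one point more lenient

print(exact_agreement(rater_a, rater_b))  # 0.0 — no consensus at all
print(pearson_r(rater_a, rater_b))        # 1.0 — perfect consistency
```

A consistency estimate alone would mask the second rater's uniform leniency, while an agreement estimate alone would mask the identical rank ordering, which is why the article treats conflating the two as a reporting problem.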

Author Biography

  • Chao Han, Southwest University

    Chao Han is currently an assistant professor in the College of International Studies at Southwest University, Chongqing, China. His PhD thesis, completed at Macquarie University (Sydney) in 2015, is entitled ‘Building the validity foundation for interpreter certification performance testing’. His research interests include interpreter performance testing and assessment (design, development and validation), measurement issues involved in interpreting studies, and mixed-methods research design.

References

Agrifoglio, M. (2004). Sight translation and interpreting: A comparative analysis of constraints and failures. Interpreting 6 (1), 43–67. https://doi.org/10.1075/intp.6.1.05agr

Angelelli, C. (2009). Using a rubric to assess translation ability: Defining the construct. In C. Angelelli and H. E. Jacobson (Eds) Testing and Assessment in Translation and Interpreting Studies, 13–47. Amsterdam: John Benjamins. https://doi.org/10.1075/ata.xiv.03ang

Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Bakti, M. and Bóna, J. (2014). Source language-related erroneous stress placement in the target language output of simultaneous interpreters. Interpreting 16 (1), 34–48. https://doi.org/10.1075/intp.16.1.03bak

Bale, R. (2013). Undergraduate consecutive interpreting and lexical knowledge: The role of spoken corpora. The Interpreter and Translator Trainer 7 (1), 27–50. https://doi.org/10.1080/13556509.2013.10798842

Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19, 3–11. https://doi.org/10.2466/pr0.1966.19.1.3

Bartłomiejczyk, M. (2006). Strategies of simultaneous interpreting and directionality. Interpreting 8 (2), 149–174. https://doi.org/10.1075/intp.8.2.03bar

Baumgartner, T. A. (1989). Norm-referenced measurement: Reliability. In M. J. Safrit and T. M. Wood (Eds) Measurement Concepts in Physical Education and Exercise Science, 45–72. Champaign, IL: Human Kinetics.

Braun, S. (2013). Keeping your distance? Remote interpreting in legal proceedings: A critical assessment of a growing practice. Interpreting 15 (2), 200–228. https://doi.org/10.1075/intp.15.2.03bra

Campbell, S. and Hale, S. (2003). Translation and interpreting assessment in the context of educational measurement. In G. Anderman and M. Rogers (Eds) Translation Today: Trends and Perspectives, 205–224. Clevedon: Multilingual Matters.

Chang, C-C. and Wu, M. M-C. (2014). Non-native English at international conferences: Perspectives from Chinese-English conference interpreters in Taiwan. Interpreting 16 (2), 169–190. https://doi.org/10.1075/intp.16.2.02cha

Cheung, A. K. F. (2007). The effectiveness of summary training in consecutive interpreting (CI) delivery. Forum 5 (2), 1–23. https://doi.org/10.1075/forum.5.2.01che

Cheung, A. K. F. (2014). Anglicized numerical denominations as a coping tactic for simultaneous interpreting from English into Mandarin Chinese: An experimental study. Forum 12 (1), 1–22. https://doi.org/10.1075/forum.12.1.01che

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46. https://doi.org/10.1177/001316446002000104

Crocker, L. and Algina, J. (1986). Introduction to Classical and Modern Test Theory. Orlando, FL: Harcourt Brace Jovanovich.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334. https://doi.org/10.1007/BF02310555

Davitti, E. (2013). Dialogue interpreting as intercultural mediation: Interpreters’ use of upgrading moves in parent-teacher meetings. Interpreting 15 (2), 168–199. https://doi.org/10.1075/intp.15.2.02dav

Feldt, L. S. and Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.) Educational Measurement (3rd ed.), 127–144. New York: Macmillan.

Fleenor, J. W., Fleenor, J. B. and Grossnickle, W. F. (1996). Interrater reliability and agreement of performance ratings: A methodological comparison. Journal of Business and Psychology 10 (2), 367–380. https://doi.org/10.1007/BF02249609

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 378–382. https://doi.org/10.1037/h0031619

Frick, T. and Semmel, M. I. (1978). Observer agreement and reliabilities of classroom observational measures. Review of Educational Research 48, 157–184. https://doi.org/10.3102/00346543048001157

Geertz, C. (1973). The Interpretation of Cultures. New York: Basic Books.

Gile, D. (1994). Opening up in interpretation studies. In M. Snell-Hornby, F. Pöchhacker and K. Kaindl (Eds) Translation Studies: An Interdiscipline, 149–158. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.2.20gil

Goodwin, L. D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science 5 (1), 13–34. https://doi.org/10.1207/S15327841MPEE0501_2

Gwet, K. L. (2013). Handbook of Inter-rater Reliability (3rd ed.). Gaithersburg, MD: Advanced Analytics, LLC.

Hale, S., Garcia, I., Hlavac, J., Kim, M., Lai, M., Turner, B., and Slatyer, H. (2012). Development of a conceptual overview for a new model for NAATI standards, testing and assessment. Retrieved on 22 May 2015 from http://www.naati.com.au/PDF/INT/INTFinalReport.pdf

Hale, S. and Napier, J. (2013). Research Methods in Interpreting: A Practical Resource. London and New York: Bloomsbury.

Han, C. (2015). Investigating rater severity/leniency in interpreter performance testing: A multifaceted Rasch measurement approach. Interpreting 17 (2), 255–283. https://doi.org/10.1075/intp.17.2.05han

Han, C. (2016). Investigating score dependability in English/Chinese interpreter certification performance testing: A generalizability theory approach. Language Assessment Quarterly 13 (3), 186–201. https://doi.org/10.1080/15434303.2016.1211132

Hayes, A. F. (2005). Statistical Methods for Communication Science. Mahwah, NJ: Lawrence Erlbaum.

James, J. R. and Gabriel, K. I. (2012). Student interpreters show encoding and recall differences from information in English and American Sign Language. Translation and Interpreting Research 4 (1), 21–37.

Johnson, R. B. and Turner, L. S. (2003). Data collection strategies in mixed methods research. In A. Tashakkori and C. Teddlie (Eds) Handbook of Mixed Methods in Social and Behavioral Research, 297–319. Thousand Oaks, CA: SAGE.

Keselman, O., Cederborg, A-C., and Linell, P. (2010). ‘That is not necessary for you to know!’ Negotiation of participation status of unaccompanied children in interpreter-mediated asylum hearings. Interpreting 12 (1), 83–104. https://doi.org/10.1075/intp.12.1.04kes

Kozlowski, S. and Hattrup, K. (1992). A disagreement about within-group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology 77, 161–167. https://doi.org/10.1037/0021-9010.77.2.161

Lee, J. (2008). Rating scales for interpreting performance assessment. The Interpreter and Translator Trainer 2 (2), 165–184. https://doi.org/10.1080/1750399X.2008.10798772

Lee, S-B. (2015). Developing an analytic scale for assessing undergraduate students’ consecutive interpreting performances. Interpreting 17 (2), 226–254. https://doi.org/10.1075/intp.17.2.04lee

Lin, I. I., Chang, F. A., and Kuo, F. (2013). The impact of non-native accented English on rendition accuracy in simultaneous interpreting. Translation & Interpreting Research 5 (2), 30–44. https://doi.org/10.12807/ti.105202.2013.a03

Liu, M-H. (2013). Design and analysis of Taiwan’s interpretation certification examination. In D. Tsagari and R. van Deemter (Eds) Assessment Issues in Language Translation and Interpreting, 163–178. Frankfurt: Peter Lang.

Liu, M-H. and Chiu, Y-H. (2009). Assessing source material difficulty for consecutive interpreting: Quantifiable measures and holistic judgment. Interpreting 11 (2), 244–266. https://doi.org/10.1075/intp.11.2.07liu

Liu, M-H., Chang, C-C. and Wu, S-C. (2008). Interpretation evaluation practices: Comparison of eleven schools in Taiwan, China, Britain, and the USA. Compilation and Translation Review 1 (1), 1–42.

Liu, M-H., Schallert, D. L., and Carroll, P. J. (2004). Working memory and expertise in simultaneous interpreting. Interpreting 6 (1), 19–42. https://doi.org/10.1075/intp.6.1.04liu

McDermid, C. (2014). Cohesion in English to ASL simultaneous interpreting. Translation and Interpreting Research 6 (1), 76–101. https://doi.org/10.12807/ti.106201.2014.a05

Multon, K. D. (2010). Interrater reliability. In N. J. Salkind (Ed.) Encyclopedia of Research Design, 627–629. Thousand Oaks, CA: SAGE.

Napier, J. (2004). Interpreting omissions: A new perspective. Interpreting 6 (2), 117–142. https://doi.org/10.1075/intp.6.2.02nap

Peng, G. (2009). Using Rhetorical Structure Theory (RST) to describe the development of coherence in interpreting trainees. Interpreting 11 (2), 216–243. https://doi.org/10.1075/intp.11.2.06pen

Pöchhacker, F. (2011). Researching interpreting: Approaches to inquiry. In B. Nicodemus and L. Swabey (Eds) Advances in Interpreting Research, 5–25. Amsterdam: John Benjamins.

Pradas Macías, M. (2006). Probing quality criteria in simultaneous interpreting: The role of silent pauses in fluency. Interpreting 8 (1), 25–43. https://doi.org/10.1075/intp.8.1.03pra

Reithofer, K. (2013). Comparing modes of communication: The effect of English as a lingua franca vs. interpreting. Interpreting 15 (1), 48–73. https://doi.org/10.1075/intp.15.1.03rei

Rosiers, A., Eyckmans, J., and Bauwens, D. (2011). A story of attitudes and aptitudes? Investigating individual difference variables within the context of interpreting. Interpreting 13 (1), 53–69. https://doi.org/10.1075/intp.13.1.04ros

Rovira-Esteva, S. and Orero, P. (2011). A contrastive analysis of the main benchmarking tools for research assessment in translation and interpreting: The Spanish approach. Perspectives 19 (3), 233–251. https://doi.org/10.1080/0907676X.2011.590214

Roziner, I. and Shlesinger, M. (2010). Much ado about something remote: Stress and performance in remote interpreting. Interpreting 12 (2), 214–247. https://doi.org/10.1075/intp.12.2.05roz

Sawyer, D. B. (2004). Fundamental Aspects of Interpreter Education: Curriculum and Assessment. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.47

Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly 19, 321–325. https://doi.org/10.1086/266577

Setton, R. and Motta, M. (2007). Syntacrobatics: Quality and reformulation in simultaneous-with-text. Interpreting 9 (2), 199–230. https://doi.org/10.1075/intp.9.2.04set

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 86 (2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420

Shlesinger, M. (2009). Crossing the divide: What researchers and practitioners can learn from one another. Translation and Interpreting Research 1 (1), 1–16.

Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research and Evaluation 9 (4). Retrieved 30 May 2014 from http://pareonline.net/getvn.asp?v=9&n=4

Stemler, S. E. and Tsai, J. (2008). Best practices in estimating interrater reliability: Three common approaches. In J. Osborne (Ed.) Best Practices in Quantitative Methods, 29–49. Thousand Oaks, CA: SAGE. https://doi.org/10.4135/9781412995627.d5

Teddlie, C. and Tashakkori, A. (2009). The Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Techniques in the Social and Behavioral Sciences (2nd ed.). Thousand Oaks, CA: SAGE.

Thompson, B. and Snyder, P. A. (1998). Statistical significance and reliability analysis in recent JCD research articles. Journal of Counseling and Development 76, 436–441. https://doi.org/10.1002/j.1556-6676.1998.tb02702.x

Tinsley, H. E. A. and Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgements. Journal of Counseling Psychology 22 (4), 358–376. https://doi.org/10.1037/h0076640

Tiselius, E. (2009). Revisiting Carroll’s scales. In C. Angelelli and H. E. Jacobson (Eds) Testing and Assessment in Translation and Interpreting Studies, 95–121. Amsterdam: John Benjamins. https://doi.org/10.1075/ata.xiv.07tis

von Eye, A. and Mun, E. Y. (2004). Analyzing Rater Agreement: Manifest Variable Methods. Mahwah, NJ: Lawrence Erlbaum.

Wu, S. C. (2013). How do we assess students in the interpreting examinations? In D. Tsagari and R. van Deemter (Eds) Assessment Issues in Language Translation and Interpreting, 15–33. Frankfurt: Peter Lang.

Yan, J-X., Pan, J., Wu, H., and Wang, Y. (2013). Mapping Interpreting Studies: The state of the field based on a database of nine major Translation and Interpreting journals (2000–2010). Perspectives 21 (3), 446–473. https://doi.org/10.1080/0907676X.2012.746379

Zheng, B-H. and Xiang, X. (2014). The impact of cultural background knowledge in the processing of metaphorical expressions: An empirical study of English-Chinese sight translation. Translation and Interpreting Studies 9 (1), 5–24. https://doi.org/10.1075/tis.9.1.01zhe

Zuo, J. (2014). Image schemata and visualization in simultaneous interpreting training. The Interpreter and Translator Trainer 8 (2), 204–216. https://doi.org/10.1080/1750399X.2014.908553

Published

2016-12-30

Issue

Section

Articles

How to Cite

Han, C. (2016). Reporting practices of rater reliability in interpreting research: A mixed-methods review of 14 journals (2004–2014). Journal of Research Design and Statistics in Linguistics and Communication Science, 3(1), 49–75. https://doi.org/10.1558/jrds.29622