Theory and Practice in Language Studies, Vol 1, No 11 (2011), 1531-1540, Nov 2011
doi:10.4304/tpls.1.11.1531-1540

A Many-facet Rasch Model to Detect Halo Effect in Three Types of Raters

Farahman Farrokhi, Rajab Esfandiari

Abstract


Raters play a central role in rater-mediated assessment. Rater variability, which takes various forms including rater errors, contributes construct-irrelevant variance that can adversely affect an examinee’s test score. The halo effect is one of the most pervasive rater errors; if undetected, it can obscure an examinee’s score and threaten the validity and fairness of second language performance assessment. The present study therefore attempts to detect the halo effect in L2 essays using a many-facet Rasch model (MFRM), a methodology that has only relatively recently been employed in language assessment. The participants were 194 raters, subdivided into self-raters, peer raters, and teacher raters, who rated 188 essays written by 188 undergraduate Iranian English majors at two state-run universities in Iran. The essays were rated on a 6-point analytic rating scale, and the ratings were analyzed with Facets (version 3.68.0) to answer the research question of the study. The Facets analysis showed that, at the group level, the raters exhibited no sign of a halo effect, but, at the individual level, all rater types displayed a considerable halo effect. Further analysis revealed that the rater types were unanimous regarding the halo effect on four items and that self-raters showed more of a halo effect than the other two rater types.
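For readers unfamiliar with the MFRM, the following is a sketch of its common rating-scale formulation for a three-facet design (examinees, items, raters). The abstract does not state the exact model specification used in the study, so the choice of facets and the symbols below are illustrative assumptions rather than the authors' own notation.

```latex
% Illustrative three-facet rating-scale MFRM (symbols are assumptions):
%   P_{nijk}     probability that examinee n receives category k on item i from rater j
%   P_{nij(k-1)} probability of the adjacent lower category k-1
%   \theta_n     ability of examinee n
%   \beta_i      difficulty of item (rating criterion) i
%   \alpha_j     severity of rater j
%   \tau_k       difficulty of category k relative to category k-1
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
    = \theta_n - \beta_i - \alpha_j - \tau_k
\]
```

Under such a model, ratings are expected to vary across items according to item difficulty. A rater exhibiting a halo effect instead assigns similar ratings to an examinee across items, so the observed ratings depart from what the model predicts.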


Keywords


rater variability; rating scale; MFRM; raters; halo effect



Theory and Practice in Language Studies (TPLS, ISSN 1799-2591)

Copyright © 2006-2012 by ACADEMY PUBLISHER – All rights reserved.