Journal of Language Teaching and Research, Vol 3, No 1 (2012), 84-92, Jan 2012
doi:10.4304/jltr.3.1.84-92

Differential Item Functioning: Implications for Test Validation

Mohammad Salehi, Alireza Tayebi

Abstract


This paper reviews the concept of validity, in particular construct validity: its definition, the approaches taken to it, and its role in language testing and assessment. The validation process is then elaborated on and shown to be an integral part of test development, particularly for English language proficiency tests. Next come the related concepts of test fairness and test bias, the sources of bias (e.g., gender, field of study, age, nationality and L1, and background knowledge), and how these contribute to or threaten the validity of tests in general and of high-stakes English language proficiency tests in particular. Different approaches to investigating the validity of tests are also reviewed. Differential Item Functioning (DIF), one of these approaches, is explained together with its detection methods and their advantages and disadvantages, leading to the conclusion that logistic regression (LR) is among the best methods available to date.
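
As a brief illustration of the logistic regression approach named above, the model commonly used for DIF detection on a dichotomous item can be sketched as follows. The symbols are assumptions introduced here for illustration and do not come from the abstract: u is the scored item response (1 = correct), \theta is the matching ability measure (typically the total test score), and g is a group indicator (e.g., 0 = reference group, 1 = focal group).

```latex
\[
\ln\frac{P(u = 1)}{1 - P(u = 1)}
  = \beta_0 + \beta_1\,\theta + \beta_2\, g + \beta_3\,(\theta \cdot g)
\]
% beta_2 != 0 with beta_3 = 0 : uniform DIF (a constant group difference
%                               in log-odds at every ability level)
% beta_3 != 0                 : non-uniform DIF (the group difference
%                               varies with ability)
```

Under this sketch, an item is usually flagged by comparing nested models (with and without the group and interaction terms) through a likelihood-ratio chi-square test, often supplemented by an effect size measure so that trivially small differences in large samples are not over-interpreted.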


Keywords


validity; Differential Item Functioning; item bias; test fairness; logistic regression; IRT




Journal of Language Teaching and Research (JLTR, ISSN 1798-4769)

Copyright © 2006-2012 by ACADEMY PUBLISHER. All rights reserved.