Statistical Equivalence Testing Approaches for Mantel-Haenszel DIF Analysis

Bibliographic Details
Published in:	Journal of Educational and Behavioral Statistics, Vol. 43, no. 4, pp. 407-439
Main Authors:	Casabianca, Jodi M., Lewis, Charles
Format:	Journal Article
Language:	English
Published:	Los Angeles, CA: SAGE Publications; American Educational Research Association, 01-08-2018
Subjects:	Achievement Tests; Bayesian Statistics; Educational Testing; Equivalency Tests; Foreign Countries; Hypotheses; Hypothesis Testing; International Assessment; Secondary School Students; Simulation; Statistical Analysis; Student Evaluation; Test Bias; Canada; equivalence testing; loss functions; differential item functioning (DIF); empirical Bayes; Mantel–Haenszel

Description
Summary:	The null hypothesis test used in differential item functioning (DIF) detection tests for a subgroup difference in item-level performance—if the null hypothesis of "no DIF" is rejected, the item is flagged for DIF. Conversely, an item is kept in the test form if there is insufficient evidence of DIF. We present frequentist and empirical Bayes approaches for implementing statistical equivalence testing for DIF using the Mantel-Haenszel (MH) DIF statistic. With these approaches, rejection of the null hypothesis of "DIF" allows the conclusion of statistical equivalence, a more stringent criterion for keeping items. In other words, the roles of the null and alternative hypotheses are interchanged in order to have positive evidence that the DIF of an item is small. A simulation study compares the equivalence testing approaches to the traditional MH DIF detection method with the Educational Testing Service classification system. We illustrate the methods with item response data from the 2012 Programme for International Student Assessment.
ISSN:	1076-9986, 1935-1054
DOI:	10.3102/1076998617742410
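
Illustration:	The role reversal described in the summary can be sketched with a generic two one-sided tests (TOST) procedure applied to the MH D-DIF statistic. This is a minimal sketch under stated assumptions, not necessarily the authors' frequentist or empirical Bayes procedure. The function names `mh_d_dif` and `tost_equivalent` are hypothetical, the standard error uses the Robins-Breslow-Greenland variance estimator for the log of the MH common odds ratio, and the equivalence margin of 1.0 on the delta scale is an assumption borrowed from the threshold for negligible DIF in the Educational Testing Service classification system.

```python
import math
from statistics import NormalDist


def mh_d_dif(strata):
    """Return (MH D-DIF, standard error) from matched 2x2 tables.

    Each stratum is a tuple (a, b, c, d) at one matched score level:
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    """
    r_sum = s_sum = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        r_sum += r
        s_sum += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    alpha_mh = r_sum / s_sum  # MH common odds ratio
    # Robins-Breslow-Greenland variance of ln(alpha_mh)
    var_log = (sum_pr / (2 * r_sum ** 2)
               + sum_ps_qr / (2 * r_sum * s_sum)
               + sum_qs / (2 * s_sum ** 2))
    # ETS delta scale: MH D-DIF = -2.35 * ln(alpha_mh)
    return -2.35 * math.log(alpha_mh), 2.35 * math.sqrt(var_log)


def tost_equivalent(estimate, se, margin=1.0, alpha=0.05):
    """Test H0: |DIF| >= margin against H1: |DIF| < margin.

    Returns True only when both one-sided tests reject, that is, when
    there is positive evidence that the DIF is small. The margin is an
    illustrative assumption, not a value taken from the article.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    z_lower = (estimate + margin) / se  # tests DIF <= -margin
    z_upper = (estimate - margin) / se  # tests DIF >= +margin
    return z_lower > z_crit and z_upper < -z_crit


# Two made-up items with identical group performance (observed DIF of 0):
large = [(400, 100, 400, 100), (300, 200, 300, 200), (100, 400, 100, 400)]
small = [(4, 1, 4, 1), (3, 2, 3, 2), (1, 4, 1, 4)]
for name, strata in (("large sample", large), ("small sample", small)):
    d_dif, se = mh_d_dif(strata)
    print(name, round(d_dif, 3), round(se, 3), tost_equivalent(d_dif, se))
```

Both made-up items show no observed DIF, yet only the large-sample item is declared equivalent. With the small sample the standard error is too wide to reject the null hypothesis of "DIF", which is the sense in which equivalence is a more stringent criterion for keeping items than a failure to detect DIF.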