DIF Analyses with Missing Data

Hi all,

I'm currently analyzing items from a bank for K-12 education. I have a separate data set for each grade/subject combination (for example, 1st grade English is one set, 2nd grade English is another, etc.). Each data set contains 400-7000(!) items.

I would love to be able to run various DIF analyses, but missing data is a huge problem. Each student in the data set has answered only a fraction of the items. For example, in the 1st grade English data set I have an N of 4000, but no single student has answered more than 50 percent of the 400 items, and the overall percentage of missing answers is above 90 percent. This is actually one of the more complete data sets I have. I expect the set with 7000 items to have a percentage missing of 95 or above.

My questions: there's absolutely no way to perform Mantel-Haenszel, IRT-based, or any other form of DIF analysis on data with so many missings, right? And even if my program of choice (Stata) returned results (which it doesn't), those results would be unreliable, invalid, and otherwise useless?
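For concreteness, here is a minimal sketch of what a Mantel-Haenszel computation looks like on sparse 0/1 data, keeping only the students who answered the studied item and matching on proportion correct over their other answered items. Everything here (the function name, the stratification scheme, the data layout) is a made-up illustration, not anyone's production method:

```python
import numpy as np

def mh_dif(responses, group, item, n_strata=5):
    """Mantel-Haenszel DIF for one item on sparse 0/1 data (np.nan = unanswered).

    responses: (n_students, n_items) array of 0/1 with np.nan for missing.
    group:     (n_students,) array, 0 = reference group, 1 = focal group.
    Returns the MH common odds ratio and the ETS delta (-2.35 * ln(odds ratio)).
    """
    y = responses[:, item]
    # pairwise deletion: keep only students who answered the studied item
    rest = np.delete(responses, item, axis=1)
    rest_score = np.nanmean(rest, axis=1)        # matching variable: proportion
    keep = ~np.isnan(y) & ~np.isnan(rest_score)  # correct on other answered items
    y, g, s = y[keep].astype(int), group[keep], rest_score[keep]

    # stratify the matching variable into roughly equal-count score bins
    edges = np.quantile(s, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, s, side="right") - 1, 0, n_strata - 1)

    num = den = 0.0
    for k in range(n_strata):
        m = strata == k
        n_k = m.sum()
        if n_k == 0:
            continue
        a = np.sum(m & (g == 0) & (y == 1))   # reference correct
        b = np.sum(m & (g == 0) & (y == 0))   # reference incorrect
        c = np.sum(m & (g == 1) & (y == 1))   # focal correct
        d = np.sum(m & (g == 1) & (y == 0))   # focal incorrect
        num += a * d / n_k
        den += b * c / n_k
    alpha = num / den if den > 0 else np.nan  # MH common odds ratio
    return alpha, -2.35 * np.log(alpha)
```

Note the caveat baked into this sketch: the matching variable is a rest score computed over whatever items each student happened to answer, which is only defensible if the missingness is unrelated to ability (e.g., items assigned by a rotation design rather than skipped by the student).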

Also, assuming the above is the case, any suggestions for things I might try to measure differential functioning by group in lieu of conventional DIF, even if they're not particularly sophisticated?

Thanks in advance,

Missing data is a problem... I suppose you can assume missing at random and use a bootstrapping method to impute your missing data (probably what I would do). Then, to correct the bias (which will happen), use a bias-corrected and accelerated (BCa) confidence interval.
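Assuming scipy is available, here is what the BCa part looks like in practice. This is not the full impute-then-correct pipeline described above, just a sketch of the BCa mechanics on one item's observed responses (the simulated data and the group difference statistic are made up for illustration):

```python
import numpy as np
from scipy.stats import bootstrap

# hypothetical sparse responses for one item: np.nan = unanswered
rng = np.random.default_rng(1)
ref = np.where(rng.random(500) < 0.4, np.nan,
               (rng.random(500) < 0.65).astype(float))
foc = np.where(rng.random(500) < 0.4, np.nan,
               (rng.random(500) < 0.55).astype(float))

ref_obs = ref[~np.isnan(ref)]   # complete cases only
foc_obs = foc[~np.isnan(foc)]

def p_diff(r, f):
    # difference in proportion correct between reference and focal groups
    return np.mean(r) - np.mean(f)

# BCa interval: bias-corrected and accelerated percentile bootstrap
res = bootstrap((ref_obs, foc_obs), p_diff, method='BCa',
                n_resamples=2000, random_state=0)
print(res.confidence_interval)
```

The `method='BCa'` option is what applies the bias correction and acceleration adjustment to the plain percentile interval; with data this sparse, that correction can matter.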

I mean, with that big a sample size you can assume you have a representative sample and can freely use a bootstrapped covariance matrix to retain power (though do you really need to retain power at roughly n = 4000?).