Missing value treatment for categorical & continuous variables and testing MCAR/MAR

Hello everybody,

My dataset (N=90) consists of ordinal variables serving as IVs and dichotomous and continuous variables serving as DVs. IVs and DVs were assessed in two separate experiments with the same sample. My proximal goal is to build scales from the IVs and the DVs. I am using PASW Statistics 18 (SPSS) for data analysis.

Unfortunately I have missing values on every type of variable. My first step was to examine whether the missing values are MCAR/MAR or NMAR.
I ran Expectation Maximization (EM) for the ordinal variables (IVs), treating them as continuous (in SPSS EM you have to decide whether to treat data as continuous or categorical), and assumed MCAR, since Little's test was not significant. My next step would be to estimate the missing values with EM. Therefore my first question: 1. Is this a correct procedure?

I wanted to use the same procedure to test the MCAR assumption for my DVs, but then I noticed that SPSS does not compute estimated values or Little's test for categorical variables. So my second question is: 2. Can EM be computed for categorical variables at all?

From studying the literature on missing value treatment, I understand that multiple imputation is the state of the art. My problem is that I am only familiar with SPSS, and SPSS does not provide pooled results for the factor analyses or reliability analyses that I need to form scales. This feature has not been implemented yet. At this point, I would be happy to at least be able to compute EM for my data, since it is "just" for my master's thesis. Listwise deletion would be my last resort, since I would lose 10% of participants.
Therefore my third and fourth questions:
3. How can I test MCAR or MAR for categorical data?
4. How can I impute values for categorical variables other than with multiple imputation?

I appreciate any help!!!
Thank you,

I have the very same problem with my data. Did you find a solution? How did you proceed? I'd appreciate your input a lot!
Have a lovely day!


I'll check on this when I get home, but I seem to recall that a common way of assessing whether data are missing (completely) at random is to run a chi-squared test of homogeneity on the proportions of missing data across meaningful groups. For instance, if 10% of the data is missing overall, I would expect about 10% of men and about 10% of women to be missing data; if so, the data would be (close to) missing at random. If, on the other hand, 5% of men and 15% of women were missing data, then I have a problem.
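
A minimal sketch of that check in Python, assuming just two groups and made-up counts (100 men with 5 missing, 100 women with 15 missing, mirroring the 5% vs. 15% example; these numbers are hypothetical, not the OP's data). The function name `missingness_chi2` is my own. It computes the plain Pearson chi-squared statistic without a continuity correction and uses the fact that, with 1 degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)).

```python
import math

def missingness_chi2(missing_a, n_a, missing_b, n_b):
    """Pearson chi-squared test (no continuity correction) comparing the
    proportion of missing values in two groups.

    Assumes at least one missing and one observed value overall,
    otherwise an expected count is zero and the statistic is undefined.
    """
    # 2x2 table: rows = groups, columns = (missing, observed)
    observed = [
        [missing_a, n_a - missing_a],
        [missing_b, n_b - missing_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [missing_a + missing_b, total - missing_a - missing_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-squared survival function is
    # erfc(sqrt(x / 2)), so no statistics library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 5 of 100 men and 15 of 100 women have missing data.
chi2, p = missingness_chi2(5, 100, 15, 100)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value suggests missingness depends on the grouping variable, which speaks against MCAR. A non-significant result only says that this particular grouping shows no difference; it cannot separate MAR from NMAR. With a sample as small as N = 90, expected counts can easily drop below 5, in which case Fisher's exact test is the usual alternative.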

As a caution to the OP, if s/he is still around, I would not trust the results of a factor analysis with N = 90, especially when data are missing. Factor analyses are ludicrously low-power.