Missing data


No cake for spunky
My great dread. I have a logistic regression with 645 cases, and data is missing in about 228 of them. Only 6 did not answer the dependent variable; the median number missing for any predictor is 15, about 2 percent of the responses. The problem is that there are 38 predictors, and being missing on any one of them gets the case thrown out of the logistic regression (long ago I read about an alternative that used all the cases even when they were missing information on some questions, but the problems raised in doing so convinced me this was too dangerous).

I am not sure what to do. I know of multiple imputation, but my understanding is that doing this with non-interval data is problematic (actually I stopped studying this because I was told that on this board years ago). :p All my predictors are dummy variables, and my DV has two levels.

We are doing this to determine which variables are relatively more important; the way we do that is to see which are statistically significant (I have found no good way to address relative importance with logistic regression). I am not sure what to do with so many missing cases.

Is it reasonable, when you see an unusually high number of cases missing on a question, to remove that question because you think people did not understand it or had no answer? (In honesty, I think this is true of the specific question even ignoring all the missing data; no one asked me about it when it was created.)
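
To see why a roughly 2 percent gap per predictor can add up to so many dropped cases, here is a rough back-of-the-envelope sketch using the counts quoted in this post. It assumes every predictor is missing independently at the median rate, which is an assumption made only for illustration and is unlikely to hold in real survey data.

```python
# Illustrative arithmetic only: assumes each predictor is missing independently
# at the same (median) rate, which real survey data rarely satisfies.
n_cases = 645
n_predictors = 38
median_missing = 15  # median number of missing responses per predictor

per_predictor_rate = median_missing / n_cases

# Share of cases expected to be complete on all predictors under independence.
expected_complete_share = (1 - per_predictor_rate) ** n_predictors
expected_dropped = round(n_cases * (1 - expected_complete_share))

print(f"per-predictor missing rate: {per_predictor_rate:.3f}")
print(f"expected share of complete cases: {expected_complete_share:.3f}")
print(f"expected cases dropped by listwise deletion: {expected_dropped}")
```

Under that independence assumption the expected loss comes out well above the roughly 228 incomplete cases reported here, which would be consistent with the missing answers clustering in the same respondents rather than being spread evenly.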


Less is more. Stay pure. Stay poor.
It is showing the pattern of missingness. Everywhere there is an X, that variable is available. So the first row is the scenario where all variables are present, with 97% of people having data on every variable.
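
For anyone who wants to see how such a pattern table is put together, here is a minimal sketch in plain Python. It is not the code that was exchanged in this thread; the toy data and the variable names `q1` to `q3` are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical toy responses; None marks a missing answer.
# Variable names q1..q3 are placeholders, not the real survey items.
variables = ["q1", "q2", "q3"]
rows = [
    {"q1": 1, "q2": 0, "q3": 1},
    {"q1": 0, "q2": 0, "q3": 1},
    {"q1": None, "q2": 1, "q3": None},
    {"q1": 1, "q2": None, "q3": 1},
    {"q1": 0, "q2": 1, "q3": 0},
    {"q1": 1, "q2": 1, "q3": 0},
]


def pattern(row):
    """Return a string such as 'X.X': 'X' where observed, '.' where missing."""
    return "".join("X" if row[v] is not None else "." for v in variables)


# Count how many cases share each missingness pattern.
counts = Counter(pattern(r) for r in rows)

# most_common() lists the most frequent pattern first.
for pat, n in counts.most_common():
    print(f"{pat}  n={n}  {100 * n / len(rows):.1f}%")
```

Each output row is one pattern, so a row of all X's is the complete-case group and every other row shows which variables are missing together.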


No cake for spunky
I generated that, but I think the table is too big to show here. It has 122 groups and 31 variables.


No cake for spunky
I posted it. Just looking at the raw data, I don't think there is an obvious pattern. One issue I found is that the missing cases include people who did answer the question but answered don't know/NA. They are missing as far as SAS is concerned, but they did answer the question.
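
The distinction between a skipped question and an explicit don't know answer can be tallied before anything is set to missing. The sketch below is only an illustration: the raw codes (`""` for a skipped item, `"dont_know"` for don't know/NA) are assumptions about how an export might look, not the actual coding in this survey.

```python
from collections import Counter

# Hypothetical raw codes for one question; these labels are assumptions,
# not the survey's real coding scheme.
raw = ["yes", "no", "dont_know", "", "yes", "dont_know", "no", ""]


def classify(answer):
    """Separate true nonresponse from an explicit 'don't know' answer."""
    if answer == "":
        return "skipped"
    if answer == "dont_know":
        return "dont_know"
    return "substantive"


# Tally each kind of response so the two sources of "missing" stay visible.
tally = Counter(classify(a) for a in raw)
print(tally)
```

Whether don't know is then kept as its own category or treated as missing is a modeling judgment; the tally only shows how much of the apparent missingness comes from each source.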


No cake for spunky
I didn't think anyone else would be interested, that never crossed my mind. :)

The DV was not generated by the code you sent me, hlsmith. Was it supposed to be? I have the DV, but only six people total are missing on that. Virtually everyone responded to that question. The reason, I think, is that the missing data is not tied to people choosing not to answer, but to people saying they don't know.


Ambassador to the humans
Didn't even matter if you didn't think anybody else would be interested. It's just common courtesy. You're asking hlsmith for help - don't make them track things down that you randomly posted to a different thread.


No cake for spunky
I misunderstood your post on chat; I thought that was what I was supposed to do and that hlsmith would go there. I did not mean to be discourteous.