Missing data satisfaction research


No cake for spunky
We have a motivation survey where, say, 75 percent of individuals respond. If I understand this correctly, this is not an issue of MAR/MNAR/MCAR. You have to make a judgement about whether you can generalize to the rest of the population, but this is not missing data in the sense that multiple imputation would apply to.

We also have customers who do not fill out all of the questions. This is missing data. My question is: at what point (what percent of data missing) should you start getting concerned? Most of this data is ordinal in nature, and I am told multiple imputation works less well with ordinal data than with other types.
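Before picking a threshold, it may help to see how much is actually missing per question and how many respondents would survive casewise deletion. A minimal sketch, assuming each response is a dict with `None` for a skipped question; the question names and the data are made up for illustration:

```python
def missing_summary(responses, questions):
    """Return (fraction missing per question, fraction of complete cases)."""
    n = len(responses)
    per_item = {
        q: sum(r.get(q) is None for r in responses) / n
        for q in questions
    }
    # A case is complete only if every question was answered.
    complete = sum(
        all(r.get(q) is not None for q in questions) for r in responses
    ) / n
    return per_item, complete


# Hypothetical ordinal answers on a 1-5 scale; None marks a skipped item.
responses = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 4},
    {"q1": None, "q2": None, "q3": 5},
    {"q1": 5, "q2": 3, "q3": 1},
]
per_item, complete = missing_summary(responses, ["q1", "q2", "q3"])
print(per_item)
print(complete)
```

Note that the complete-case fraction can be much lower than any single item's response rate, since skips on different questions add up across respondents.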

Professor Harrell did not make my day with this... :p

"Statistical software packages use casewise deletion in handling missing predictors; that is, any subject having any predictor or Y missing will be excluded from a regression analysis. Casewise deletion results in regression coefficient estimates that can be terribly biased, imprecise, or both [353]. First consider an example where bias is the problem. Suppose that the response is death and the predictors are age, sex, and blood pressure, and that age and sex were recorded for every subject. Suppose that blood pressure was not measured for a fraction of 0.10 of the subjects, and the most common reason for not obtaining a blood pressure was that the subject was about to die. Deletion of these very sick patients will cause a major bias (downward) in the model’s intercept parameter. In general, casewise deletion will bias the estimate of the model’s intercept parameter (as well as others) when the probability of a case being incomplete is related to Y and not just to X."
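The blood pressure example in the quote can be imitated with a small simulation. This is only a sketch under made-up assumptions: a 20% death rate, and blood pressure missing for 40% of subjects who die versus 2.5% of survivors, which works out to about 10% missing overall. It compares the raw death rate in all subjects with the death rate among complete cases, which is what an intercept-only model would estimate; it does not fit the full regression from the quote.

```python
import random


def simulate(n=20000, seed=1):
    """Return (death rate in all subjects, death rate in complete cases,
    fraction of subjects with blood pressure missing)."""
    rng = random.Random(seed)
    deaths_all = 0
    deaths_complete = 0
    n_complete = 0
    for _ in range(n):
        died = rng.random() < 0.20  # assumed true death rate
        # Assumed mechanism: missingness depends on the outcome Y itself.
        p_missing = 0.40 if died else 0.025
        bp_missing = rng.random() < p_missing
        deaths_all += died
        if not bp_missing:  # casewise deletion keeps only these subjects
            n_complete += 1
            deaths_complete += died
    return deaths_all / n, deaths_complete / n_complete, 1 - n_complete / n


full_rate, cc_rate, frac_missing = simulate()
print(f"missing: {frac_missing:.3f}")
print(f"death rate, all subjects:   {full_rate:.3f}")
print(f"death rate, complete cases: {cc_rate:.3f}")
```

Under these assumptions the expected complete-case death rate is 0.12 / 0.90, about 13%, against a true 20%, even though only about a tenth of the cases were dropped. If missingness were unrelated to death, the two rates would agree apart from sampling noise.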


TS Contributor
While 75% is an excellent response rate, you still have the potential for non-response bias there as well. Regarding the other, it does make sense, but it is something that would vary depending on what you are studying. That example illustrates a worst-case scenario, but there are others where the impact would be minimal. You would have to make that determination.


Active Member
My question is at what point (what percent of data missing) should you start getting concerned?
When the proportion of missing data as compared to the whole set is sufficiently large to re-order the set of outcomes, were they to be assigned to one or another of the categorical responses.

For continuous data models, that would be as Miner said, exceeding some expression of a confidence interval or margin for error. Which would make it the new margin for error.
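One way to read the categorical criterion above is as a worst-case check: could the ranking of response categories change if every missing answer landed in a single category? A rough sketch of that reading, with hypothetical category names and counts:

```python
def ranking_could_change(counts, n_missing):
    """True if assigning all missing answers to one category could make it
    tie or overtake the category ranked just above it."""
    ordered = sorted(counts.values(), reverse=True)
    # Adjacent categories have the smallest gaps, so checking them suffices.
    return any(
        low + n_missing >= high
        for high, low in zip(ordered, ordered[1:])
    )


# Hypothetical observed counts for one survey question.
counts = {"satisfied": 400, "neutral": 250, "dissatisfied": 100}
print(ranking_could_change(counts, 100))
print(ranking_could_change(counts, 160))
```

This is a deliberately pessimistic bound, since real missing answers would rarely all fall in one category, but it gives a concrete point at which the missing share can no longer be ignored.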


Less is more. Stay pure. Stay poor.
Do you have information on background characteristics of the population and the respondent sample that could serve as a comparison of the representativeness of the respondents?
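If such background data were available, one simple comparison is a chi-square goodness-of-fit test of the respondents against the known population shares. A minimal sketch; the three tenure bands, the shares, and the counts are invented for illustration, and 5.991 is the standard 5% critical value for 2 degrees of freedom:

```python
def chi_square_gof(observed, population_shares):
    """Chi-square goodness-of-fit statistic for respondent counts against
    known population proportions."""
    n = sum(observed)
    expected = [n * p for p in population_shares]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: respondents by tenure band vs. population shares.
observed = [300, 250, 200]   # short, medium, long tenure among respondents
shares = [0.5, 0.3, 0.2]     # same bands in the full population
stat = chi_square_gof(observed, shares)

CRITICAL_DF2_05 = 5.991      # chi-square critical value, df = 2, alpha = 0.05
print(f"chi-square = {stat:.2f}")
print("respondents differ from population:", stat > CRITICAL_DF2_05)
```

A large statistic only says the respondents differ on that characteristic; whether that translates into bias depends on how the characteristic relates to the survey answers.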

Every problem is a missing data problem (e.g., model misspecification, information bias, selection bias, and confounding)!


No cake for spunky
Possibly, although it would be very time-consuming to gather, particularly given high turnover.

I think I am just going to do multiple imputation.