Missing data

I don't understand this approach to missing data, which I ran across in an article.

Education information is missing for 10.3 percent of the sample. Rather than exclude such observations,
we include a dummy variable for when education information was missing.


Less is more. Stay pure. Stay poor.
Yeah, it can be suspect at times. If a variable such as Race has missing values, they just create another category, Race=missing. If the values were missing at random, then the missing category will just be a proportional composite of the other categories. If the values were systematically missing, then you need to factor that into your interpretation, along with the amount of missingness.
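To make the "proportional composite" point concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the three categories A/B/C, their mean outcomes, their shares of the sample, and the missingness rates are all assumptions, not numbers from the article. It compares the mean outcome among records whose category went missing with the overall mean.

```python
import random

def simulate(n=100_000, seed=0, systematic=False):
    """Return (mean outcome in the 'missing' category, overall mean outcome)."""
    rng = random.Random(seed)
    # Hypothetical three-category variable with different mean outcomes.
    group_mean = {"A": 1.0, "B": 2.0, "C": 4.0}
    group_prob = {"A": 0.5, "B": 0.3, "C": 0.2}
    # Chance that the category is not recorded, per group (assumed values).
    if systematic:
        p_missing = {"A": 0.02, "B": 0.05, "C": 0.40}
    else:
        p_missing = {"A": 0.10, "B": 0.10, "C": 0.10}

    groups = list(group_prob)
    weights = [group_prob[g] for g in groups]
    missing_outcomes, all_outcomes = [], []
    for _ in range(n):
        g = rng.choices(groups, weights)[0]
        y = rng.gauss(group_mean[g], 1.0)
        all_outcomes.append(y)
        # Records that land here would be coded as category = "missing".
        if rng.random() < p_missing[g]:
            missing_outcomes.append(y)
    return (sum(missing_outcomes) / len(missing_outcomes),
            sum(all_outcomes) / len(all_outcomes))

for label, flag in [("same rate in every group", False), ("mostly group C", True)]:
    miss_mean, overall_mean = simulate(systematic=flag)
    print(f"{label}: missing-category mean={miss_mean:.2f}, overall mean={overall_mean:.2f}")
```

With the same missingness rate in every group, the missing category should sit close to the overall mean, i.e. a blend of A, B and C in their sample proportions. When group C is much more likely to go unrecorded, the missing category drifts toward C's mean, which is the situation where you have to think hard about interpretation.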

So for example, I am looking at a study right now with Race missing in patients, related to additional care received. Who might have missing values? Perhaps younger persons with fewer prior healthcare encounters, perhaps those with a language barrier or ambiguous race classifications, or perhaps it is just missing at random, etc.
Some suggest that data is rarely missing at random in the real world. What do you think of that idea?

The biggest issue I have is that when data is missing not at random, a lot of the imputation methods won't work. And it seems likely to me that a lot of data is not missing at random. People choose not to answer certain questions.
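A rough sketch of why that is, under assumptions I am making up for illustration: the true model is y = x + noise (so the true mean of y is 0), x is always observed, and high values are more likely to go missing. The only thing that changes between the two runs is whether missingness is driven by the observed x (MAR) or by the unobserved y itself (MNAR). The imputation used here is plain regression imputation, chosen only because it is easy to write out by hand.

```python
import math
import random

def imputed_mean(mechanism, n=100_000, seed=1):
    """Estimate the mean of y after regression imputation from x.

    Assumed data-generating model: y = x + noise, so the true E[y] is 0.
    """
    rng = random.Random(seed)
    xs, ys, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        # MAR: missingness depends only on the always-observed x.
        # MNAR: missingness depends on the unobserved value y itself.
        driver = x if mechanism == "MAR" else y
        # Higher values of the driver are more likely to be missing.
        p_miss = 1.0 / (1.0 + math.exp(-2.0 * driver))
        xs.append(x)
        ys.append(y)
        observed.append(rng.random() >= p_miss)

    # Fit y = a + b*x by least squares on the complete cases only.
    xo = [x for x, o in zip(xs, observed) if o]
    yo = [y for y, o in zip(ys, observed) if o]
    mx, my = sum(xo) / len(xo), sum(yo) / len(yo)
    b = sum((x - mx) * (y - my) for x, y in zip(xo, yo)) / sum((x - mx) ** 2 for x in xo)
    a = my - b * mx

    # Replace each missing y with its regression prediction, then average.
    filled = [y if o else a + b * x for x, y, o in zip(xs, ys, observed)]
    return sum(filled) / n

for mechanism in ("MAR", "MNAR"):
    print(mechanism, round(imputed_mean(mechanism), 3))
```

Under MAR the imputed mean should land near the true value of 0, because x carries all the information about who is missing. Under MNAR the same procedure stays biased downward: the people with high y are the ones who did not answer, and nothing in the observed data tells the model that.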


Less is more. Stay pure. Stay poor.
Yeah, MCAR seems rare unless the source of the missingness is also unrelated to the outcome. I can think of some examples: perhaps a new person, rain, a power surge, etc. Some say it also helps if the exposure and outcome aren't collected within the same cross-sectional instrument. Pretty much you hope the amount of missingness is low.

MAR (with MCAR as a special case) is the only scenario where standard imputation works.