Question Regarding Missing Data

Hello everyone, I have several surveys completed by over 2000 participants. Some participants did not answer some of the questions (i.e., missing data). We discussed filling in the missing values with the median. In addition, during data entry we coded a "6" whenever a participant marked more than one response. For analysis, we were considering treating these as missing data as well; the initial thought was to recode the "6" responses to "-9". However, since we will be calculating mean scores, reverse coding, and running regressions, I'm wondering what the best method is to account for true missing data (items left blank) and to handle user-defined missing data (more than one response). Suggestions would be most helpful.
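To make the coding question concrete, here is a minimal Python sketch of why a numeric code such as "6" or "-9" needs to be turned into a real missing value before reverse coding or averaging. The 1-5 response scale, the item names, and the helper functions are assumptions for illustration, not details from the actual surveys. On a 1-5 scale, reverse coding a stray 6 would give 0 and a stray -9 would give 15, and either would distort a mean score.

```python
MULTI_RESPONSE = 6            # data-entry code for "more than one response marked"
SCALE_MIN, SCALE_MAX = 1, 5   # ASSUMED response scale; adjust to the real instrument


def clean_item(value):
    """Return a valid response as an int, or None if it should count as missing.

    Blanks, the multi-response code, and anything outside the scale
    (including a -9 style code) all become None.
    """
    if value is None or value == "":
        return None
    value = int(value)
    if value == MULTI_RESPONSE or not SCALE_MIN <= value <= SCALE_MAX:
        return None
    return value


def reverse_code(value):
    """Flip a valid response on the scale (1 <-> 5, 2 <-> 4); keep None as None."""
    if value is None:
        return None
    return SCALE_MIN + SCALE_MAX - value


def scale_mean(responses, reverse_items=(), min_answered=1):
    """Mean of the valid items for one participant, or None if too few were answered."""
    scored = []
    for item, raw in responses.items():
        value = clean_item(raw)
        if item in reverse_items:
            value = reverse_code(value)
        if value is not None:
            scored.append(value)
    if len(scored) < min_answered:
        return None
    return sum(scored) / len(scored)


# Hypothetical participant: q2 had two answers marked, q3 was left blank.
student = {"q1": 4, "q2": 6, "q3": "", "q4": 2}
score = scale_mean(student, reverse_items={"q4"}, min_answered=2)
```

Here the mean is taken over q1 and the reverse-coded q4 only. If the two kinds of missingness need to be reported separately later, a flag recording which items were blank and which were multi-response could be kept alongside the cleaned values.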


No cake for spunky
The general consensus is that filling in means, medians, etc. (single imputation) is a bad idea. It generates bias, reduces useful variation, and overstates the certainty of the actual data. Multiple imputation (or full information maximum likelihood, if all your variables are continuous) is the recommended solution. We have a five-page thread on this topic elsewhere, although it might confuse you more than help :p SPSS has a module specifically for missing data, although I only know the SAS application, so I can't comment on it.
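A quick way to see the loss of variation is to fill the blanks in a small made-up item with the median and compare the variance before and after. This is only a sketch with invented numbers, using Python's standard `statistics` module; it is not output from any real survey.

```python
import statistics

# Made-up answers from the participants who responded to one item.
observed = [1, 2, 2, 3, 4, 5, 5]
n_missing = 3  # participants who left the item blank

# Single imputation: every blank gets the same value, the observed median.
median = statistics.median(observed)
filled = observed + [median] * n_missing

var_observed = statistics.variance(observed)
var_filled = statistics.variance(filled)

print(f"variance of observed answers: {var_observed:.2f}")
print(f"variance after median fill:   {var_filled:.2f}")
```

Every imputed value sits exactly at the centre of the distribution, so the filled-in variable looks less spread out and the sample looks larger than the data really support. That is where the understated standard errors come from.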

This is that thread.


Less is more. Stay pure. Stay poor.
So you have actual survey items that were skipped, and questions where the respondent (inappropriately) marked more than one response, which you are also calling missing, correct?

Thank you both for responding. Allow me to clarify some things, as well as expand on some others that might be useful.

The data was collected from elementary students over the course of an academic year. With participants this young, it was likely that some would mark more than one response or skip items.

We administered several pre-tests, and then post-tests throughout the course of the year. I looked into multiple imputation, and even tried it on a merged data file of pre-test and post-test data from multiple schools (K-6th grade at 6 different schools).

Some items are specific to certain schools and did not need to be asked at the others, which makes me think multiple imputation would be most appropriately run before merging the data files. So, with the data that was collected, should missing data be handled separately for each round of data collection, or once all the data files are merged?
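Whichever order turns out to be right, the merged file should probably keep "never asked at this school" separate from "asked but left blank", so that items not administered are not treated as values to impute. A rough Python sketch of that idea follows; the school labels, item names, and the `NOT_ASKED` marker are all hypothetical, and it assumes an item counts as asked at a school if it appears anywhere in that school's file.

```python
NOT_ASKED = "not_asked"  # hypothetical marker for items a school never administered


def merge_school_files(files):
    """Merge per-school rows into one list with a common set of item columns.

    files: dict mapping school name -> list of row dicts (item -> response).
    Items a school never administered are tagged NOT_ASKED; items that were
    administered but left blank stay as None (true missing data).
    """
    all_items = sorted({item for rows in files.values() for row in rows for item in row})
    merged = []
    for school, rows in files.items():
        asked = {item for row in rows for item in row}
        for row in rows:
            out = {"school": school}
            for item in all_items:
                out[item] = row.get(item) if item in asked else NOT_ASKED
            merged.append(out)
    return merged


# Hypothetical example: school A asked q1 and q2, school B asked q1 and qB.
files = {
    "A": [{"q1": 3, "q2": None}],   # q2 was asked but left blank
    "B": [{"q1": 5, "qB": 2}],
}
merged = merge_school_files(files)
```

This does not settle whether to impute before or after merging. It only keeps the two situations distinguishable in the merged file, so either approach remains possible.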