How to handle missing data?


I just finished an online study in which participants filled out several questionnaires at baseline and after 6 weeks (posttest). There were two groups: a control group and an intervention group. The intervention group received online treatment and the control group was a waiting-list condition. Since it was an online study, we had high dropout. The dropout was much higher in the intervention group (71%) than in the control group (49%). How should I account for the missing data? Should I use multiple imputation?

Hope you can help me!


Yes, it was randomized. There are no significant differences between the two groups at baseline.

Could I do an analysis on the completed cases only?


Fortran must die
There really are two issues here. Missing data can be handled by multiple imputation, and by many other approaches depending on the data type. But as I understand it, when the dropout rate differs between the intervention and control groups, this creates additional issues for the analysis. I have not seen multiple imputation used for that, although I am hardly an expert. To use multiple imputation you have to assume the data are missing at random (MAR) rather than missing not at random (MNAR), and I wonder whether that is a reasonable assumption if the dropout rate was significantly different.
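To put a number on "significantly different", here is a rough sketch of a two-proportion z-test on the dropout rates. The 71% and 49% come from the question, but the group sizes (100 per arm) are made up, since the sample size was not reported. Note that this only tests whether dropout differs between arms; it cannot tell MAR from MNAR.

```python
import math

def two_proportion_z(drop_a, n_a, drop_b, n_b):
    """Two-sided z-test for a difference between two dropout proportions."""
    p_a = drop_a / n_a
    p_b = drop_b / n_b
    # Pooled proportion under the null hypothesis of equal dropout rates
    pooled = (drop_a + drop_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: 100 participants per arm (hypothetical; not given in the thread)
z, p = two_proportion_z(71, 100, 49, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With smaller groups the same percentages would give a weaker result, so the real sample sizes matter here.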


Not a robit
Yes, re-compare baseline characteristics now that participants have dropped out. Also, don't just focus on significant differences; look at the actual values, since this time around the tests will have lower power and a greater chance of a type II error.
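One way to look at the actual values rather than p-values is a standardized mean difference on each baseline variable. A minimal sketch, with made-up baseline scores for the participants who remained in each arm (the variable names and numbers are hypothetical):

```python
import statistics

def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = ((var_a + var_b) / 2) ** 0.5
    return (mean_a - mean_b) / pooled_sd

# ASSUMPTION: hypothetical baseline scores of those who stayed in the study
intervention_baseline = [20, 22, 25, 27, 30]
control_baseline = [18, 21, 23, 24, 26]

smd = standardized_mean_difference(intervention_baseline, control_baseline)
print(f"SMD = {smd:.2f}")
```

Unlike a significance test, this number does not shrink just because the sample got smaller, so it is easier to compare with the balance you had before dropout.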

So it is a good thing that the baseline covariates were balanced prior to the intervention. Now, if there are differences, they can help explain the missingness and possibly inform the imputation. Noetsi, perhaps they could split the sample in two, intervention and control, impute within each group, and then merge them back together.
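A very rough sketch of the split, impute, merge idea. To keep it self-contained it uses a simple regression of posttest on baseline with random noise added, repeated m times. That is a simplified stand-in, not proper multiple imputation: a real analysis would use a dedicated MI package and pool the results with Rubin's rules. All data and field names below are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r ** 2 for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd

def impute_within_group(rows, rng):
    """Fill missing posttest values using a model fitted in this arm only."""
    observed = [r for r in rows if r["post"] is not None]
    a, b, sd = fit_line([r["pre"] for r in observed],
                        [r["post"] for r in observed])
    filled = []
    for r in rows:
        post = r["post"]
        if post is None:
            # Predicted value plus noise, so imputed values keep some spread
            post = a + b * r["pre"] + rng.gauss(0, sd)
        filled.append({**r, "post": post})
    return filled

def impute_by_arm(rows, m=5, seed=0):
    """Return m completed datasets, each imputed separately per arm and merged."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(m):
        completed = []
        for arm in sorted({r["arm"] for r in rows}):
            arm_rows = [r for r in rows if r["arm"] == arm]
            completed.extend(impute_within_group(arm_rows, rng))
        datasets.append(completed)
    return datasets

# ASSUMPTION: hypothetical data; None marks a missing posttest score
rows = [
    {"id": 1, "arm": "control", "pre": 20, "post": 19},
    {"id": 2, "arm": "control", "pre": 25, "post": 24},
    {"id": 3, "arm": "control", "pre": 30, "post": 28},
    {"id": 4, "arm": "control", "pre": 22, "post": 23},
    {"id": 5, "arm": "control", "pre": 27, "post": None},
    {"id": 6, "arm": "intervention", "pre": 21, "post": 15},
    {"id": 7, "arm": "intervention", "pre": 26, "post": 18},
    {"id": 8, "arm": "intervention", "pre": 31, "post": 24},
    {"id": 9, "arm": "intervention", "pre": 24, "post": None},
    {"id": 10, "arm": "intervention", "pre": 29, "post": None},
]

datasets = impute_by_arm(rows, m=5, seed=1)
```

Each of the m datasets would then be analysed separately and the estimates combined. Imputing per arm keeps the imputation model from smoothing away a treatment effect, but it still relies on the MAR assumption within each arm.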

One side note: you are missing A LOT of data, so imputation may still be a limited approach.


TS Contributor
In a clinical trial, one has to define beforehand how to deal with missing data.
Now we have the problem of finding a sincere, unbiased approach after the missing
data pattern is known.

What could have been defined beforehand? One approach could have been an ITT
(intention-to-treat) analysis with Last Observation Carried Forward substitution.
In addition, a per-protocol analysis with study completers could be performed
(expressly as a secondary analysis).
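For illustration, a small sketch of Last Observation Carried Forward next to a
completers-only subset. The participant IDs and scores are made up. With only
baseline and posttest, LOCF simply reuses the baseline score, i.e. it assumes
no change for everyone who dropped out.

```python
def locf(observations):
    """Carry the last observed value forward over missing (None) entries."""
    filled = []
    last = None
    for value in observations:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# ASSUMPTION: hypothetical [baseline, posttest] scores; None = dropped out
participants = {
    "p01": [24, 15],
    "p02": [30, None],
    "p03": [18, None],
    "p04": [27, 25],
}

# ITT data set: everyone randomized, missing posttest filled by LOCF
itt = {pid: locf(scores) for pid, scores in participants.items()}

# Per-protocol data set: study completers only
per_protocol = {pid: scores for pid, scores in participants.items()
                if scores[-1] is not None}

print(itt["p02"])            # baseline carried forward to posttest
print(sorted(per_protocol))  # IDs of completers
```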

By the way, you should always report the sample size when you describe a problem
with data analysis.

With kind regards