I have a long-format, repeated-measures dataset with 19 variables (columns), each representing one question on a 19-item scale. For my main analysis I use the sum of these 19 items in each row as the dependent variable in a GEE analysis, with time and treatment group as the independent variables. However, there are two issues with missing data.

(1) is the usual issue: some rows in my dataset are missing a data point in one or more of the 19 item columns. This means that when I sum across the 19 items, the total is missing for that row if even one data point is missing. Sometimes only a single data point is missing (1 of 19 for that row), and that row's information is still lost entirely. I put all 19 variables into SPSS's "Missing Value Analysis" routine and used the EM option to test against MCAR, and the test indicated that the data are not MCAR. I am not sure whether they are missing at random or non-ignorable, and I'm not sure how to tell the difference. A frequency count on the sum total variable shows that it is missing in about 13% of the rows in the dataset. I would very much like to impute the missing data, but I'm not sure how.
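To make issue (1) concrete, here is a minimal Python sketch (not SPSS) of how one missing item wipes out the row total. The item names `item1` to `item19`, the toy data, and both helper functions are assumptions for illustration only; missing values are represented as `None`.

```python
# Hypothetical item names; the real column names are not given in the post.
ITEM_COLS = [f"item{i}" for i in range(1, 20)]


def scale_total(row):
    """Sum the 19 items; return None if any single item is missing."""
    values = [row[c] for c in ITEM_COLS]
    if any(v is None for v in values):
        return None
    return sum(values)


def count_missing_items(row):
    """Number of the 19 items that are missing in this row."""
    return sum(row[c] is None for c in ITEM_COLS)


# Toy long-format data: 2 subjects x 2 days, every item scored 1.
rows = [
    {"subject": s, "day": d, **{c: 1 for c in ITEM_COLS}}
    for s in (1, 2)
    for d in (1, 2)
]
rows[1]["item7"] = None  # a single missing item for subject 1 on day 2

totals = [scale_total(r) for r in rows]
missing_counts = [count_missing_items(r) for r in rows]

# Share of rows whose total is lost, comparable to the ~13% figure above.
share_missing_total = sum(t is None for t in totals) / len(totals)

print(totals)
print(missing_counts)
print(share_missing_total)
```

Tabulating the per-row count of missing items alongside the lost totals shows whether most of the missing totals come from rows where only one or two items are absent.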

Point (2) with this data is that I have differential attrition between the two treatment groups. In the active drug group, people tend to stay for the full 9 days of inpatient treatment or thereabouts. In the placebo group, people tended to leave the study earlier. This means that for someone who left early, the main outcome measure (the 19-item scale that I sum to get the total) has only, say, 4 days (rows) of data, whereas the people who stayed for 9 days have 9 days of data (9 rows in the long-format dataset required for a GEE analysis). I think this means the data are NOT missing at random and are therefore non-ignorable. Everything I have read says this is bad and very difficult to fix. I was thinking I could just insert the remaining rows (e.g. add rows 5 to 9 for those people who left the study on day 4) and do a last observation carried forward (LOCF). I know there are problems with this, though. Would it be a valid approach?
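The padding-plus-LOCF idea can be sketched as follows, again in plain Python with made-up numbers. The function name `pad_and_locf`, the input layout, and the toy totals are all hypothetical. LOCF treats the score as frozen at its last observed value after dropout, which is an assumption in its own right, so this only illustrates the mechanics and is not a claim that the approach is valid.

```python
N_DAYS = 9  # planned length of stay, taken from the description above


def pad_and_locf(records, n_days=N_DAYS):
    """Expand every subject to n_days rows and carry the last value forward.

    records maps subject id -> {day: total}. Days that are absent, or whose
    total is None, count as unobserved.
    """
    out = []
    for subject, by_day in sorted(records.items()):
        last = None  # stays None until the first observed value
        for day in range(1, n_days + 1):
            value = by_day.get(day)
            observed = value is not None
            if observed:
                last = value
            out.append(
                {
                    "subject": subject,
                    "day": day,
                    "total": value,       # original value, None if padded
                    "total_locf": last,   # carried-forward value
                    "observed": observed, # flags real vs. filled-in rows
                }
            )
    return out


# Toy data: subject 1 completes 9 days, subject 2 leaves after day 4.
toy = {
    1: dict(zip(range(1, 10), [30, 28, 25, 22, 20, 18, 15, 12, 10])),
    2: {1: 31, 2: 30, 3: 29, 4: 27},
}

padded = pad_and_locf(toy)
for row in padded:
    if row["subject"] == 2:
        print(row["day"], row["total"], row["total_locf"])
```

Keeping the original `total` and the `observed` flag next to `total_locf` means the same padded file can serve both the analysis on observed data and the LOCF analysis. Note that this sketch would also carry values forward over gaps inside a stay, not only after dropout.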

But really what I want to know is this: when I do my test to see whether the observations are MCAR, should I first add the remaining rows for those who left the study early and leave them blank? Or should I test for MCAR before adding the extra rows that fill every person out to 9 days of data with LOCF?

Sorry if this is not very coherent; this is a new area for me. Perhaps what it needs is a multiple imputation step on the raw dataset, followed by adding the extra rows for an additional LOCF analysis?

Many thanks for your help in advance

Dave