I am analyzing the results of a satisfaction survey.
The research question or goal is to determine whether or not (dis-)satisfaction with performing certain activities (as measured by a 7-point satisfaction scale) on a system better predicts or explains overall satisfaction with the system (also measured on a 7-point satisfaction scale) than (dis-)satisfaction with other activities.
While each participant did provide a rating for the outcome variable (satisfaction with the system), the tricky part about the analysis is that not every participant provided a satisfaction rating for each activity (predictor). They were only asked to rate their satisfaction with an activity if they also self-reported performing that activity. This is leading to a lot of missing data.
For example, imagine the predictor variables are X, Y, and Z. The survey structure was essentially:
Q1. Have you used system ABC to do X? [Yes / No]
Q2. Have you used system ABC to do Y? [Yes / No]
Q3. Have you used system ABC to do Z? [Yes / No]
The participant would only receive the satisfaction (7-point scale) question for activity "X", "Y" or "Z" if they selected "Yes" to the corresponding question above.
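To make the skip logic concrete, here is a minimal Python sketch of how a single response ends up with missing ratings. The function name `record_response` and the example values are hypothetical, made up for illustration rather than taken from the actual survey.

```python
from typing import Optional

ACTIVITIES = ["X", "Y", "Z"]  # placeholder activity names from the example above


def record_response(
    used: dict[str, bool],
    ratings: dict[str, int],
    overall: int,
) -> dict[str, Optional[int]]:
    """Build one data row, keeping an activity rating only if the
    respondent answered "Yes" to having performed that activity."""
    row: dict[str, Optional[int]] = {"overall": overall}
    for activity in ACTIVITIES:
        # The rating question was never shown when used[activity] is False,
        # so the value is structurally missing rather than skipped by choice.
        row[activity] = ratings.get(activity) if used[activity] else None
    return row


# Hypothetical respondent: used X and Z, never used Y.
row = record_response(
    used={"X": True, "Y": False, "Z": True},
    ratings={"X": 6, "Z": 3},
    overall=5,
)
print(row)
```

The outcome is always present, while each predictor is `None` whenever the gating question was answered "No".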
There were about 6 activities. Of the approximately 1900 participants, only around 150 provided a satisfaction rating for every single activity, so with listwise deletion only about 8% of the sample remains. Across all predictor ratings, about 50% of the values are missing. To my knowledge, this is such a substantial loss of data that techniques like multiple imputation are just not feasible, particularly since I expect my missing data would be classified as "missing not at random".
Having said that, there is still a substantial number of data points for each predictor variable (no fewer than 500 each); it's just rare for any one participant to provide ratings for all of the predictors.
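For reference, this is roughly how the counts above can be tallied. It is only a sketch on a tiny made-up data set (the four rows and the names `rows` and `predictors` are invented for illustration), with `None` standing for a rating that was never asked.

```python
# Toy data: None marks an activity the respondent did not perform.
rows = [
    {"overall": 6, "X": 7, "Y": None, "Z": 5},
    {"overall": 3, "X": None, "Y": 2, "Z": None},
    {"overall": 5, "X": 4, "Y": 5, "Z": 6},
    {"overall": 2, "X": 1, "Y": None, "Z": None},
]
predictors = ["X", "Y", "Z"]

# Number of observed ratings per predictor (what pairwise analyses can use).
n_per_predictor = {p: sum(r[p] is not None for r in rows) for p in predictors}

# Number of respondents with every predictor observed (what listwise deletion keeps).
n_complete = sum(all(r[p] is not None for p in predictors) for r in rows)

# Share of predictor cells that are missing across the whole data set.
n_cells = len(rows) * len(predictors)
n_missing = sum(r[p] is None for r in rows for p in predictors)

share_complete = n_complete / len(rows)
share_missing = n_missing / n_cells

print(n_per_predictor, n_complete, share_complete, share_missing)
```

The same contrast shows up here in miniature: every predictor has at least two observed ratings, yet only one of the four respondents is a complete case.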
I feel that regression may simply be inappropriate and there may be no way to really resolve this problem if I want to include all or most predictors. However, if I'm mistaken I welcome any feedback.
What methods might be best suited for exploring how well satisfaction ratings for these activities predict overall satisfaction with the system, given that each record may be missing data for several predictors? I've read that loglinear analysis may be a possible approach, but I'm not familiar with it. Alternatively, I've considered just computing basic correlations, but of course this doesn't compare the activities against one another in a single model.
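To show what I mean by "basic correlations", here is a minimal sketch that correlates each predictor with the outcome using only the respondents who rated that activity (pairwise deletion). The data and the helper names `pearson` and `pairwise_corr` are made up for illustration, and using Pearson's r on 7-point ratings is itself an assumption; a rank-based coefficient may be more defensible.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_corr(rows, predictor, outcome="overall"):
    """Correlate one predictor with the outcome, dropping only the rows
    where that predictor is missing. Returns (r, n_used)."""
    pairs = [(r[predictor], r[outcome]) for r in rows if r[predictor] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


# Toy data: None marks an activity the respondent did not perform.
rows = [
    {"overall": 6, "X": 7, "Y": None},
    {"overall": 3, "X": None, "Y": 2},
    {"overall": 5, "X": 4, "Y": 5},
    {"overall": 2, "X": 1, "Y": None},
    {"overall": 7, "X": None, "Y": 6},
]

results = {p: pairwise_corr(rows, p) for p in ["X", "Y"]}
for name, (r, n) in results.items():
    print(f"{name}: r = {r:.2f} (n = {n})")
```

Each coefficient here is based on a different subset of respondents, which is exactly why I'm unsure the activities can be fairly compared this way.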
Thanks in advance.