Help explaining why question averages don't sum to composite score average

Hi all,

For context, I've inherited a report that uses an average of averages to compute a composite score. Data was collected from a survey where three composite scores are calculated for three separate dimensions. For example, Composite Score 1 has 5 questions associated with it. Each question is on the same 7 point scale (0-6). Each respondent answers these five questions, an average respondent-level score is calculated using the respondent's mean score of the five questions. Then the average of each respondent-level average is taken (average of averages) and used as the Composite Score. Also, each question is given a score. This is the average score of each question across all respondents.

Data has been presented to our executive team in a table showing 2018 and 2019 composite scores, each question-level score, and a YoY difference column.

An executive has asked, "Why doesn't the sum of the question differences between years not equal the composite score difference between years?"

I would really appreciate help answering his question (and in the simplest of terms). I'm not entirely confident I know the answer, though I have a guess... The weight attributed to each response in the composite score is 1/(total respondents x # of questions), but the weight associated to each response in the question-level scores is 1/total respondents. So the composite score weight is "watering down" each response because the weight per response is smaller than the weight per response in the question-level score. In other words, the effect size per response in the composite score is smaller than the effect size per response in the question-level score.

I've attached a mock data set that shows the underlying data for each year and how it has been presented. The other problem with this report is that the data collection changed between 2018 and 2019. In 2018, partial completes were allowed, so the "average of averages" approach creates a difference between averaging the respondent scores (average 1) to calculate the composite score (average of averages) and simply averaging the entire set of question scores. I've also presented a mock up of this, but regardless of this issue, the question above still applies to comparing two complete sets of data (Example 1 in attachment) or a partial data set to a complete data set (Example 2 in attachment).

I can understand where he's coming from with his question. He's used to seeing financial reports where Total columns are presented (Composite score) and you naturally consider any sub-row below as a part of the Total. I think the data could have been presented in a different way to avoid this problem. Any suggestions there would also be much appreciated.

For additional context, this study is on physician burnout and uses the Maslach Burnout Inventory (MBI) assessment as its measurement of burnout. MBI calculates three dimensions (e.g., Emotional Exhaustion, Depersonalization, and Personal Accomplishment) based on answers to a 22-item survey. I also considered confounding factors (Simpson's Paradox) in the data to answer his question, but the situation is still present when I mocked up the problem with completely random numbers (using RANDBETWEEN in Excel). That's why I'm leaning toward it's a weighting issue, we're not comparing apples to oranges, and the data should have been presented in a different was because of it.

Thank you in advance!