Hello all,
I understand that if you have individual-level data and you are trying to model something like the probability of incarceration, you might run a regression like
probability of incarceration = intercept + education_level + other_controls . . . + error
where
education_level is a categorical variable with the groups less than HS, HS, and beyond HS, one of which is the omitted (reference) group.
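To make the individual-level case concrete, here is a minimal sketch with made-up data (the array names and the six observations are purely illustrative assumptions). It builds one indicator column per education group and compares the rank of the design matrix with and without an omitted group:

```python
import numpy as np

# Hypothetical education levels for six individuals (made-up data).
levels = ["less_than_HS", "HS", "beyond_HS"]
education = np.array(
    ["HS", "less_than_HS", "beyond_HS", "HS", "beyond_HS", "less_than_HS"]
)

# One 0/1 indicator column per category.
dummies = np.column_stack([(education == lv).astype(float) for lv in levels])
intercept = np.ones((len(education), 1))

# Intercept plus all three dummies: the dummies sum to the intercept column.
X_full = np.hstack([intercept, dummies])
# Intercept plus two dummies: "less_than_HS" is the omitted group.
X_omit = np.hstack([intercept, dummies[:, 1:]])

rank_full = np.linalg.matrix_rank(X_full)
rank_omit = np.linalg.matrix_rank(X_omit)

print(X_full.shape[1], rank_full)  # 4 columns, rank 3
print(X_omit.shape[1], rank_omit)  # 3 columns, rank 3
```

With all three dummies the matrix has four columns but only rank 3, which is why one group gets dropped in the individual-level regression.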
My question is: what happens when you aggregate the data up to, say, the state level (for example, to predict states' incarceration rates), and you have a separate variable for each education_level category giving the proportion of individuals in that state with that education level? Do we still have an omitted group?
incarceration rate = intercept + percent_less_than_HS + percent_HS + percent_beyond_HS + other_controls . . . + error
I'm having trouble trying to reason through this.
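One way to probe this numerically is a small check with simulated state-level shares. This is only a sketch: the 50 states, the Dirichlet draw, and all variable names are assumptions for illustration, not real data. Because the three shares in each state sum to one, they add up to the intercept column, much like the individual-level dummies do:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 50

# Made-up education shares per state; each row sums to 1 by construction.
# Columns: less_than_HS, HS, beyond_HS.
shares = rng.dirichlet(alpha=[2.0, 3.0, 4.0], size=n_states)
intercept = np.ones((n_states, 1))

# Intercept plus all three share variables.
X_all = np.hstack([intercept, shares])
# Intercept plus two shares, leaving out less_than_HS.
X_drop = np.hstack([intercept, shares[:, 1:]])

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)

print("columns:", X_all.shape[1], "rank:", rank_all)
print("columns:", X_drop.shape[1], "rank:", rank_drop)
```

If the shares really do cover every individual, I would expect the four-column matrix to be rank deficient. That would suggest one share still has to be left out (or the intercept dropped), with the remaining coefficients read relative to the omitted share.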