I understand that if you have individual-level data and are trying to model an outcome such as incarceration, you might run a regression like

probability of incarceration = intercept + education_level + other_controls + ... + error

where *education_level* is a categorical variable with the groups *less than HS*, *HS*, and *beyond HS*, and **we have an omitted group** (one category is left out and serves as the reference).
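For the individual-level case, one quick way to see why a group is omitted is to check the rank of the design matrix. The sketch below uses only NumPy and simulated data. The 0/1/2 coding and the names `X_all` and `X_ref` are made up for illustration. With an intercept, the three 0/1 dummies add up to the intercept column, so one of them has to be dropped.

```python
import numpy as np

# Simulated individual-level data (not real): education coded
# 0 = less than HS, 1 = HS, 2 = beyond HS.
rng = np.random.default_rng(0)
education = rng.integers(0, 3, size=200)

# One 0/1 dummy column per education category.
dummies = np.eye(3)[education]
intercept = np.ones((len(education), 1))

# Intercept plus all three dummies: the dummies sum to the intercept
# column, so the columns are linearly dependent (the "dummy variable trap").
X_all = np.hstack([intercept, dummies])

# Intercept plus two dummies: "less than HS" is the omitted reference group.
X_ref = np.hstack([intercept, dummies[:, 1:]])

print(X_all.shape[1], np.linalg.matrix_rank(X_all))  # 4 columns, rank 3
print(X_ref.shape[1], np.linalg.matrix_rank(X_ref))  # 3 columns, rank 3
```

Keeping all three dummies and dropping the intercept would also give a full-rank matrix. That choice only changes how the coefficients are interpreted.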

My question is: what happens when you aggregate your data up to, say, the state level (for example, to predict states' incarceration rates), and you have a separate variable for each education_level category representing the proportion of individuals in that state with that education level?

**Do we still have an omitted group?**

incarceration rate = intercept + percent_less_than_HS + percent_HS + percent_beyond_HS + other_controls + ... + error
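The same rank check can be run on state-level shares. This is a sketch on simulated proportions, not real data. The shares are drawn from a Dirichlet distribution purely so that each row sums to 1, and the sketch assumes the three education categories together cover each state's whole population. The names `shares`, `X_all`, and `X_drop` are hypothetical.

```python
import numpy as np

# Simulated state-level shares: one row per state, with columns for the
# proportions with less than HS, HS, and beyond HS. Assumes the categories
# are exhaustive, so every row sums to 1.
rng = np.random.default_rng(1)
n_states = 50
shares = rng.dirichlet([2.0, 3.0, 4.0], size=n_states)

intercept = np.ones((n_states, 1))

# Intercept plus all three shares: the shares add up to the intercept column.
X_all = np.hstack([intercept, shares])

# Intercept plus two shares (percent_less_than_HS dropped).
X_drop = np.hstack([intercept, shares[:, 1:]])

print(np.allclose(shares.sum(axis=1), 1.0))            # True
print(X_all.shape[1], np.linalg.matrix_rank(X_all))    # 4 columns, rank 3
print(X_drop.shape[1], np.linalg.matrix_rank(X_drop))  # 3 columns, rank 3
```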

I'm having trouble reasoning through this.