# How to code this

#### noetsi

I have two variables I am using as controls. One is median income in a county and the other is the county unemployment rate. Each individual (each case) in the regression model will have one value of each. But I am not sure how I should code them. They are interval data, dollars or percent unemployed, but everyone in the same county will have the same value. It may not make sense to talk about a change in Y for a one-dollar or one-percent change in these; it might make more sense to talk about a change among the 67 counties. On the other hand, having a categorical variable with 67 levels for data that is interval does not make much sense to me either.

I am not sure how to address this in the linear regression. I thought of making each value a ratio of the statewide number, so the statewide number would be set to one and I would divide each number for the county by the State number, but I do not know if this changes anything.

#### hlsmith

First of all, this seems somewhat like a multilevel model where these would be group-level covariates, correct? And if so, do you have individual-level variables and data as well?

I believe you can use medians as a continuous variable; just keep that in mind when making interpretations. Rates can be used as continuous variables if they tend to be near 50% and not right next to one of the bounds (0%, 100%).

#### noetsi

No, this is going into a linear regression. I might use a multilevel model later.

So you can use median county data for analysis of customers? That makes sense to me because economic data like this is going to be aggregate in pretty much every case I can think of. And it's important.

#### hlsmith

But if you have multiple people from the same place, you get a multilevel model with shared variability for those people. Do you have people from the same place? In my time series class there was an example where the series was actually medians, since they are more stable than means.

#### noetsi

As with using means for a variable, you will get shared variability. I don't see any real choice, since a county has only one median income level.

Are you saying that using a median this way will work in a multilevel model, but not in linear regression? It's been more than a year since I ran multilevel models, and I am not sure I would remember how to run them quickly.

How badly will using a median this way harm the results? Obviously it is a violation of independence.

#### hlsmith

Well, do you have any other county-level variables you are thinking about using? If so, think about using MLM; otherwise robust SEs can be applied.

#### noetsi

I am using two county-wide variables: median income and average employment.

Which robust SE do you mean: White or Newey-West?

I have only used MLM in class, so I am reluctant to try it for a major project.
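To make the robust SE options concrete, here is a rough NumPy sketch of the White (HC0) sandwich estimator next to a version that sums the scores within each county before forming the sandwich. Clustering by county is my assumption about what fits shared county values; the thread does not settle which estimator was meant. Newey-West is aimed at serial correlation in time-ordered data, so it is left out. The data are simulated, and the function name `ols_with_robust_se` is hypothetical.

```python
import numpy as np

def ols_with_robust_se(X, y, groups=None):
    """OLS coefficients with sandwich standard errors.

    groups=None gives White (HC0) standard errors. Otherwise the scores
    x_i * u_i are summed within each group (e.g. county) first.
    No small-sample correction is applied in either case.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    scores = X * resid[:, None]  # one row of x_i * u_i per observation
    if groups is not None:
        groups = np.asarray(groups)
        # Collapse to one summed score row per cluster.
        scores = np.vstack(
            [scores[groups == g].sum(axis=0) for g in np.unique(groups)]
        )
    meat = scores.T @ scores
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Simulated illustration: 67 counties, 20 people each (all values made up).
rng = np.random.default_rng(0)
n_counties, per_county = 67, 20
county_id = np.repeat(np.arange(n_counties), per_county)
income = rng.normal(5.0, 1.0, n_counties)       # median income, in $10k
unemp = rng.uniform(3.0, 10.0, n_counties)      # unemployment rate, percent
county_effect = rng.normal(0.0, 1.0, n_counties)  # shared within-county shock

X = np.column_stack([
    np.ones(county_id.size),
    income[county_id],   # every person in a county gets the county value
    unemp[county_id],
])
y = (1.0 + 0.5 * X[:, 1] - 0.2 * X[:, 2]
     + county_effect[county_id]
     + rng.normal(0.0, 1.0, county_id.size))

beta_white, se_white = ols_with_robust_se(X, y)
beta_cluster, se_cluster = ols_with_robust_se(X, y, groups=county_id)

print("coefficients:     ", beta_white)
print("White (HC0) SE:   ", se_white)
print("county-cluster SE:", se_cluster)
```

The coefficients are identical either way; only the standard errors differ. If statsmodels is available, fitting an OLS model with `fit(cov_type="cluster", cov_kwds={"groups": county_id})` should give a comparable county-clustered estimate, though it applies small-sample corrections by default, so the numbers will not match this sketch exactly.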