# Interpreting dummy variables.

#### noetsi

After all these years of reading regression output, this should be simple to do...

"Impact" here means the regression slopes for the dummy variables.

I should say that I don't think the excluded reference group here is a good choice: it is those under 16, of whom we have extremely few, and they most likely earn very little. I cannot change it; it was decided by the federal government.

That said, I don't see how every dummy variable can be positive. Some groups have to earn less than others. Is there a way (I have not seen this addressed) to say that one level did better relative to another level? Formally you are comparing customers in each category to those in the reference category. But my audience will want us to discuss how one level did relative to another.
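A minimal sketch of comparing two non-reference levels, assuming ordinary least squares with an intercept and one dummy per non-reference level. The data, the group labels `"under16"`, `"A"`, `"B"`, and the variable names are all hypothetical. Because every dummy coefficient is measured against the reference group, all of them can be positive when the reference group earns very little; the "A vs B" comparison is the difference of the two coefficients, and its standard error needs their covariance, not just the two individual standard errors. In PROC GENMOD an ESTIMATE or LSMEANS statement should produce the same kind of contrast directly (check the documentation for exact syntax).

```python
import numpy as np

# Hypothetical toy data: a low-earning reference group and two other levels.
groups = ["under16"] * 4 + ["A"] * 4 + ["B"] * 4
y = np.array([1.0, 2.0, 1.0, 2.0,
              9.0, 11.0, 10.0, 10.0,
              6.0, 7.0, 5.0, 6.0])

# Design matrix: intercept plus one 0/1 dummy per non-reference level.
levels = ["A", "B"]
X = np.column_stack([np.ones(len(y))] +
                    [[1.0 if g == lev else 0.0 for g in groups] for lev in levels])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
n, p = X.shape
resid = y - X @ beta
sigma2 = resid @ resid / (n - p)         # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance matrix of the OLS coefficients

# Each coefficient is "level vs reference", so both are positive with this data.
b_A, b_B = beta[1], beta[2]

# Contrast "A vs B" = b_A - b_B, with a standard error that uses Cov(b_A, b_B).
contrast = np.array([0.0, 1.0, -1.0])
diff = contrast @ beta
se_diff = np.sqrt(contrast @ cov @ contrast)
print(f"A vs ref: {b_A:.2f}, B vs ref: {b_B:.2f}, A vs B: {diff:.2f} (SE {se_diff:.3f})")
```

Here `diff` equals the difference between the A and B group means, which is the "one level did better than another" statement the audience wants.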

Here is what I did say [I'm not certain this is formally true for regression]:

#### hlsmith

Yeah, what you have seems fine. And yes, you are correct that the estimates you take from the multiple linear regression are relative to the base case, with the other categorical variables at their reference levels, so your presentation above works.

#### noetsi

I am running PROC GENMOD, which here is essentially OLS (the distribution is normal and the link function is identity). My dependent variable has two levels, 0 and 1. As I understand it, the slope is then the increased chance of being in one of these levels (I believe level 1, but I am uncertain of this from the documentation), or a decreased chance, of course, if the slope is negative.

With dummy predictors the slope is the mean difference as always, but it still reflects the increased (or decreased) chance of being at one of the levels (again, I assume level 1).
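A small sketch of why the slope tracks level 1, assuming the 0/1 response is treated as a number (as it is with a normal distribution and identity link). The data are hypothetical. The mean of a 0/1 variable is the share of 1s, so the OLS slope on a single dummy equals the difference in the proportion of 1s between the two predictor groups.

```python
# Hypothetical 0/1 outcome and one 0/1 dummy predictor (made-up data).
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]

def ols_slope(x, y):
    """Simple-regression OLS slope: sum of cross-products / sum of squares of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

slope = ols_slope(x, y)

# Share of 1s in each predictor group.
p1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
p0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)

print(f"OLS slope = {slope:.2f}; P(Y=1|x=1) - P(Y=1|x=0) = {p1 - p0:.2f}")
```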

I ask this because in PROC LOGISTIC, unlike other software, SAS by default models the probability of level 0, not level 1.
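A sketch of what changing the modeled event does in a logistic model, assuming a single 0/1 dummy predictor, where the fitted logistic slope equals the sample log odds ratio of the 2x2 table. The counts are hypothetical. Switching between modeling P(Y=1) and P(Y=0) (in PROC LOGISTIC, via the DESCENDING option or an EVENT='1' response option) flips the sign of the slope and leaves its size unchanged.

```python
import math

# Hypothetical 2x2 counts keyed by (predictor level, outcome level).
counts = {
    (1, 1): 40, (1, 0): 10,   # predictor = 1
    (0, 1): 20, (0, 0): 30,   # predictor = 0
}

def log_odds_ratio(counts, event):
    """Logistic slope for one 0/1 dummy when modeling P(Y == event)."""
    other = 1 - event
    odds_x1 = counts[(1, event)] / counts[(1, other)]
    odds_x0 = counts[(0, event)] / counts[(0, other)]
    return math.log(odds_x1 / odds_x0)

b_event1 = log_odds_ratio(counts, event=1)  # modeling Y = 1
b_event0 = log_odds_ratio(counts, event=0)  # modeling Y = 0 (PROC LOGISTIC's default ordering)
print(f"modeling Y=1: {b_event1:.4f}; modeling Y=0: {b_event0:.4f}")
```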

#### hlsmith

The output or log likely tells you which levels are the reference groups for the DV and the IVs.

Also, as mentioned before, this model would be producing probability values and using maximum likelihood estimation (MLE).
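A short derivation of why a normal/identity model fit by MLE gives the same slopes as OLS, assuming independent observations with constant variance $\sigma^2$. The log-likelihood is

$$
\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top\beta\right)^2 ,
$$

and for any fixed $\sigma^2$ it is maximized over $\beta$ by minimizing the sum of squared residuals, so

$$
\hat\beta_{\text{MLE}} = \hat\beta_{\text{OLS}} = (X^\top X)^{-1}X^\top y, \qquad \hat y_i = x_i^\top\hat\beta .
$$

With a 0/1 outcome the fitted values $\hat y_i$ are the "probability values" (they can fall outside $[0,1]$). The variance estimate, and hence the standard errors, may differ slightly from OLS because the MLE of $\sigma^2$ divides by $n$ rather than $n-p$.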

#### noetsi

I don't understand what you mean by "this model would be kicking out the probability values."

Do you mean that it shows the increased probability of being at a certain level of the DV?

#### hlsmith

You are putting a binary DV into a linear model (dist=normal). You aren't gonna get log odds out of it, correct?

#### noetsi

I am (or rather the federal government is; it's not my choice) using a model that assumes normality to predict a two-level dependent variable coded 1 and 0. They are not generating log odds, and since they are not assuming a binomial distribution, according to SAS they are not even running a linear probability model.

So if the DV is 1 or 0 and you run PROC GENMOD assuming an identity link and normality, how does one interpret the slopes? If the result is .05 for a dummy variable in an LPM, you would interpret it as a 5 percentage-point higher chance of being at level 1 (or level 0, if that is the level being modeled). How do you interpret it when you do what I described? I have no idea... probably because no statistician ever considered doing this.
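A sketch of the interpretation, assuming the response is the numeric 0/1 variable and the model is $E[y_i \mid x_i] = x_i^\top\beta$ (identity link). Because $y_i$ only takes the values 0 and 1,

$$
E[y_i \mid x_i] = 1\cdot P(y_i = 1 \mid x_i) + 0\cdot P(y_i = 0 \mid x_i) = P(y_i = 1 \mid x_i) = x_i^\top\beta .
$$

For a dummy $d$, with the other covariates $z$ held fixed,

$$
\beta_d = P(y = 1 \mid d = 1, z) - P(y = 1 \mid d = 0, z),
$$

so $\beta_d = 0.05$ reads as a probability of level 1 that is 0.05 higher, i.e. 5 percentage points. This point-estimate interpretation does not depend on the normal assumption; that assumption mainly affects the standard errors.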

While I am at it: this is how the default estimates the effect of one dummy variable. If I wanted the impact of being at level 1 of the predictor on the DV, can I just reverse the signs?
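A sketch for a single two-level dummy in an additive linear model, assuming the default output reports level 0 relative to level 1 (SAS's GLM-style CLASS coding uses the last level as the reference by default). The data and helper name `ols_fit` are hypothetical. Swapping the reference level flips the sign of the slope and shifts the intercept; recoding the outcome as $1-y$ also flips the sign. With more than two levels, switching the reference changes which comparisons are reported, not just their signs.

```python
def ols_fit(x, y):
    """Simple OLS with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical 0/1 dummy and 0/1 outcome.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]

# Coding 1: dummy marks level 0, so level 1 is the reference.
a0, b0 = ols_fit([1 - xi for xi in x], y)

# Coding 2: dummy marks level 1, so level 0 is the reference.
a1, b1 = ols_fit(x, y)

# Modeling the other outcome level (1 - y) also flips the slope's sign.
_, b_flip_y = ols_fit(x, [1 - yi for yi in y])

print(f"level 0 vs 1: {b0:.2f}; level 1 vs 0: {b1:.2f}; outcome recoded: {b_flip_y:.2f}")
```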