stepwise regression with dummy variables

#1
Hello there,

I want to do a stepwise regression in order to find relevant predictor variables, but one of the candidate predictors is a categorical variable with three possible values. I recoded it into dummy variables, but can I enter them into the stepwise regression as regular variables? What will it mean if only one of the dummy variables is included in the regression?

Thanks!

Gili
 

trinker

ggplot2orBust
#2
I recoded it into dummy variables
For a categorical variable with J groups you only need J-1 dummy variables (the group left without its own dummy, often the last one, is your comparison group).

but can I insert them into the stepwise regression as regular variables?
Yes, you can enter the J-1 recoded variables into the model. I think of them as separate variables, but in reality each new column is not a separate variable; together they form a block that describes the original categorical variable.
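A minimal Python sketch of J-1 dummy coding, assuming the categorical variable is a plain list of labels. `dummy_code` is a hypothetical helper, not an SPSS or library function. The comparison group gets no column of its own: it is simply the pattern with 0 in every column.

```python
def dummy_code(values, reference):
    """Return J-1 dummy (0/1) columns for a categorical variable.

    Hypothetical helper: `reference` is the comparison group and gets
    0 in every column, so it needs no column of its own.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not among levels {levels}")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


colour = ["red", "green", "blue", "red", "blue"]  # three possible values, J = 3
print(dummy_code(colour, reference="red"))
# {'blue': [0, 0, 1, 0, 1], 'green': [0, 1, 0, 0, 0]}
```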

what will it mean if only one of the dummy variables will be included in the regression?
If only one is significant, then one group is significantly different from the others, meaning the entire original categorical variable is significant as well. I would include the whole block of grouping (dummy-coded) variables in the model, because they are not separate variables but a block describing the categorical variable. (Others may differ on this, and if they do I'd like to hear their perspective.)
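For what it's worth, a common way to ask whether the categorical variable as a whole matters is to test the whole dummy block at once: fit the model with and without the block and compare the residual sums of squares (a nested-model F test), rather than reading each dummy's own t-test. Below is a minimal numpy sketch on simulated data; `rss` and `block_f_test` are hypothetical helper names, and the p-value lookup is left out.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X (X already holds the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def block_f_test(X_reduced, X_full, y):
    """F statistic for adding the extra columns of X_full (e.g. a dummy block) to X_reduced."""
    n = len(y)
    q = X_full.shape[1] - X_reduced.shape[1]   # columns in the block being tested
    df_resid = n - X_full.shape[1]             # residual df of the full model
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    F = ((rss_r - rss_f) / q) / (rss_f / df_resid)
    return F, q, df_resid


# Simulated data: three groups, only group "c" differs from the others.
rng = np.random.default_rng(0)
group = np.repeat(["a", "b", "c"], 20)
y = np.where(group == "c", 2.0, 0.0) + rng.normal(0.0, 1.0, size=group.size)

intercept = np.ones((group.size, 1))
dummies = np.column_stack([group == "b", group == "c"]).astype(float)  # "a" = comparison
F, df1, df2 = block_f_test(intercept, np.hstack([intercept, dummies]), y)
print(f"F({df1}, {df2}) = {F:.2f}")
```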
 
#3
Thank you so much Trinker!

trinker said:
"If only one is significant, then one group is significantly different than the others, meaning the entire original categorical variable is significant as well. "
So I must ask another question: if the regression treats the grouping variables as separate variables and not as a block, why don't we enter the comparison variable into the regression as well? Couldn't it be that the two grouping variables are not significant but the comparison variable is?

Many Thanks
 

trinker

ggplot2orBust
#4
if the regression treats the grouping variables as separate variables and not as a block, why don't we insert the comparison variable to the regression as well?
Couldn't it be that the two grouping variables will not be significant but the comparison variable will be significant?
First
Think of the grouping variables as a way of "tricking" the regression (which uses numeric data) into accepting categorical data. Regression treats the grouping variables as a collective block that describes the categorical variable.
Second
The comparison group is in the model by default, even though you didn't enter it: it is absorbed into the intercept, and each dummy-coded variable's beta coefficient is a comparison against it. So the significance is relative to this "comparison" group (hence the name). The comparison group is the group coded 0 on every dummy (think of 1 as "yes, I belong to this group" and 0 as "no, I do not"). This means the comparison group has all 0's, i.e. "No, I do not belong to this group" on every dummy.

You can put all J groups into the model instead of J-1, and most computer programs (I know R and SPSS will) will actually ignore the last one you enter, because the J dummies always add up to the intercept column. A single group is not what is significant; what is significant is the whole categorical variable (there is a significant difference between the groups in relation to the outcome variable, the DV).
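A quick numpy check, on made-up labels, of why entering all J dummies next to an intercept is redundant: the J dummy columns always add up to the intercept column, so the design matrix has one less rank than it has columns and one dummy has to be dropped.

```python
import numpy as np

group = np.array(["a", "b", "c", "a", "b", "c", "a"])  # made-up labels, J = 3
levels = ["a", "b", "c"]
intercept = np.ones(len(group))

# All J dummies plus intercept: the dummy columns sum to the intercept column.
all_j = np.column_stack([intercept] + [(group == lv).astype(float) for lv in levels])
# J-1 dummies ("a" left out as the comparison group) plus intercept.
j_minus_1 = np.column_stack([intercept] + [(group == lv).astype(float) for lv in levels[1:]])

print(all_j.shape[1], np.linalg.matrix_rank(all_j))          # 4 3
print(j_minus_1.shape[1], np.linalg.matrix_rank(j_minus_1))  # 3 3
```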

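The regression weights in this case (dummy coding, unstandardized B coefficients) are special. When the dummies are the only predictors, they are exactly the mean differences between each group and the comparison group (the same group means an ANOVA compares); with other predictors in the model they become adjusted mean differences.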
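A small numpy sketch with made-up numbers (group means 2, 5 and 10, with "a" as the comparison group) showing the intercept recovering the comparison group's mean and the dummy weights recovering the mean differences. This only holds exactly because the dummies are the only predictors here.

```python
import numpy as np

# Made-up data with known group means: a -> 2, b -> 5, c -> 10.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 10.0])
group = np.array(["a"] * 3 + ["b"] * 3 + ["c"] * 2)

# Dummy coding with "a" as the comparison group (J - 1 = 2 columns).
X = np.column_stack([
    np.ones(len(y)),
    (group == "b").astype(float),
    (group == "c").astype(float),
])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

means = {g: y[group == g].mean() for g in ("a", "b", "c")}
# b[0] ~ mean of "a"; b[1] ~ mean(b) - mean(a); b[2] ~ mean(c) - mean(a)
print(np.round(b, 6))  # approximately [2. 3. 8.]
```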
 
#5
Regression treats the grouping variables as a collective block that describes the categorical variable
Maybe it's my mistake: I entered the grouping variables together with the rest of the candidate predictors in one step of the stepwise regression. Should I separate them into two different steps? If not, I don't quite understand how SPSS "knows" to treat the grouping variables as a collective block without my defining them that way.
BTW, in this analysis I tried entering all J dummy variables, and the last one actually was entered in the SPSS output.

Thanks again!
 

trinker

ggplot2orBust
#6
Should I separate it into 2 different steps?
No, I forgot you're doing stepwise regression. In my research I rarely use stepwise regression; I tend towards hierarchical regression.
BTW, In this analysis I tried entering the J dummy variables and actually got the last variable entered in the SPSS output.
Not sure about this one. When I used SPSS (a while ago now), I used hierarchical regression, which usually meant entering the categorical variable as a block. When I played with J vs. J-1 dummies, I remember that SPSS automatically used J-1 of them and kept one as the comparison. It sounds like it's different for stepwise.
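A rough sketch of the block vs. column distinction, assuming a simplified forward selection: each candidate is a "term" of one or more columns, and the term with the largest F-to-enter is added while it reaches a fixed threshold. Passing the dummies as one term lets the categorical variable enter only as a block; passing each dummy as its own term mimics selection on individual columns. `forward_select` and the threshold of 4.0 are hypothetical; this is not SPSS's algorithm, and SPSS's own entry/removal criteria differ in detail.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_select(terms, y, f_to_enter=4.0):
    """Simplified forward selection over terms (hypothetical helper, not SPSS's algorithm).

    `terms` maps a name to a 2-D array of one or more columns; a term enters whole or
    not at all. At each step the term with the largest F-to-enter is added, and the
    search stops when no remaining term reaches the (assumed) fixed threshold.
    """
    n = len(y)
    X = np.ones((n, 1))  # start from the intercept-only model
    chosen, remaining = [], dict(terms)
    while remaining:
        base_rss = rss(X, y)
        best_name, best_F, best_X = None, -np.inf, None
        for name, cols in remaining.items():
            X_new = np.hstack([X, cols])
            q = cols.shape[1]
            new_rss = rss(X_new, y)
            F = ((base_rss - new_rss) / q) / (new_rss / (n - X_new.shape[1]))
            if F > best_F:
                best_name, best_F, best_X = name, F, X_new
        if best_F < f_to_enter:
            break
        chosen.append(best_name)
        X = best_X
        del remaining[best_name]
    return chosen


# Simulated data: x1 matters, "noise" does not, group "c" differs from "a" and "b".
rng = np.random.default_rng(1)
n = 90
group = np.repeat(["a", "b", "c"], n // 3)
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
y = 1.5 * x1 + np.where(group == "c", 2.0, 0.0) + rng.normal(size=n)

dummies = np.column_stack([group == "b", group == "c"]).astype(float)
as_block = {"x1": x1[:, None], "noise": noise[:, None], "group": dummies}
as_columns = {"x1": x1[:, None], "noise": noise[:, None],
              "group_b": dummies[:, :1], "group_c": dummies[:, 1:]}

print(forward_select(as_block, y))    # the categorical variable enters as one unit
print(forward_select(as_columns, y))  # single dummies can enter on their own
```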

Below is a link about dummy coding in SPSS that you may find useful. Also note that the "traditional" method I have described is not the only one. Some people use Helmert coding, for instance, though I have never needed it myself.
http://www.ats.ucla.edu/stat/spss/webbooks/reg/chapter3/spssreg3.htm
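For comparison, a sketch of one common Helmert variant for three levels, assuming the version that compares each level with the mean of the levels after it (R's `contr.helmert` uses the reverse ordering, comparing each level with the earlier ones). The codes and data are illustrative assumptions. With these codes the intercept is the unweighted mean of the group means, and each weight is one Helmert contrast instead of a difference from a single comparison group.

```python
import numpy as np

# One Helmert variant for 3 levels (assumed): column 1 = L1 vs mean(L2, L3),
# column 2 = L2 vs L3.
helmert = {
    "L1": [2 / 3, 0.0],
    "L2": [-1 / 3, 1 / 2],
    "L3": [-1 / 3, -1 / 2],
}

# Made-up data with group means 2, 5 and 8.
y = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 8.0])
group = ["L1", "L1", "L2", "L2", "L3", "L3"]

X = np.column_stack([np.ones(len(y)), np.array([helmert[g] for g in group])])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b[0] ~ (2 + 5 + 8) / 3 = 5; b[1] ~ 2 - (5 + 8) / 2 = -4.5; b[2] ~ 5 - 8 = -3
print(np.round(b, 6))
```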
 

heinz

New Member
#8
Dear all, I've stumbled upon this thread as I am learning more and more about linear regression with SPSS at the moment.
I have two questions:

1)
If only one is significant, then one group is significantly different from the others, meaning the entire original categorical variable is significant as well. I would include the whole block of grouping (dummy-coded) variables in the model, because they are not separate variables but a block describing the categorical variable. (Others may differ on this, and if they do I'd like to hear their perspective.)
Let's say I use income classes (very low, low, medium, high) as the independent variable in a forward regression.
To have "very low" as the reference, I enter "low", "medium" and "high" into the regression.
SPSS only enters "high" into the model.

Does this mean that all the other income classes are now the reference? Or is the "very low" class still the reference?
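To make the question concrete, here is a toy numpy sketch (made-up outcome values) of the model that remains when only the "high" dummy has been entered. With no other income dummies in the model, nothing separates "very low", "low" and "medium", so they share a single baseline: the intercept is their pooled mean and the "high" weight is the difference from that pooled mean.

```python
import numpy as np

# Made-up outcome values for the four income classes (two people each).
income = np.array(["very low", "very low", "low", "low",
                   "medium", "medium", "high", "high"])
y = np.array([1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 10.0, 12.0])

# The model that remains when only the "high" dummy has been entered.
X = np.column_stack([np.ones(len(y)), (income == "high").astype(float)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

pooled_rest = y[income != "high"].mean()   # very low + low + medium together
high_mean = y[income == "high"].mean()
print(np.round(b, 4), round(pooled_rest, 4), round(high_mean - pooled_rest, 4))
```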

2)
Is it possible to do a forward regression with blocks? I have three blocks ("income classes" in block 1, "marital status" in block 2 and "depression" in block 3). I want to see whether "depression" adds explanatory power to the model.

It does work in SPSS, but I am not sure whether it makes sense. If it helps, I am happy to post my syntax. The problem is that SPSS excludes non-significant variables from the model, which leaves me wondering whether a forward regression with blocks makes sense.
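A minimal sketch of block-wise (hierarchical) entry with no selection inside the blocks, on simulated data: each block is entered whole, and R^2 plus the R^2 change is recorded after each step, which is the quantity usually examined to judge whether the last block ("depression") adds anything. Variable codings and effect sizes are made up, `hierarchical_r2` is a hypothetical helper, and no significance test is included.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X already holds the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def hierarchical_r2(blocks, y):
    """Enter blocks cumulatively (hypothetical helper); return (R^2, R^2 change) per step."""
    X = np.ones((len(y), 1))
    results, previous = [], 0.0
    for cols in blocks:
        X = np.hstack([X, cols])
        r2 = r_squared(X, y)
        results.append((r2, r2 - previous))
        previous = r2
    return results


# Simulated data; codings and effect sizes are made up for illustration.
rng = np.random.default_rng(2)
n = 200
income = rng.integers(0, 4, size=n)   # 0 = "very low" (comparison), 1..3 = low, medium, high
married = rng.integers(0, 2, size=n)  # a single 0/1 marital-status dummy
depression = rng.normal(size=n)
y = 0.5 * income + 0.3 * married + 0.8 * depression + rng.normal(size=n)

income_block = np.column_stack([income == k for k in (1, 2, 3)]).astype(float)
marital_block = married[:, None].astype(float)
depression_block = depression[:, None]

results = hierarchical_r2([income_block, marital_block, depression_block], y)
for step, (r2, change) in enumerate(results, start=1):
    print(f"block {step}: R^2 = {r2:.3f}, change = {change:.3f}")
```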

I hope I have made myself clear and the problems understandable.

Cheers,
Heinz