Use of regression models with centered variables

#1
Hi all,

I'm a little confused about how to use a regression model with centered variables. I understand how centering reduces multicollinearity and makes interpretation easier, especially when there are interaction terms.

However, after a model has been built with centered variables, how can I apply it to other independent data, or to the population from which the sample was drawn? The data we apply the model to often have a different mean. So should I use the coefficients of the non-centered variables? If not, how can I use this model on a different data set?

Thanks!
 

Dason

Ambassador to the humans
#2
One way to do it is to apply the same centering to the new data. I'm not saying subtract the mean of the new data set. Subtract the mean of the old data set. Whatever you did to the old data do to the new data.
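To make that concrete, here is a minimal sketch in plain Python. The data and the helper names `fit_centered` and `predict` are made up for illustration. The only point is that the training mean is stored with the model and reused on the new data.

```python
from statistics import mean

def fit_centered(x, y):
    """Fit y = b0 + b1 * (x - x_mean) by ordinary least squares."""
    x_mean, y_mean = mean(x), mean(y)
    xc = [xi - x_mean for xi in x]
    b1 = sum(a * (yi - y_mean) for a, yi in zip(xc, y)) / sum(a * a for a in xc)
    b0 = y_mean  # with a centered predictor, the intercept is the mean of y
    return {"x_mean": x_mean, "b0": b0, "b1": b1}

def predict(model, x_new):
    # Center the new data with the TRAINING mean stored in the model.
    return [model["b0"] + model["b1"] * (xi - model["x_mean"]) for xi in x_new]

# Made-up training data that follow y = 5 + 2x exactly.
x_train = [10.0, 12.0, 14.0, 16.0, 18.0]
y_train = [25.0, 29.0, 33.0, 37.0, 41.0]
model = fit_centered(x_train, y_train)

# New data whose mean (32) differs from the training mean (14).
x_new = [30.0, 32.0, 34.0]
preds = predict(model, x_new)

# What goes wrong if you center with the new data's own mean instead.
wrong = [model["b0"] + model["b1"] * (xi - mean(x_new)) for xi in x_new]

print(preds)  # [65.0, 69.0, 73.0], matches 5 + 2x
print(wrong)  # [29.0, 33.0, 37.0], shifted by the difference in means
```

Centering with the new mean leaves the slope alone but shifts every prediction, because the intercept was estimated relative to the old mean.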
 
#3
Hi all,

I have a follow-up question. Since we do not center dummy variables, and I have some interaction terms involving dummy variables, I ended up with severe multicollinearity. Does anyone know how I should deal with this?

Thanks a lot!
 

Dason

Ambassador to the humans
#5
They're functionally equivalent, yes. But from a computational viewpoint, centering does improve numerical stability, and if you're using iterative algorithms it can speed up convergence.
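A rough illustration of both halves of that statement, assuming a one-predictor model and made-up data (the helpers `ols` and `cond_xtx` are hypothetical names, not from any library). The raw and centered fits give the same slope and the same predictions, but the condition number of X'X is very different.

```python
import math
from statistics import mean

def ols(x, y):
    """Simple least-squares fit; returns (intercept, slope)."""
    xm, ym = mean(x), mean(y)
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    return ym - slope * xm, slope

def cond_xtx(x):
    """Condition number of X'X for the design matrix with columns [1, x]."""
    a, b, d = float(len(x)), sum(x), sum(v * v for v in x)
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]].
    lam_max = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    lam_min = (a * d - b * b) / lam_max  # determinant / larger eigenvalue
    return lam_max / lam_min

# Made-up predictor sitting far from zero.
x_raw = [1000.0, 1001.0, 1002.0, 1003.0, 1004.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
x_mean = mean(x_raw)
x_cen = [v - x_mean for v in x_raw]

b0_raw, b1_raw = ols(x_raw, y)
b0_cen, b1_cen = ols(x_cen, y)

# Same fitted values either way; only the intercept is re-expressed.
pred_raw = [b0_raw + b1_raw * v for v in x_raw]
pred_cen = [b0_cen + b1_cen * v for v in x_cen]

cond_raw = cond_xtx(x_raw)
cond_cen = cond_xtx(x_cen)
print(b1_raw, b1_cen)
print(cond_raw, cond_cen)
```

With these numbers the centered X'X is diagonal and has a condition number of 2, while the raw version is on the order of 10^11. That gap is what a solver or an iterative optimizer has to cope with.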

Edit: With that said I don't really ever center anything until I run into a problem where centering could alleviate some issues.

Edit the dos: There are also situations where centering everything beforehand can get you some serious advantages. If you do things right you can remove the intercept during the fitting process and if you're doing something like model averaging this can help deal with a lot of issues.
 

spunky

Doesn't actually exist
#6
> But from a computational viewpoint, centering does improve numerical stability, and if you're using iterative algorithms it can speed up convergence.
i learnt this the **hard** way when fitting mixed models with too many random effects :(
 

Jake

Cookie Scientist
#7
> I have some interaction terms with dummy variables
The biggest potential issue here is not multicollinearity (see the links CB provided), but rather making sure that you correctly interpret the dummy variables in the presence of the interaction term. The interaction term itself will have a straightforward interpretation but the individual dummy variables will mean something completely different compared to the non-interactive model, usually answering questions which are not useful or interesting.
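Here is a small sketch of that interpretation issue, assuming one two-level dummy d interacting with one continuous predictor x (the thread does not say what the interaction actually involves, so that setup and the data are made up). For the model y = b0 + b1*d + b2*x + b3*d*x, the least-squares coefficients can be read off from separate per-group fits, which keeps the example in plain Python.

```python
from statistics import mean

def ols(x, y):
    """Simple least-squares fit; returns (intercept, slope)."""
    xm, ym = mean(x), mean(y)
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    return ym - slope * xm, slope

def interaction_coefs(x0, y0, x1, y1):
    """Coefficients of y = b0 + b1*d + b2*x + b3*d*x from per-group fits."""
    a0, s0 = ols(x0, y0)  # group d = 0
    a1, s1 = ols(x1, y1)  # group d = 1
    return {"b0": a0, "b1_dummy": a1 - a0, "b2": s0, "b3_interaction": s1 - s0}

# Made-up data: group 0 follows y = 2 + 1x, group 1 follows y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0]
y0 = [3.0, 4.0, 5.0, 6.0]
y1 = [5.0, 8.0, 11.0, 14.0]

raw = interaction_coefs(x, y0, x, y1)

# Same model after centering x at its mean (2.5).
xc = [v - mean(x) for v in x]
centered = interaction_coefs(xc, y0, xc, y1)

print(raw["b1_dummy"], raw["b3_interaction"])            # 0.0 2.0
print(centered["b1_dummy"], centered["b3_interaction"])  # 5.0 2.0
```

The interaction coefficient is 2 either way. The dummy coefficient is the group difference at x = 0, so it is 0 with raw x and 5 once x = 0 means "at the average x". It is not an overall group effect in either case.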

> Since we do not center dummy variables
You certainly could center the dummy variables, you know. (Assuming we're talking about two levels of each factor only here.) But we technically wouldn't call them dummy variables anymore. If the cell sizes for your dummy coded variables are equal, centering will leave you with -.5 vs +.5, which are contrast/effect codes, and perfectly sensible.
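A quick check of the -.5 vs +.5 claim, using made-up 0/1 vectors. The unbalanced case is included only to show why the equal-cell-size condition matters.

```python
from statistics import mean

def center(values):
    """Subtract the sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]

balanced = [0, 0, 0, 1, 1, 1]  # equal cell sizes
unbalanced = [0, 0, 0, 1]      # three 0s, one 1

centered_balanced = center(balanced)
centered_unbalanced = center(unbalanced)

print(centered_balanced)    # [-0.5, -0.5, -0.5, 0.5, 0.5, 0.5]
print(centered_unbalanced)  # [-0.25, -0.25, -0.25, 0.75]
```

With unequal cell sizes the centered codes are no longer symmetric around zero, so they are not the usual -.5/+.5 contrast codes.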