standardizing variables


Fortran must die
Gelman suggests putting variables on a roughly similar scale by subtracting the mean and dividing by two standard deviations, even for binary predictors. I have not seen this done much in practice and wanted to ask for opinions on its advantages. He suggests not doing it to predictors with two levels, at least when there are many predictors. I know some object to doing this at all.
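A minimal sketch of the rescaling described above, in Python with NumPy. The helper name `rescale_two_sd`, its `skip_binary` option, the use of the sample standard deviation (`ddof=1`) and the simulated `income`/`female` inputs are all assumptions for illustration. It also shows the usual argument for dividing by two standard deviations rather than one: a balanced 0/1 predictor has a standard deviation of about 0.5, so after rescaling, a one-unit change in a continuous input is roughly comparable to flipping a binary input from 0 to 1.

```python
import numpy as np

def rescale_two_sd(x, skip_binary=False):
    """Center x at its mean and divide by two standard deviations.

    If skip_binary is True, a variable with exactly two distinct values is
    only centered (one reading of the advice to leave two-level predictors
    alone); this option is a hypothetical illustration.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    if skip_binary and np.unique(x).size == 2:
        return centered
    # Sample standard deviation (ddof=1); the choice of ddof is an assumption.
    return centered / (2.0 * x.std(ddof=1))

rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, size=1_000)  # hypothetical continuous input
female = np.tile([0.0, 1.0], 500)                # hypothetical balanced binary input

z_income = rescale_two_sd(income)
z_female = rescale_two_sd(female)

# Both rescaled inputs now have mean 0 and standard deviation 0.5.
print(round(z_income.std(ddof=1), 3), round(z_female.std(ddof=1), 3))

# For the balanced binary input, the 0-to-1 flip is now a change of roughly 1.
print(z_female.max() - z_female.min())
```

For a balanced binary input, two standard deviations is close to 1, so rescaling barely changes it; for an unbalanced one (say 10% ones, standard deviation about 0.3) the divisor is about 0.6 and the coefficient changes noticeably.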


TS Contributor
Does he recommend it for any kind of study, in any field?

For binary variables, the statements seem contradictory.

With kind regards



Fortran must die
He does not recommend it for any specific field; his work is in the social sciences, I think. The statements seem contradictory to me too, but he does not seem to see that, or if he does, he does not address it.


Fortran must die
Gelman makes some interesting suggestions for regression prediction models (these are not rules, of course, just suggestions).

Include all input variables that, for substantive reasons, might be expected to be important in predicting the outcome. [We have little theory to guide us in the models I run.]

For inputs that have large effects, consider including their interactions as well.
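One way to add an interaction, sketched in Python with NumPy under stated assumptions: the inputs `x1` and `x2` are simulated and already on the centered, two-SD scale, the outcome and its coefficients are made up, and the fit is ordinary least squares via `np.linalg.lstsq`. The interaction enters simply as the product column `x1 * x2`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
# Hypothetical inputs, already centered and divided by two SDs (SD about 0.5).
x1 = rng.normal(0, 0.5, size=n)
x2 = rng.normal(0, 0.5, size=n)
# Simulated outcome with made-up coefficients, including an interaction of 1.5.
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x1 * x2 + rng.normal(0, 0.3, size=n)

# Design matrix: intercept, the two main effects, and their product (interaction).
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated [intercept, x1, x2, x1:x2], close to the simulated values.
print(np.round(coef, 2))
```

Because both inputs are centered, each main-effect coefficient is the estimated effect of that input when the other is at its mean, which is one practical reason to center before adding interactions.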

We suggest the following strategy for decisions regarding whether to exclude a variable from a prediction model:

If a predictor is not statistically significant and does not have the expected sign ... consider removing it from the model.
If a predictor is statistically significant and does not have the expected sign, then think hard if it makes sense.
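The two quoted rules can be written as a small screening function. This is a sketch, not Gelman's procedure: the function name `screening_advice` and the example coefficients are hypothetical, and "statistically significant" is taken to mean an estimate more than two standard errors from zero, which is an assumed convention here. Combinations the quoted rules do not address are reported as such rather than guessed.

```python
def screening_advice(estimate, std_error, expected_sign):
    """Apply the two quoted rules to one coefficient.

    expected_sign is +1 or -1. "Significant" is assumed to mean the
    estimate lies more than 2 standard errors from zero.
    """
    significant = abs(estimate) > 2 * std_error
    # A zero estimate is treated as not having the expected sign.
    has_expected_sign = estimate * expected_sign > 0
    if not significant and not has_expected_sign:
        return "consider removing"
    if significant and not has_expected_sign:
        return "think hard whether it makes sense"
    return "not covered by the quoted rules"

# Hypothetical coefficients: (estimate, standard error, expected sign).
examples = {
    "x_a": (-0.10, 0.20, +1),  # not significant, wrong sign
    "x_b": (-0.90, 0.20, +1),  # significant, wrong sign
    "x_c": (0.90, 0.20, +1),   # significant, expected sign
}
for name, (est, se, sign) in examples.items():
    print(name, screening_advice(est, se, sign))
```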

I am not sure whether this differs for a regression that is not used for prediction. I know some say that when testing theory you should not drop variables from the model. Generally I don't use regression to predict; I want to know how variable X drives Y and in what direction.