# Thought experiment (standardized binary variable)

#### hlsmith

##### Not a robit
OK, I am working on a corrected LASSO logistic model, which addresses the model-building dependence (i.e., variables were not declared a priori but selected via the modeling process). All candidate covariates were standardized to unify their scales before entering them into the model.

But for this question, I believe we can just call this a logistic regression question. So I standardized all candidate variables entered into the model (3 are continuous and the remaining ~10 are binary). I am ignoring the continuous variables for the rest of this post, since their interpretation is straightforward. During standardization, each binary variable presumably had its mean (the prevalence) subtracted from its 0 or 1 value, and the difference was then divided by the variable's standard deviation. So one standardized binary variable ends up taking the values -3.10 or 0.32, while another takes -0.72 or 1.38.
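To make the arithmetic concrete, here is a minimal Python sketch of that standardization. The prevalences 0.906 and 0.343 are not from the original analysis; they are back-calculated assumptions that happen to reproduce the quoted values, and the population standard deviation sqrt(p(1 - p)) is assumed (software using the n - 1 version will differ slightly).

```python
import math

def standardized_levels(prevalence):
    """Return the two values a 0/1 variable takes after z-scoring.

    Assumes the population standard deviation sqrt(p * (1 - p)).
    """
    sd = math.sqrt(prevalence * (1.0 - prevalence))
    low = (0.0 - prevalence) / sd   # standardized value when x = 0
    high = (1.0 - prevalence) / sd  # standardized value when x = 1
    return low, high

# Hypothetical prevalences, chosen only to mimic the values in the post.
for p in (0.906, 0.343):
    low, high = standardized_levels(p)
    print(f"prevalence={p:.3f}: x=0 -> {low:.2f}, x=1 -> {high:.2f}")
```

Note that the gap between the two standardized values is always 1/sd, so moving from 0 to 1 on the original scale is a move of 1/sd units on the standardized scale.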

One complaint about doing this is that standardized binary variables are much more difficult to interpret, though it was felt necessary to unify the candidate variable scales given the modeling approach. So my question/comment: when interpreting the estimated log odds converted to odds ratios, am I now just saying the odds of the outcome are XX times greater for a 1 standard deviation increase in the prevalence of the binary variable?

I will happily entertain any feedback or comments - Thanks!

#### GretaGarbo

##### Human
In "usual" linear regression (a multiple regression estimated with LS) it doesn't matter what scale we are using. If we change the scale from cm to dm (dividing "x" by 10), the corresponding regression coefficient will just be 10 times larger and the fitted values will be unchanged. So LS linear regression is scale invariant (in contrast to PCA, where scales matter), and it doesn't matter whether we standardize in LS regression.
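A small sketch of that cm-to-dm point, using made-up height and weight numbers and a hand-rolled one-predictor least-squares fit (the function name `ols_fit` and the data are purely illustrative):

```python
def ols_fit(x, y):
    """Simple least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data for illustration only.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weight_kg = [52.0, 58.0, 69.0, 77.0, 88.0]
height_dm = [h / 10.0 for h in height_cm]  # same heights in decimetres

a_cm, b_cm = ols_fit(height_cm, weight_kg)
a_dm, b_dm = ols_fit(height_dm, weight_kg)

# The slope scales by 10, but the predictions are the same either way.
fitted_cm = [a_cm + b_cm * h for h in height_cm]
fitted_dm = [a_dm + b_dm * h for h in height_dm]
print(b_cm, b_dm)
print(max(abs(u - v) for u, v in zip(fitted_cm, fitted_dm)))
```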

But is LASSO regression (or ridge regression) scale invariant? I don't know. I didn't find anything in a quick search.

#### hlsmith

##### Not a robit
I believe we can get away with just calling this logistic regression (but with a penalty). I am out of the office, but tomorrow I may try running it and plain logistic regression with different scalings. In LASSO etc., though, the variables need to be on the same scale, much like in nearest neighbors, where variables on larger scales have a larger influence on the distance calculations if they are not put on a common scale.
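As a rough illustration of why the penalty makes scale matter, here is a sketch for the simplest possible case: a linear (not logistic) LASSO with one centered predictor, where the solution has a closed form via soft-thresholding. The data, the penalty value, and the function names are all made up for illustration; this is not the corrected LASSO logistic model discussed in this thread.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_1d(x, y, lam):
    """Minimise (1/(2n)) * sum((y - b*x)^2) + lam * |b|.

    Assumes x and y are already centered and there is no intercept.
    """
    n = len(x)
    xy = sum(xi * yi for xi, yi in zip(x, y)) / n
    xx = sum(xi * xi for xi in x) / n
    return soft_threshold(xy, lam) / xx

# Made-up centered data.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.9, -1.2, 0.1, 0.8, 2.2]
x_rescaled = [xi / 10.0 for xi in x]  # same variable in different units

lam = 1.0
b_original = lasso_1d(x, y, lam)
b_rescaled = lasso_1d(x_rescaled, y, lam)
print(b_original, b_rescaled)

# With no penalty (lam = 0) the fit is ordinary least squares and the
# coefficient simply scales by 10, as in the unpenalized case.
print(lasso_1d(x, y, 0.0), lasso_1d(x_rescaled, y, 0.0))
```

With the penalty switched on, the rescaled predictor is shrunk all the way to zero while the original one is kept, so under these assumptions the same penalty treats the same variable differently depending on its units.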

#### hlsmith

##### Not a robit
@GretaGarbo et al. I am revisiting this topic. To rephrase my original question: "How would you interpret a standardized binary variable's estimated odds ratio (OR) from a logistic regression model?"

UPDATE: The standardization process took the 0 or 1 value of the binary variable, subtracted the mean (AKA the prevalence of the positive binary value), then divided that difference by the standard deviation of the binary variable, i.e., (0 - mean)/std and (1 - mean)/std.

So I am construing a 1 unit increase in the logistic regression output as a one standard deviation increase in the prevalence of the binary variable. So, say, an OR of 1.5 would be interpreted as:

"A one standard deviation increase in the prevalence of the binary variable is associated with 1.5 times greater odds of the outcome, given the fitted model."

#### Dason

The same way you would interpret any continuous variable that has been standardized.
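If a per-SD odds ratio feels awkward for a binary variable, one option is to rescale it back to a 1-versus-0 comparison. Because the two standardized levels are 1/sd apart, the log odds ratio for 1 versus 0 is the standardized coefficient divided by sd. The sketch below assumes the population standard deviation sqrt(p(1 - p)) and a hypothetical prevalence of 0.343; the function name is made up, and this is only a rescaling of the fitted coefficient, not a refit of the model.

```python
import math

def odds_ratio_one_vs_zero(or_per_sd, prevalence):
    """Convert an OR per 1 SD of a standardized 0/1 variable into the
    OR comparing x = 1 with x = 0 on the original scale.

    Assumes the variable was standardized with sd = sqrt(p * (1 - p)).
    """
    sd = math.sqrt(prevalence * (1.0 - prevalence))
    beta_std = math.log(or_per_sd)  # log odds per 1 SD
    beta_raw = beta_std / sd        # log odds for the 0 -> 1 change
    return math.exp(beta_raw)

# Hypothetical inputs: the OR of 1.5 from the example and p = 0.343.
print(odds_ratio_one_vs_zero(1.5, 0.343))
```

Since sd is at most 0.5 for a binary variable, the 1-versus-0 odds ratio is always further from 1 than the per-SD odds ratio.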

#### hlsmith

##### Not a robit
Thanks for the feedback @Dason

I was just looking for some feedback/confirmation. That, and a standard deviation for a continuous variable is a little more palatable than thinking about a transformed binary variable.