Is it necessary to check for multicollinearity of explanatory variables in logistic regression?

#1
Checking for multicollinearity of independent variables is necessary in linear regression, since multicollinearity inflates the standard errors, which in turn affects the t-statistics and p-values. But in logistic regression, p-values are based on chi-square statistics, so multicollinearity should have no effect on the p-values (this is at least my understanding; am I wrong?). If so:

Question 1 - Why check for multicollinearity in logistic regression?

Next, Question 2 - If checking for multicollinearity is necessary, then should the check be run for continuous as well as for dummy variables, or instead should multicollinearity be checked for continuous variables only?

Question 3 - If the check is to be run for dummies as well, then is it OK to calculate such association coefficients as Pearson's Phi, Tschuprow's T and Cramer's V (as the dummies in question are nominal)?

And finally, Question 4 - If calculating association coefficients is OK, can it be considered that there is no serious multicollinearity risk as long as the coefficient does not exceed 0.60? Or is there an alternative rule of thumb on this?
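For reference, the coefficients Question 3 mentions can all be computed from a contingency table; here is a minimal sketch, assuming scipy >= 1.7 and two hypothetical 0/1 dummies:

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import association

d1 = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])  # made-up dummy data
d2 = np.array([0, 1, 1, 1, 0, 0, 1, 0, 1, 1])
table = np.array([[np.sum((d1 == i) & (d2 == j)) for j in (0, 1)]
                  for i in (0, 1)])

print(association(table, method="cramer"))     # Cramer's V
print(association(table, method="tschuprow"))  # Tschuprow's T

# Pearson's phi for a 2x2 table is sqrt(chi2 / n); its absolute value
# coincides with Cramer's V in the 2x2 case
chi2, _, _, _ = chi2_contingency(table, correction=False)
print(np.sqrt(chi2 / table.sum()))
```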
 

hlsmith

Not a robit
#2
Checking for multicollinearity of independent variables is necessary in linear regression, since multicollinearity inflates the standard errors, which in turn affects the t-statistics and p-values. But in logistic regression, p-values are based on chi-square statistics, so multicollinearity should have no effect on the p-values (this is at least my understanding; am I wrong?).
I think you may be incorrect about multicollinearity being moot. The p-values in logistic output typically come from Wald chi-square statistics, which are just (coefficient / standard error) squared, so anything that inflates the standard errors shrinks the chi-square and inflates the p-values, the same way it deflates the t-statistics in linear regression.
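A quick way to see this is by simulation. Here is a minimal sketch (assuming numpy and statsmodels, with made-up data): fit the same logistic model once with an uncorrelated pair of predictors and once with a highly correlated pair, and compare the coefficient standard errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000

def slope_ses(rho):
    # Draw two unit-variance predictors with correlation rho
    X = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    # True model: logit(p) = 0.5*x1 + 0.5*x2
    p = 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.5 * X[:, 1])))
    y = rng.binomial(1, p)
    res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    return res.bse[1:]  # standard errors of the two slopes

print("slope SEs, rho = 0.0: ", slope_ses(0.0))
print("slope SEs, rho = 0.95:", slope_ses(0.95))
# The SEs come out several times larger in the rho = 0.95 fit, so the Wald
# chi-squares shrink and the p-values grow -- the same mechanism as in
# linear regression, just with chi-square in place of t.
```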

If so: Question 1 - Why check for multicollinearity in logistic regression?

It is usually considered one of the standard diagnostics you are supposed to run in logistic regression.

Next, Question 2 - If checking for multicollinearity is necessary, then should the check be run for continuous as well as for dummy variables, or instead should multicollinearity be checked for continuous variables only?

For continuous and for ordinal categorical variables you would check for multicollinearity (MC).

Question 3 - If the check is to be run for dummies as well, then is it OK to calculate such association coefficients as Pearson's Phi, Tschuprow's T and Cramer's V (as the dummies in question are nominal)?

Check for MC by using a linear model and running the VIF and Tolerance statistics. Ordinal categorical variables can be converted to 1, 2, 3, etc. You can use a linear model because you don't care about the outcome here, just the MC diagnostics; the details depend on what software you are using.
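In Python, for instance, the check might look like this; a minimal sketch assuming pandas and statsmodels, where `df` is a hypothetical DataFrame containing only the numeric predictors (continuous, plus ordinals coded 1, 2, 3, ...):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(df: pd.DataFrame) -> pd.DataFrame:
    """VIF and Tolerance for each predictor; the outcome never enters."""
    X = sm.add_constant(df)  # VIF should be computed with an intercept present
    rows = []
    for i, name in enumerate(X.columns):
        if name == "const":
            continue
        vif = variance_inflation_factor(X.values, i)
        rows.append({"variable": name, "VIF": vif, "Tolerance": 1.0 / vif})
    return pd.DataFrame(rows)

# Common rules of thumb: worry when VIF > 10 (Tolerance < 0.1); some authors
# use the stricter cutoff VIF > 5.
```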

And finally, Question 4 - If calculating association coefficients is OK, can it be considered that there is no serious multicollinearity risk as long as the coefficient does not exceed 0.60? Or is there an alternative rule of thumb on this?

I don't understand this question.
 

ondansetron

TS Contributor
#3
@hlsmith is correct: you are mistaken that multicollinearity is not something to account for in logistic regression.

I also think that for nominal variables you would check for collinearity with the other variables; dummies within the same categorical variable are not necessarily a problem, since you would expect a high relation between those dummies by construction.
 
#4
Thanks a lot. So, if I understand correctly, I have to check for multicollinearity in all cases, except when all my explanatory variables are nominal/categorical, correct?
 

hlsmith

Not a robit
#5
Yeah, I wouldn't imagine categorical variables would be "linear", but they could explain the same underlying phenomenon and cause sparsity through mutually empty cells, probably wreaking havoc in a different way!
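To make that concrete, here is a small made-up example (assuming pandas): two categorical predictors that track each other leave an empty cell in the cross-tab, which is the kind of sparsity that leads to separation problems once both are dummy-coded into a logistic model.

```python
import pandas as pd

# Two hypothetical categorical predictors that mostly track each other
df = pd.DataFrame({
    "region": ["north"] * 50 + ["south"] * 50,
    "channel": ["web"] * 50 + ["store"] * 45 + ["web"] * 5,
})

# The cross-tab has an empty cell: "north" never co-occurs with "store",
# so a model with dummies for both variables has no data for that combination
print(pd.crosstab(df["region"], df["channel"]))
```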
 

noetsi

Fortran must die
#7
Multicollinearity behaves similarly in linear and logistic regression. However, many feel (John Fox, for example) that its impact is limited in most cases even when it exists, and that few of the remedies offered for it work well; they may introduce problems, such as biased parameters, that are worse than the multicollinearity itself.

Basically, when you have multicollinearity you cannot separate two predictors' unique impacts on the DV. I think you might be confusing linearity with multicollinearity. Logistic regression does not assume a linear relationship between the raw levels of the predictors and the response variable, but it does assume multicollinearity is not occurring. All categorical variables will trivially have a linear relationship with the response variable (a dummy takes only two values, so there are only two points to fit), and they might or might not show multicollinearity.
 

ondansetron

TS Contributor
#8
Also, multicollinearity might not be an issue even if it's present. If you're using the model for predictions, MC isn't necessarily problematic because it doesn't introduce bias into the estimation. However, if you're trying to make inferences on the beta parameters, then MC is more likely to be an issue. As @noetsi mentioned, it becomes difficult to parse out the relationship of each collinear IV with the DV. The fix you employ will depend on your end goal.
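As a quick illustration of that distinction (a minimal sketch assuming numpy and statsmodels, with simulated data): refitting a logistic model with two nearly collinear predictors on resampled data leaves the fitted probabilities comparatively stable while the individual coefficients bounce around.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
p = 1 / (1 + np.exp(-(x1 + x2)))          # true model uses both predictors
y = rng.binomial(1, p)
X = sm.add_constant(np.column_stack([x1, x2]))

preds = []
for _ in range(3):
    idx = rng.integers(0, n, size=n)       # bootstrap-style resample
    res = sm.Logit(y[idx], X[idx]).fit(disp=0)
    preds.append(res.predict(X))           # fitted probabilities on the full data
    print("slopes:", np.round(res.params[1:], 2))

# The individual slopes swing widely between refits (roughly only their sum
# is pinned down), while the fitted probabilities change far less:
print("max gap between refits' predictions:",
      round(float(np.max(np.abs(preds[0] - preds[1]))), 3))
```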