Hi
I'm building an account management scorecard with logistic regression. Some of the variables are quite highly correlated, yet they are still selected into the same model (so each seems to explain some variance that the other does not). According to Siddiqi (Credit Risk Scorecards), the effects of multicollinearity can be overcome by using a sufficiently large sample. My questions are:
1. Is this correct (i.e. can I ignore the correlations)?
2. How large can a correlation be and still be acceptable in the model?
3. How big is a sufficiently large sample?
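To make the question concrete, here is a minimal sketch of how the correlations and the resulting variance inflation factors (VIFs) could be measured. The data are synthetic and the names (`vif`, `x1`, `x2`, `x3`) are purely illustrative; they are not my actual scorecard variables.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    # The j-th diagonal entry of the inverse correlation matrix equals
    # 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others.
    return np.diag(np.linalg.inv(corr))

# Synthetic example (assumption): x2 is built to correlate about 0.9 with x1,
# while x3 is independent of both.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print(np.round(np.corrcoef(X, rowvar=False), 2))  # pairwise correlations
print(np.round(vifs, 2))                          # one VIF per variable
```

With only two correlated predictors, a pairwise correlation of 0.9 corresponds to a VIF of roughly 1 / (1 - 0.81), i.e. a little above 5, so questions 2 and 3 could equally be phrased in terms of an acceptable VIF.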
Thanks a lot