
1) The data generally fits a linear model, Yi = B0 + B1*X1i + B2*X2i + ... + ei

2) The error terms (e) are independent

3) The error terms have a mean of zero

4) The error terms have a constant variance

5) The error terms are normally distributed
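If it helps, here's a minimal sketch of the standard residual diagnostics for assumptions 2-5 in Python with statsmodels. The data here is made up (the y, x1, x2 names are just placeholders); swap in the actual dataset from the PDF:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

# Made-up data standing in for the real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=100)

X = sm.add_constant(df[["x1", "x2"]])   # adds the B0 intercept column
fit = sm.OLS(df["y"], X).fit()
resid = fit.resid

print("mean of residuals (assumption 3, should be ~0):", resid.mean())
print("Durbin-Watson (assumption 2, ~2 is good):      ", durbin_watson(resid))
print("Breusch-Pagan p-value (assumption 4):          ", het_breuschpagan(resid, X)[1])
print("Shapiro-Wilk p-value (assumption 5):           ", stats.shapiro(resid).pvalue)
```

None of these tests is a substitute for looking at the residual plots, but they're a quick first pass.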

Looking through the PDF quickly, they appear to have satisfied all of these assumptions.

Now looking at the model and its diagnostics, I am not totally convinced, because they left out the possibility of collinearity among the predictors. Usually when there are multiple predictors, people include the Variance Inflation Factor (VIF), which gives you an idea of how correlated they are. Collinearity is not good, as it destabilizes the coefficient estimates and gives you false confidence in the model. Ask to see the VIFs of the coefficients. The general rule of thumb is that if they are all under 10, then you are okay.
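In case it's useful, here's one way to get the VIFs with statsmodels' variance_inflation_factor. The data below is synthetic, with x2 deliberately constructed to be collinear with x1, so the names and values are illustrative only:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic predictors; swap in the actual design matrix.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # deliberately collinear with x1
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": rng.normal(size=200)})

# VIF is conventionally computed on the design matrix including the intercept.
X_design = sm.add_constant(X)
for i, name in enumerate(X_design.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X_design.values, i)
    print(f"VIF({name}) = {vif:.2f}")   # rule of thumb: flag values above ~10
```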

You should start by simply seeing whether multicollinearity (MC) is an issue at all. The classic symptom is that none of the slope coefficients is individually significant even though the overall model is. If that does not occur, I am not sure I would even pay attention to the VIF.
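A quick illustration of that symptom on made-up data (two nearly duplicate predictors): the overall F-test can come back significant while neither slope does.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical example: x2 is nearly a copy of x1, so together they
# explain y well but neither gets a clean individual t-test.
rng = np.random.default_rng(2)
x1 = rng.normal(size=60)
x2 = x1 + 0.05 * rng.normal(size=60)
y = 3 + 2 * x1 + rng.normal(size=60)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
fit = sm.OLS(y, X).fit()

print("overall F-test p-value:", fit.f_pvalue)   # model as a whole is significant
print(fit.pvalues)                               # ...but the individual slopes may not be
```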

I knew VIF < 10 was a rule-of-thumb short-cut, but I never knew how to truly test for MC. Are you saying to run reduced models and compare the coefficients? For example, run all combinations of factors and see whether, for the same estimate, the model coefficients have overlapping confidence intervals? Just curious how to do this check for MC.

Imagine I fit a model of

y-hat = b0 + b1X1 + b2X2

where bi is the estimated beta coefficient for the i-th independent variable.

Suppose X1 and X2 are problematically collinear.

The estimate of b1 with both X1 and X2 in the model may be -5, for example, when theory tells us the coefficient should be positive. If I rerun the model without X2 and the coefficient on X1 is now +8, that might be evidence that collinearity is making the estimate of b1 unstable, and that it is at a problematic level if we want to make inferences about the true value and direction of a beta parameter from this model. (The same applies to b2 if we drop X1 and watch what happens to X2's coefficient, and this extends to larger models, though it need not involve all estimates in the model, only those in the collinear group or groups.)

Another way to see some of the impact of multicollinearity is to resample with replacement, with size equal to the original sample size (or take a random subset of the original data set), refit the model, and see how dramatically the coefficients of the suspected collinear variables change. I wouldn't necessarily say run all combinations, though, and I wouldn't use confidence intervals in that sense.
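Here's a rough sketch of both checks (dropping the suspected collinear variable, then refitting on bootstrap resamples), again on made-up data with x2 built to be collinear with x1:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic collinear data, just for illustration.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # strongly collinear with x1
y = 3 + 2 * x1 + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

def coef_x1(data, predictors):
    """Fit OLS on the given predictors and return the coefficient on x1."""
    X = sm.add_constant(data[predictors])
    return sm.OLS(data["y"], X).fit().params["x1"]

# Check 1: drop the suspected collinear variable and watch b1 move.
print("b1 with x2 in the model:", coef_x1(df, ["x1", "x2"]))
print("b1 with x2 dropped:     ", coef_x1(df, ["x1"]))

# Check 2: refit on bootstrap resamples (with replacement, same size)
# and see how wildly b1 swings when x1 and x2 are collinear.
boot = [coef_x1(df.sample(n, replace=True, random_state=s), ["x1", "x2"])
        for s in range(200)]
print("bootstrap b1 range:", min(boot), "to", max(boot))
```

With collinear predictors the bootstrap range for b1 will typically be much wider than you'd expect from the reported standard error alone, which is exactly the instability being described.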

Evaluating the severity of multicollinearity is usually a more involved process than just looking at the VIF, because we need to look for something suggesting a problematic symptom (e.g., beta estimates with the wrong sign when we want to make inferences on the parameters).