Collinearity of variables and linear regression

#1
Hello, I had a question about lm in R:
Many threads deal with the effects of correlation between explanatory variables, which can be harmful when the correlation between two of them is too high (multicollinearity).
Indeed, this can inflate the p-values or, in any case, make the slope coefficients of the regressors unstable.
My question is the following:

I explain the context:
I ran a multiple linear regression with one quantitative response variable and two quantitative explanatory variables with a correlation of 0.57. So in R: lm(y ~ X1 + X2).

Could someone explain to me why, when I switch the input order of the variables like this:
lm(y ~ X1 + X2)
lm(y ~ X2 + X1)
the estimated coefficients are identical and do not move, while one would expect them to fluctuate because of the collinearity between X1 and X2?
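For concreteness, here is a minimal sketch of my setup (with simulated data standing in for my real variables):

set.seed(42)
n  <- 100
X1 <- rnorm(n)
X2 <- 0.57 * X1 + sqrt(1 - 0.57^2) * rnorm(n)  # X1 and X2 correlated at about 0.57
y  <- 1 + 2 * X1 + 3 * X2 + rnorm(n)

coef(lm(y ~ X1 + X2))  # same estimates...
coef(lm(y ~ X2 + X1))  # ...just reported in a different order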

Thank you very much
 
#3
thank you for your help
Well, I think I need an explanation of what summary() reports when fitting an lm in R.
When I run an analysis of variance, I can see that when there is collinearity between the explanatory variables, the order in which the variables enter the model matters, because the sums of squares are computed sequentially: the first variable entered captures all of the shared variance and the later ones capture less.
However, the summary of the multiple regression model returns the slope coefficient estimates, so why is there not the same kind of problem as with aov?
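Continuing the simulated example from my first post, the order dependence of the sequential sums of squares can be seen directly:

anova(lm(y ~ X1 + X2))["X1", "Sum Sq"]  # SS(X1): X1 entered first
anova(lm(y ~ X2 + X1))["X1", "Sum Sq"]  # SS(X1 | X2): X1 entered second, a different value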

Thank you
 

hlsmith

Not a robit
#4
As dason noted, they both are the exact same model, thus the same results. I'm not familiar with your other function, but sometimes people get confused by Type I and Type III effects. Post code and images of the output if you want us to better understand your concerns.
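To illustrate the distinction (a rough sketch; the Anova() function from the car package is one common way to get Type III tests, and X1, X2 and y are the variable names from the earlier posts):

library(car)           # assumes the car package is installed

fit <- lm(y ~ X1 + X2)
anova(fit)             # Type I (sequential) SS: entry order of X1 and X2 matters
Anova(fit, type = 3)   # Type III SS: each term adjusted for the other, order-invariant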

PS: People get worked up about collinearity, but it is a natural phenomenon, not a bad thing.
 
#5
OK, I'll try to explain it better:
I created a data frame with two explanatory variables with multicollinearity between them.
Here are the results of the analysis of variance of the multiple regression model, fitted first as Y ~ X1 + X2 and then as Y ~ X2 + X1:
[screenshot: the two aov tables, in which the sums of squares for X1 and X2 change with the input order]
As you can see, the sums of squares change; this is because aov in R performs a Type I (sequential) analysis.
But when I do the same thing with summary() on the lm fit, like this:
[screenshot: the two summary(lm()) outputs, with identical coefficient tables]
There is no change.
In fact, I'm wondering why, in the analysis of variance, the order of the variables matters and can change the sums of squares, p-values, etc., while that is not the case for the regression model itself.
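For reproducibility, here is a minimal sketch of the comparison, reusing the simulated X1, X2 and y from my first post:

summary(aov(y ~ X1 + X2))  # sequential (Type I) SS: X1 entered first
summary(aov(y ~ X2 + X1))  # the SS for X1 and X2 change with the order

summary(lm(y ~ X1 + X2))   # identical coefficients, SEs and p-values...
summary(lm(y ~ X2 + X1))   # ...regardless of the order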
 
#6
As far as I can see, the summary() output doesn't show the aov table. What you are seeing in the aov table is the decomposition of the regression sum of squares. The first explanatory variable that you enter has a sum of squares associated with it (it explains some amount of variability), and each subsequent explanatory variable has a sum of squares given the predictors already in the model. So, SS(X1) first, and then SS(X2 | X1).
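A small sketch that makes the decomposition explicit (again using the simulated data from above): the sequential SS for X2 is the extra sum of squares gained by adding X2 to a model that already contains X1.

m1  <- lm(y ~ X1)
m12 <- lm(y ~ X1 + X2)

anova(m12)       # Type I rows: SS(X1), then SS(X2 | X1)
anova(m1, m12)   # extra-sum-of-squares test: the same SS and F for X2 as above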