Multiple regression: why do we want all coefficients to be significant?

#1
There is one thing about multiple regression analysis that I do not understand. Let's say your model is
\(Y_{i}=\beta_{0}+\beta_{1}x_{i1}+\beta_{2}x_{i2}+\dots+\beta_{k}x_{ik}+\epsilon_{i}\)
where the \(\epsilon_{i}\)'s are iid normally distributed with mean 0.

The output from statistical packages will then usually show a p-value for each coefficient. This p-value is for the test of whether that coefficient is 0 or different from 0. My professor says that it is good if all coefficients are significantly different from zero, but why is this good?
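To make this concrete, here is a minimal sketch (using simulated data of my own, not anything from the post) of what those per-coefficient p-values are: each one comes from a t-test of \(H_{0}:\beta_{j}=0\), built from the coefficient estimate and its standard error, which is essentially what the package output reports.

```python
# Sketch: how a package computes per-coefficient p-values for
# Y = b0 + b1*x1 + b2*x2 + eps, using simulated data (true beta2 = 0).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])          # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates

resid = y - X @ beta_hat
df = n - X.shape[1]                                # residual degrees of freedom
sigma2 = resid @ resid / df                        # estimate of Var(eps)
cov = sigma2 * np.linalg.inv(X.T @ X)              # covariance of beta_hat
se = np.sqrt(np.diag(cov))                         # standard errors

t_stats = beta_hat / se                            # test of H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df)     # two-sided p-values

for name, b, s, p in zip(["b0", "b1", "b2"], beta_hat, se, p_values):
    print(f"{name}: estimate={b:.3f}, se={s:.3f}, p={p:.3f}")
```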

The problem with the hypothesis test is this: if the null hypothesis is rejected, we can be "sure" that the coefficient is not 0. But if the null hypothesis is not rejected, that is not proof that the coefficient is 0. I mean, if \(H_{0}\) is rejected, we accept \(H_{a}\); but if \(H_{0}\) is not rejected, that is not proof that \(H_{0}\) is true?

So why do we delete a variable from the model if its coefficient is not significantly different from 0? We have no proof that it is 0.
 

noetsi

Fortran must die
#2
The simple answer is that it isn't necessarily good at all, at least in terms of the method. Substantively you might want all the coefficients to matter (you are commonly testing a theory that says they are all important, and you would prefer that theory to be correct), but that has nothing to do with the method itself.

It depends on what null you are testing, but the usual null in that model is that the slope is 0, so if you reject the null you know it is not zero, the opposite of what you said. Substantively this means that the variable adds predictive value.

If you don't reject the null you can't be certain of anything substantively, according to many authors (although in the context of regression some argue that this reflects that the variable adds nothing substantive to the ability to predict the dependent variable).

Note that this is true for multiple linear regression. In logistic regression the corresponding value for the odds ratio is 1 rather than zero, which means effectively the same thing: if the confidence interval for the odds ratio contains 1 you cannot reject the null, much as in linear regression you cannot reject the null if the interval for the coefficient contains zero.
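A quick numeric sketch, with made-up numbers rather than anything from the thread: the null value 1 for the odds ratio is just \(e^{0}\), so a confidence interval for the odds ratio that contains 1 is the same statement as a confidence interval for the log-odds coefficient that contains 0.

```python
# Hypothetical logistic coefficient and standard error (illustrative numbers only).
import numpy as np

beta_hat, se = 0.30, 0.20
lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se    # 95% Wald CI on the log-odds scale
print("log-odds CI:", (round(lo, 3), round(hi, 3)))              # contains 0 -> not significant
print("odds-ratio CI:", (round(np.exp(lo), 3), round(np.exp(hi), 3)))  # contains 1 -> same conclusion
```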
 
#3
Thanks, I meant what you wrote: that if the null hypothesis is rejected, the coefficient is not 0. Sorry, I forgot the "not".

But why do some authors argue that, if you do not reject the null hypothesis, the coefficient might be zero? Is there any mathematical reasoning behind this?
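Not from the thread, but a small simulation sketch of why not rejecting is not proof of zero: here the true slope is 0.2 (not zero), yet with only n = 30 observations the test frequently fails to reject \(H_{0}:\beta_{1}=0\), simply because the power is low.

```python
# Simulate a regression with a truly nonzero but modest slope and a small sample,
# and count how often the t-test rejects H0: beta1 = 0 at the 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, true_beta, n_sims = 30, 0.2, 2000
rejected = 0
for _ in range(n_sims):
    x = rng.normal(size=n)
    y = true_beta * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    df = n - 2
    se = np.sqrt(resid @ resid / df * np.linalg.inv(X.T @ X)[1, 1])
    p = 2 * stats.t.sf(abs(beta_hat[1] / se), df)
    rejected += p < 0.05

print(f"Rejected H0 in {rejected / n_sims:.0%} of simulations, "
      f"even though the true coefficient is {true_beta}, not 0.")
```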
 
#4
There has been a long discourse in the philosophy of statistical hypothesis testing about whether it is appropriate to say we "accept" or merely "fail to reject" a hypothesis. It seems this may be a similar issue, that is, largely notational.