F test not significant but coefficients are

#1
Dear all,

I ran a linear model in SAS. My dependent variable is continuous, my main independent variable is categorical (3 levels), and I included a few covariates. I asked SAS to give me the LSMEANS for that independent variable.

I got a non-significant F test, while one variable, the main independent variable as it happens, was significant (see the partial output attached; the deleted numbers are not important, and all covariates are non-significant).

To my knowledge, if at least one variable is significant, then the F test should be too. What am I missing here?

Thank you!
 

Attachments

#2
Hi,

I can't see the coefficients' results in your attachments.
Anyway, if you remove all the insignificant variables, the model will become significant; if only one variable remains, you will get the same p-value for the t-test and for the F-test.

But this is not the correct way. You should iterate: remove the most insignificant variable each time, until all remaining variables are significant.
You may find that an insignificant variable becomes significant after other insignificant variables are removed.
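Something like this rough sketch of the iterative-removal idea (Python/statsmodels rather than SAS; the data frame df and the column names are just placeholders, and as the later posts point out, this p-value-only procedure is debatable):

import statsmodels.formula.api as smf

def backward_eliminate(df, response, candidates, alpha=0.05):
    """Drop the least significant term one at a time until every remaining
    term has a p-value below alpha. df, response, and candidates are
    hypothetical names for illustration only."""
    terms = list(candidates)
    while terms:
        fit = smf.ols(f"{response} ~ " + " + ".join(terms), data=df).fit()
        worst = fit.pvalues.drop("Intercept").idxmax()   # most insignificant term
        if fit.pvalues[worst] <= alpha:
            return fit                                   # all remaining terms significant
        terms.remove(worst)
    return None

# e.g. backward_eliminate(df, "y", ["x1", "x2", "x3"])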

Can you attach the raw data? (If it's okay to post it publicly.)
 

noetsi

Fortran must die
#3
I don't understand why your model would not be significant if one of your predictor variables is significant. Substantively, that does not make sense to me. It might be that your F test has less power than your t test.
 
#4
Hi Noetsi,

I assume the model isn't significant before you exclude the insignificant variables.
With only one variable, the t-test and the F-test give the same result (F = t²).
The t, F, and chi-squared distributions are all derived from the normal distribution.
 
#5
Hi Yenkel, Noetsi,

When you add an insignificant variable to a linear regression model with p independent variables and n samples, the percentage of explained Y variance can only go up (that is, R² can only increase).
The reason is that the model has one more variable to "play" with, and the least-squares optimization can always assign this variable a zero coefficient if it does not contribute to the fit.

Usually this will not change R² much, since the added variable is insignificant.

But the mean squares change on a bigger scale, because the degrees of freedom change:
F = MS(reg) / MS(res) = [SS(reg)/p] / [SS(res)/(n-p-1)] = [SS(reg)/SS(res)] * (n-p-1)/p = [SS(reg)/SS(res)] * [(n-1)/p - 1]

So if SS(reg)/SS(res) changes only a little while (n-1)/p - 1 drops (because p went up by one), the result is a smaller F and a bigger p-value.

This is NOT a mathematical proof, just something to give you a better "feel" for the model.
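To make it concrete, here is a small simulation sketch (made-up data, Python/statsmodels rather than SAS; nothing here comes from the OP's output):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
y = 0.4 * x1 + rng.normal(size=n)        # one weak but real predictor

small = sm.OLS(y, sm.add_constant(x1)).fit()

# add five pure-noise covariates to the same model
noise = rng.normal(size=(n, 5))
big = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise]))).fit()

print("small model: R2=%.3f  overall F p-value=%.4f" % (small.rsquared, small.f_pvalue))
print("big model:   R2=%.3f  overall F p-value=%.4f" % (big.rsquared, big.f_pvalue))
print("t-test p-value for x1 in the big model: %.4f" % big.pvalues[1])
# Typically R2 only creeps up, while the overall F p-value gets worse because
# the extra degrees of freedom are spent on variables that explain nothing.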
 

ondansetron

TS Contributor
#6
You shouldn't be removing variables hoping to make the F-stat larger.

In general, the model F-test should be the gatekeeper. If the model F-test isn't significant, don't test the individual terms. The model, collectively, is not statistically useful for predicting the DV.

If the F-test is significant, then proceed to testing individual terms if needed.

And remember, these test statistics are for different hypotheses. The F-stat jointly tests the group, whereas each individual t-test asks, "After accounting for the other X variables, is this X variable useful for predicting Y?" Different questions, but also, use logic: the F-test is saying that, as a group, they're not useful for predicting Y at the 5% level.
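For example, a quick sketch of the two questions (simulated data with made-up column names, in Python/statsmodels rather than SAS):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 60
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], n // 3),
    "cov1": rng.normal(size=n),
    "cov2": rng.normal(size=n),
})
df["y"] = (df["group"] == "B") * 0.8 + rng.normal(size=n)

fit = smf.ols("y ~ C(group) + cov1 + cov2", data=df).fit()

# Joint question: is the whole set of predictors useful at all?
print("overall F p-value:", fit.f_pvalue)

# Individual questions: given the other terms, is this one term useful?
print(fit.pvalues)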
 
#7
If you build a model and some of the independent variables are insignificant, then you should remove the most insignificant variable and check again.

The goal is not to make F bigger, as you said, but to leave the model with only significant independent variables.
 

ondansetron

TS Contributor
#8
If you build a model and some of the independent variables are insignificant, then you should remove the most insignificant variable and check again.

The goal is not to make F bigger, as you said, but to leave the model with only significant independent variables.
No, I'm afraid this isn't generally true. It is far better practice to include variables that are strongly suspected (theoretically or from previous research) to be related to the outcome, irrespective of the p-value. Take, for example, a problem in finance: if we were to regress a bond issue's trading price on many variables, including a proxy for market interest rates, we would get a p-value for each of these coefficients. However, a nonsignificant p-value on the market interest rate beta estimate would not indicate that you should remove it from the model; there is strong theoretical and empirical evidence suggesting that the price of a security is related to the market interest (discount) rate.

Another, more general example is that of a categorical variable with 3 levels. Two dummy variables are included in the regression (with an intercept). It would be foolhardy to remove one of the dummy variables for this categorical variable simply because the p-value was nonsignificant.

Some older variable-screening methods use the approach you describe (or something similar), but purely as a screening step rather than for model building. The researcher should still incorporate subject-matter expertise into the analysis to avoid relying solely on p-values.
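As a sketch of the 3-level dummy point (simulated data, made-up names, Python/statsmodels rather than SAS): test the factor jointly instead of dropping a single dummy on the basis of its own p-value.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 20)})
df["y"] = (df["group"] == "C") * 0.7 + rng.normal(size=60)

fit = smf.ols("y ~ C(group)", data=df).fit()
print(fit.pvalues)            # one p-value per dummy (B vs A, C vs A)
print(anova_lm(fit, typ=2))   # a single joint F-test row for the whole factor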
 
#9
Hi Ondansetron,

Surely you shouldn't bombard your model with many independent variables and let the god of statistics decide for you.
There is no replacement (yet) for model building by the expert; the statistics only support the expert, not the other way around :)

You gave a great example of when you shouldn't remove an insignificant variable: when you have a strong theoretical reason to leave it in the model.
The method I described is for variables you are not sure whether to include in the model.
Many times you suspect which variables you should include, but don't really know.

What is your recommendation for the 3-level categorical variable example?
(In case you are not sure whether this variable should be included in your model; of course it will be all or none, and I can think of several options.)

P.S. The original question was how the F can be insignificant while the t of one variable is significant, and that is what I addressed ...
 

ondansetron

TS Contributor
#10
I think remembering the purpose of each test, and that they aren't asking the same question (barring a special case), is helpful to avoid confusion over this situation. There are minor bits of mathematics that can show why this might occur, but it arises from a process that isn't necessarily logical (testing individual variables when the joint test was nonsignificant).

And I think you are correct to the extent that we really need to know more about the OP's research. If all those other variables are important or strongly suspected to be, then looking past the global F-stat isn't necessarily logical. If he or she just added in a bunch of terms for the sake of adding terms (and chewed up DF without explaining "enough" variation in the DV), then it probably would make sense to use theory and subject-matter expertise to include only the terms that are justifiable (not using p-values) and then test the variable(s) of interest.
 
#11
Makes sense :)
But life is not black or white.
If you know the model 100%, you may not need the statistics...

Say, for example, you are sure about two variables (x1, x2) and unsure about x3.
The full model with x1, x2, x3 may not be significant, but after removing x2 the model will be significant.
So in this example, looking only at the F wasn't a good idea.
 

noetsi

Fortran must die
#12
When you add variables, your explained variance will go up. I am not sure that is always so with p-values. The F test and t test have the same result with one variable, but they are different tests and I am not sure their power is identical. It is possible in ANOVA, I think, to have contrast tests that are significant while the model F test is not, because of differences in the power of the tests.
 

noetsi

Fortran must die
#13
Whether you should remove variables depends on what you are using them for. I predict future events, like spending, in an arena where there is little to no theory. Even if there were theory, I am more interested in predicting well than in the slopes of the variables per se. I am not building theory; I am making business decisions (well, others are, with my data).
 

ondansetron

TS Contributor
#14
When you add variables, your explained variance will go up. I am not sure that is always so with p-values. The F test and t test have the same result with one variable, but they are different tests and I am not sure their power is identical. It is possible in ANOVA, I think, to have contrast tests that are significant while the model F test is not, because of differences in the power of the tests.
On this topic, it is also possible to plan comparisons a priori without using the F-test, but if you are using the F-test, it doesn't make as much sense to do comparisons when the F-test is nonsignificant.
 
#15
When you add variables, your explained variance will go up. I am not sure that is always so with p-values. The F test and t test have the same result with one variable, but they are different tests and I am not sure their power is identical.
Hi Noetsi,
I gave the calculation above that shows why the explained variance will go up and why the F p-value will usually go up.

If the F-test and the t-test give the same result with one variable, then both tests have the same power to reject an incorrect H0 ...
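For example, a quick check with a single predictor (made-up data, Python/statsmodels rather than SAS) shows that F = t² and the p-values match:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.fvalue, fit.tvalues[1] ** 2)   # same number: F equals t squared
print(fit.f_pvalue, fit.pvalues[1])      # identical p-values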
 
#16
Whether you should remove variables depends on what you are using them for. I predict future events, like spending, in an arena where there is little to no theory. Even if there were theory, I am more interested in predicting well than in the slopes of the variables per se. I am not building theory; I am making business decisions (well, others are, with my data).
Thanks Noetsi :)
Ondansetron gave a great example of when you shouldn't remove variables, and you have a great example of when you should remove variables.

Ondansetron, do you still think otherwise?