Multiple linear regression: partial F-test

#1
"Suppose that in a MULTIPLE linear regression analysis, it is of interest to compare a model with 3 independent variables to a model with the same response varaible and these same 3 independent variables plus 2 additional independent variables.
As more predictors are added to the model, the coefficient of multiple determination (R^2) will increase, so the model with 5 predicator variables will have a higher R^2.
The partial F-test for the coefficients of the 2 additional predictor variables (H_o: β_4=β_5=0) is equivalent to testing that the increase in R^2 is statistically signifcant."


I don't understand the bolded sentence. Why are they equivalent?

Thanks for explaining!

[also under discussion in Math Help forum]
 

Dragan

Super Moderator
#2
"Suppose that in a MULTIPLE linear regression analysis, it is of interest to compare a model with 3 independent variables to a model with the same response varaible and these same 3 independent variables plus 2 additional independent variables.
As more predictors are added to the model, the coefficient of multiple determination (R^2) will increase, so the model with 5 predicator variables will have a higher R^2.
The partial F-test for the coefficients of the 2 additional predictor variables (H_o: β_4=β_5=0) is equivalent to testing that the increase in R^2 is statistically signifcant."


I don't understand the bolded sentence. Why are they equivalent?

Thanks for explaining!
Because the F-test is based on the increase in R^2. That is,


F = [(R^2_full - R^2_reduced) / (5 - 3)] / [(1 - R^2_full) / (N - 5 - 1)].

Note: R^2_full is the R^2 with 5 independent variables and R^2_reduced is the R^2 with 3 independent variables.
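For concreteness, here is a minimal Python sketch of that computation. The sample size and R^2 values are made up purely for illustration, and scipy is only used to get the upper-tail p-value.

```python
import scipy.stats as st

# Hypothetical numbers, for illustration only.
N = 50             # sample size
r2_full = 0.62     # R^2 with 5 independent variables
r2_reduced = 0.55  # R^2 with 3 independent variables

df_num = 5 - 3      # two additional predictors
df_den = N - 5 - 1  # residual df of the full model

F = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
p = st.f.sf(F, df_num, df_den)  # upper-tail p-value for H_o: beta_4 = beta_5 = 0
print(F, p)
```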
 
#3
Why?

According to my notes:
F = (extra SS/extra df) / MSE_full
where extra SS = SSE_reduced - SSE_full

The statement claims that the test of H_o: β_4 = β_5 = 0 is equivalent to testing that the increase in R^2 is statistically significant. What would be the equivalent null and alternative hypotheses in terms of R^2?

Thanks!
 

Dragan

Super Moderator
#4

They are the same - logically equivalent...either way you will get the same F ratio.

Remember that: R^2 = SS_regression / SS_total

and 1 - R^2 = SS_residual / SS_total
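Those identities are easy to check numerically. A small sketch with simulated data (any OLS routine would do; this one just uses numpy least squares, and all the numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 40
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])  # intercept + 3 predictors
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=N)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

SS_total = np.sum((y - y.mean()) ** 2)
SS_regression = np.sum((fitted - y.mean()) ** 2)
SS_residual = np.sum((y - fitted) ** 2)

# Both expressions give the same R^2 when the model includes an intercept.
print(SS_regression / SS_total, 1 - SS_residual / SS_total)
```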
 

Dragan

Super Moderator
#6
But how can we see that the test H_o: β_4 = β_5 = 0 is equivalent to testing that the increase in R^2 is statistically significant?

Thanks!
Here's a general way to consider this.

If the model has k parameters (an intercept plus k - 1 slope coefficients), we can state the following null hypothesis:

Ho: B2 = B3 = ... = Bk = 0

It follows that (assuming that the error terms are normally distributed):

F = [ESS/(k - 1)] / [RSS/(N - k)] follows the F distribution with k - 1 and N - k df.


Note that the total number of parameters to be estimated is k, of which one is the intercept term.

Note also that ESS is the "Explained Sum of Squares", RSS is the "Residual Sum of Squares", and TSS is the "Total Sum of Squares".

Now watch what I do:

F = [(N-k)/(k-1)] * (ESS / RSS)

= [(N-k)/(k-1)] * [ESS / (TSS - ESS)]

= [(N-k)/(k-1)] * [(ESS / TSS) / (1-(ESS/TSS))]

= [(N-k)/(k-1)] * [R^2 / (1 - R^2)]

= [R^2 / (k - 1)] / [(1 - R^2) / (N - k)].

So, now we can see how F and R^2 are related.
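The same relationship can be checked numerically. A short sketch with simulated data and k = 4 parameters (intercept plus 3 slopes); the coefficients and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 60, 4                                    # k parameters: intercept + 3 slopes
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = X @ np.array([1.0, 0.8, -0.4, 0.0]) + rng.normal(size=N)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((fitted - y.mean()) ** 2)
RSS = np.sum((y - fitted) ** 2)
R2 = ESS / TSS

F_ss = (ESS / (k - 1)) / (RSS / (N - k))      # sums-of-squares form
F_r2 = (R2 / (k - 1)) / ((1 - R2) / (N - k))  # R^2 form
print(F_ss, F_r2)                             # identical up to rounding
```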

This works not only for the full model (above) but also for comparing a full model with a reduced model:

F = [(RSS_reduced - RSS_full) / (number of additional variables)] / [RSS_full / (N - k)]

which is equal to:

F = [(R^2_full - R^2_reduced) / (number of additional variables)] / [(1 - R^2_full) / (N - k)]
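And the reduced-vs-full version can be checked the same way. A sketch comparing 3 vs 5 predictors on simulated data (all names and values here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 80
Xs = rng.normal(size=(N, 5))  # 5 candidate predictors
y = 2 + Xs[:, 0] - Xs[:, 1] + 0.5 * Xs[:, 2] + rng.normal(size=N)

def rss(Xsub):
    """Residual sum of squares from an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), Xsub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

TSS = np.sum((y - y.mean()) ** 2)
RSS_red, RSS_full = rss(Xs[:, :3]), rss(Xs)
R2_red, R2_full = 1 - RSS_red / TSS, 1 - RSS_full / TSS

extra_df, resid_df = 2, N - 5 - 1
F_rss = ((RSS_red - RSS_full) / extra_df) / (RSS_full / resid_df)
F_r2 = ((R2_full - R2_red) / extra_df) / ((1 - R2_full) / resid_df)
print(F_rss, F_r2)  # the same partial F either way
```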


Mkay.