hypothesis testing for least squares fitting

llee

New Member
#1
I have a spectrum that is composed of several components. I do least squares fitting in order to determine the amount of each component. For example: Spectrum = c1*x1 + c2*x2 + c3*x3 + c4*x4 + c5*x5. The c's are my percentages, which I get through least squares fitting, and the x's are the spectra of the individual components. Now I add a 6th component, x6, and want to know whether it is in my spectrum. I do linear least squares fitting again, but now it fits 6 variables. The fit will improve simply because I allow it more freedom. What I want to know is: is the 2nd model, with the 6th component, statistically significant, or can I reject it because it only improved the fit by having one more free parameter? Essentially, I want to say that x6 is not statistically significant and I can reject it. I believe hypothesis testing is what I want. My null hypothesis is the 5-component model and my alternative is the 6-component model.
The problem is, I need an average, and it doesn't make sense to take an average because this is an absorbance spectrum, not something centered on 0 or on some other mean value. Can I use the residuals (the fitted spectrum minus the actual spectrum) to do hypothesis testing? If so, how do I do the test on them?
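The residuals are indeed what the standard comparison of nested models works with: it compares the residual sums of squares of the two fits rather than any mean of the spectrum. A minimal NumPy sketch of the two fits, assuming the component spectra are stored as the columns of a matrix `X`; the data are synthetic (random components, with x6 deliberately absent) and `rss_of_fit` is a hypothetical helper name.

```python
import numpy as np

# Synthetic stand-in data (hypothetical): 229 points, 6 candidate component
# spectra stored as the columns of X.
rng = np.random.default_rng(0)
n_points = 229
X = rng.random((n_points, 6))
true_c = np.array([0.30, 0.25, 0.20, 0.15, 0.10, 0.0])  # x6 truly absent here
spectrum = X @ true_c + rng.normal(scale=1e-3, size=n_points)


def rss_of_fit(components, y):
    """Least-squares fit y ~ components @ c; return (c, residual sum of squares)."""
    c, *_ = np.linalg.lstsq(components, y, rcond=None)
    residuals = components @ c - y           # fitted minus measured
    return c, float(np.sum(residuals ** 2))  # square each residual and add them up


c5, rss5 = rss_of_fit(X[:, :5], spectrum)  # null model: x1..x5
c6, rss6 = rss_of_fit(X, spectrum)         # alternative model: x1..x6
print(rss5, rss6)  # rss6 can never exceed rss5, because the models are nested
```

Because the 6-component model contains the 5-component one, its RSS is always at least as small; the question is whether the drop is larger than one extra free parameter would produce by chance.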

Thanks,
Lisa
 
#2
Running a linear regression (in Excel, for example) is essentially the same as least squares fitting. To check the significance of the variables in the model, look at the t-statistic calculated as part of the regression. As a rule of thumb, if the absolute value of a variable's t-statistic is greater than about 2, the variable is significant at roughly the 5% level (this holds when there are many more data points than parameters). Also check |t| for the intercept; if it is not significant, force the intercept to zero.
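A minimal NumPy sketch of where that t-statistic comes from in a least-squares fit without an intercept: each coefficient divided by its standard error, with the residual variance estimated as RSS/(n - p). The data and the helper name `coefficient_t_stats` are hypothetical. The last lines check a standard identity relevant to the original question: when the larger model adds exactly one component, the nested-model F statistic equals the square of that component's t-statistic, so the two tests agree.

```python
import numpy as np


def coefficient_t_stats(X, y):
    """OLS coefficients and their t-statistics (no intercept column added)."""
    n, p = X.shape
    c, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ c
    sigma2 = residuals @ residuals / (n - p)  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance matrix of the coefficients
    se = np.sqrt(np.diag(cov))                # standard errors
    return c, c / se


# Hypothetical example: y depends on the first column only.
rng = np.random.default_rng(1)
n = 229
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] + rng.normal(size=n)
c, t = coefficient_t_stats(X, y)
print(np.abs(t) > 2)  # rule of thumb: |t| > 2 is roughly significant at 5%

# For one extra column, the nested-model F statistic equals t**2 for that column.
rss_full = np.sum((y - X @ c) ** 2)
c_red, *_ = np.linalg.lstsq(X[:, :1], y, rcond=None)
rss_red = np.sum((y - X[:, :1] @ c_red) ** 2)
f_stat = (rss_red - rss_full) / (rss_full / (n - 2))
print(np.isclose(f_stat, t[1] ** 2))
```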
 
#4
Masteras - I'm curious, could you please expand on why you would want to keep the intercept in your equation if it is not significantly contributing to the result?
 

Dragan

Super Moderator
#5

It depends on what one is doing.

For example, including an intercept term ensures that the mean of the predicted scores equals the mean of the dependent variable. This may or may not be important.
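A small NumPy illustration of that property on hypothetical data: the least-squares fit with an intercept column reproduces the mean of y exactly, while the fit forced through the origin generally does not.

```python
import numpy as np

# Hypothetical data with a genuine nonzero intercept.
rng = np.random.default_rng(2)
x = rng.random(50)
y = 3.0 + 0.5 * x + rng.normal(scale=0.1, size=50)

with_intercept = np.column_stack([np.ones_like(x), x])  # columns: 1, x
through_origin = x[:, None]                              # column: x only


def mean_of_fit(A, y):
    """Mean of the least-squares predicted values for design matrix A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return (A @ coef).mean()


mean_with = mean_of_fit(with_intercept, y)
mean_without = mean_of_fit(through_origin, y)
print(y.mean(), mean_with, mean_without)
```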
 
#6
Got it, i.e., if you'll be using the regression equation for predictions. I guess with an insignificant intercept the prediction error would be very large.
 

llee

#7
I have already done the least squares fitting on hundreds of data sets. Now I want to take that data and determine, for a few of them, whether one model is significantly better than the other. I don't use Excel, and I would like to do this with what I have already calculated, using a mathematical expression. I've been looking at the F-test on Wikipedia: http://en.wikipedia.org/wiki/F-test. I already have the residuals, so I'd like to use

F = ((RSS1 - RSS2)/(p2 - p1)) / (RSS2/(n - p2))

but I'm not sure I'm doing this correctly. I have 2 models, one with 5 components and one with 6. Each spectrum has 229 points. RSS is the residual sum of squares. So, do I take the fitted spectrum minus the actual one to get the residuals, square each of those 229 residuals, and add them all together? I do this for both model 1 and model 2 to get RSS1 and RSS2. p1 and p2 are the numbers of parameters in the models, so I think these are 5 and 6. n is the number of data points, so I believe this is 229. If I'm doing this right, here are the results for 2 different experiments:
Experiment 1: F = ((4.5248e-5 - 1.3452e-5)/1) / (1.3452e-5/(229 - 6)) = 527.1135
Experiment 2: F = ((1.3288e-5 - 1.3197e-5)/1) / (1.3197e-5/(229 - 6)) = 1.537697
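A sketch of the same calculation, assuming SciPy is available for the p-value (`scipy.stats.f.sf` gives the upper-tail probability of the F distribution); `nested_f_test` is a hypothetical helper name. It plugs in the RSS values posted above. Recomputing from these values gives about 527.10 rather than 527.1135 for the first experiment, presumably because the RSS values were rounded before posting.

```python
from scipy import stats


def nested_f_test(rss1, rss2, p1, p2, n):
    """F statistic and p-value for model 1 (p1 params) nested in model 2 (p2 params)."""
    f = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
    p_value = stats.f.sf(f, p2 - p1, n - p2)  # P(F >= f) under the null model
    return f, p_value


# RSS values as posted (229 points, 5 vs 6 components, no intercept).
f_a, p_a = nested_f_test(4.5248e-5, 1.3452e-5, 5, 6, 229)
f_b, p_b = nested_f_test(1.3288e-5, 1.3197e-5, 5, 6, 229)
print(f_a, p_a)
print(f_b, p_b)
```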

I then go to an F-table and look up F(1,223). I used this one for 5% significance: http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm
It doesn't go up to 223, but the values don't change much at high degrees of freedom, so I used 100. The critical value is 3.936.
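Instead of rounding the denominator degrees of freedom down to the nearest table row, the exact critical value can be computed. A sketch assuming SciPy (`scipy.stats.f.ppf` is the inverse CDF of the F distribution). Since the critical value shrinks as the denominator degrees of freedom grow, using the F(1,100) row in place of F(1,223) is slightly conservative.

```python
from scipy import stats

alpha = 0.05
crit_100 = stats.f.ppf(1 - alpha, 1, 100)  # the table row that was used
crit_223 = stats.f.ppf(1 - alpha, 1, 223)  # exact df: 229 points - 6 parameters
print(crit_100, crit_223)
```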

So, for the first F-test, can I say that the second model is significantly better with 95% confidence, and for the second F-test, that it is not statistically significant, so I must use the first model with only 5 components?

Lisa
 

llee

#9
Thanks, I looked up stuff on nested models and it looks like I calculated the F values correctly. Now I need help with how I state the conclusions.
Case 1: F = 527 for F(1,223). I used this table: http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm and for 1% significance (F(1,100)), F critical is 6.9. My F is WAY bigger, so do I say I reject model 1 with 99% significance or a 99% confidence interval? How do I state the conclusion?
Case 2: F = 3.85 for F(1,223). This is lower than the critical F for 5% significance, but not lower than the one for 10%. What do I state here? That I cannot reject model 1 at the ??? significance level?
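On wording: the conventional phrasing is "the 5-component model is rejected at the 1% significance level" (equivalently, at 99% confidence), not "99% significance". Failing to reject means the data do not show that x6 is needed, which is not the same as showing that x6 is absent. Reporting the p-value itself lets a reader apply any threshold. A sketch assuming SciPy and the F values and degrees of freedom from the two cases; `report` is a hypothetical helper name.

```python
from scipy import stats


def report(f_value, df1=1, df2=223):
    """Print the decision at several significance levels and return the p-value."""
    p = stats.f.sf(f_value, df1, df2)  # upper-tail probability of the F statistic
    for alpha in (0.10, 0.05, 0.01):
        verdict = "reject" if p < alpha else "do not reject"
        print(f"F = {f_value}: p = {p:.3g}; at the {alpha:.0%} level, "
              f"{verdict} the 5-component model")
    return p


p_case1 = report(527.0)
p_case2 = report(3.85)
```

For Case 2 the p-value falls between 0.05 and 0.10, so the 5-component model is rejected at the 10% level but not at the 5% level.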

Please help! I'm trying to wrap up my thesis.

Thanks,
Lisa