Linear regression assumptions not met, now what?!

#1
Hi,

I tried and tried, but I could not meet the assumptions for linear regression (normality, linearity, independence, etc.).

I have multiple covariates to predict a continuous dependent variable. I was hoping to do backwards stepwise regression. What are my next options? Thanks!
 

CB

Super Moderator
#2
Hi there, you might need to describe your data in a bit more detail... what are your variables? Which assumptions specifically are violated? How badly are they violated?
 
#3
CB said: "Hi there, you might need to describe your data in a bit more detail... what are your variables? Which assumptions specifically are violated? How badly are they violated?"
Thanks for the response CowboyBear. My variables are:

Monthly_Precipitation = Longitude latCosine Slope Elevation Aspect

All these variables are continuous.

ALL of the assumptions are violated.

Test for normality (of residuals):
Shapiro-Wilk W = 0.734113, Pr < W < 0.0001

Plot of residuals vs. predicted is attached.
Histogram of residuals is attached.
QQPlot is attached.
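
(For anyone who wants to see the kind of check involved: a rough Python/statsmodels sketch of the model and diagnostics is below; the file name is a placeholder and this is not the exact code I ran.)

```python
# Rough sketch of the model and the residual diagnostics described above.
# "precip_data.csv" is a placeholder file name, not the actual data set.
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

df = pd.read_csv("precip_data.csv")
predictors = ["Longitude", "latCosine", "Slope", "Elevation", "Aspect"]
X = sm.add_constant(df[predictors])
fit = sm.OLS(df["Monthly_Precipitation"], X).fit()

# Shapiro-Wilk test on the residuals
w, p = stats.shapiro(fit.resid)
print(f"Shapiro-Wilk W = {w:.6f}, p = {p:.4g}")

# Residuals vs. predicted and histogram of residuals
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(fit.fittedvalues, fit.resid)
axes[0].set(xlabel="Predicted", ylabel="Residual", title="Residuals vs. predicted")
axes[1].hist(fit.resid, bins=30)
axes[1].set(title="Histogram of residuals")

# QQ plot of the residuals (opens its own figure)
sm.qqplot(fit.resid, line="s")
plt.show()
```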

Thanks!
 

terzi

TS Contributor
#4
Maybe it is an outlier problem

Based on these graphics, I think there may be some outliers in your data, that is, some extreme values that don't behave like the rest. I don't know of a definitive test for detecting such values, but you may want to plot each variable individually and check for them. If some values do look strange (and you should usually have an explanation for why these unusual values occurred), you could remove them. You could also consider transforming the data, although outliers may still distort the fit even after transformation.
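
For example, something along these lines (a minimal Python sketch, assuming your data sit in a pandas DataFrame called df with the columns you listed; the log transform is just one common choice, not a specific recommendation for these data):

```python
# Sketch: inspect each variable individually for extreme values, then try a transform.
# Assumes a pandas DataFrame `df` with the columns listed earlier in the thread.
import numpy as np
import matplotlib.pyplot as plt

cols = ["Monthly_Precipitation", "Longitude", "latCosine", "Slope", "Elevation", "Aspect"]
fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for ax, col in zip(axes.ravel(), cols):
    ax.boxplot(df[col].dropna())   # points beyond the whiskers are candidate outliers
    ax.set_title(col)
plt.tight_layout()
plt.show()

# One common transformation for a skewed, non-negative response; log1p handles zeros,
# but extreme points can still dominate the fit even after transforming.
df["log_precip"] = np.log1p(df["Monthly_Precipitation"])
```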

Now, there are other options that will let you model these data. You could use robust regression estimators. Alternatively, since you are violating all of the assumptions, you could consider non-parametric alternatives such as quantile regression.
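
For instance, a rough sketch of both options with statsmodels (same assumed DataFrame df as above; the Huber weighting and the median quantile are only illustrative defaults):

```python
# Sketch: robust (Huber) regression and median (quantile) regression as alternatives to OLS.
# Assumes a pandas DataFrame `df` with the columns mentioned earlier in the thread.
import statsmodels.api as sm
import statsmodels.formula.api as smf

formula = "Monthly_Precipitation ~ Longitude + latCosine + Slope + Elevation + Aspect"

# Robust regression: downweights observations with large residuals (Huber M-estimator).
robust_fit = smf.rlm(formula, data=df, M=sm.robust.norms.HuberT()).fit()
print(robust_fit.summary())

# Quantile regression at the median: models the conditional median rather than the mean,
# so it does not rely on normally distributed residuals.
median_fit = smf.quantreg(formula, data=df).fit(q=0.5)
print(median_fit.summary())
```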

Hope this helps
 

gianmarco

TS Contributor
#5
Hi!

I agree with Terzi about the need to use robust regression and/or non-parametric approaches.
For robust regression, see at least http://en.wikipedia.org/wiki/Robust_regression.

If I remember correctly, Systat 12 has a Robust Regression module.

As for outliers, I again agree with Terzi that dropping outlier(s) needs to be justified.
I would add that you also need a way to flag value(s) as outliers consistently. I found the following book interesting (it covers other issues as well): Wilcox, "Fundamentals of Modern Statistical Methods".

For identifying outliers you can use, at least, the following three rather simple approaches:

1) Mean-based method: a value is flagged as an outlier if
|value - mean| > 2 * standard deviation
(this rule is subject to the "masking" effect)

2) Median-based method: a value is flagged as an outlier if
|value - median| > 2 * (MedianAbsoluteDeviation / 0.6745)
The median-based method is less subject to the masking problem.

3) Interquartile method: a value is flagged as an outlier if
value < 1st quartile - 1.5 * interquartile range, or
value > 3rd quartile + 1.5 * interquartile range

I have implemented these 3 methods in an Excel Template. Info at:

http://xoomer.virgilio.it/gianmarco.alberti/index_file/Page395.htm
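
If you are not working in Excel, a rough Python sketch of the same three rules might look like this (the function name and the array input are just for illustration):

```python
# Sketch of the three screening rules above, applied to a 1-D array of values.
# Cutoffs follow the post: 2 SD, 2 "robust SD" (MAD / 0.6745), and 1.5 * IQR.
import numpy as np

def flag_outliers(x):
    x = np.asarray(x, dtype=float)

    # 1) Mean-based rule (prone to masking): |x - mean| > 2 * SD
    mean_rule = np.abs(x - x.mean()) > 2 * x.std(ddof=1)

    # 2) Median-based rule: |x - median| > 2 * (MAD / 0.6745)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    median_rule = np.abs(x - med) > 2 * (mad / 0.6745)

    # 3) Interquartile rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    iqr_rule = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

    return mean_rule, median_rule, iqr_rule
```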


Regards,
Gm