- Thread starter Toulouse

Hi there, you might need to describe your data in a bit more detail. What are your variables? Which assumptions specifically are violated? How badly are they violated?

Monthly_Precipitation = Longitude latCosine Slope Elevation Aspect

All these variables are continuous.

ALL of the assumptions are violated.

Test for Normality (of residuals):

Shapiro-Wilk: W = 0.734113, Pr < W: <0.0001

Plot of residuals vs. predicted is attached.

Histogram of residuals is attached.

QQPlot is attached.

Thanks!

Based on these graphics, I think there may be some outliers in your data, that is, extreme values that don't behave like the rest. I don't know of any formal test for detecting such values, but you may want to plot each variable individually to check for them. If you consider some values anomalous (and you should usually be able to explain why these unusual values occurred), you could remove them. You could also think about transforming the data, but the outliers may still have an effect even on transformed values.

There are also other options that will let you model this data. You could use a robust regression estimator. And since you are violating all the assumptions, you could also consider a non-parametric alternative, such as quantile regression.
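To give an idea of what a robust estimator does, here is a minimal sketch of one simple example, the Theil-Sen estimator (the median of all pairwise slopes), for a single predictor. This is only an illustration: it is not the method of any particular package, your model has five predictors and would need a multivariate estimator, and the data and the function name `theil_sen` are made up for this example.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) of a Theil-Sen line for one predictor."""
    # Slope: median of the slopes between every pair of points
    # (pairs with identical x are skipped to avoid dividing by zero).
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    # Intercept: median of the residuals y - slope * x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Made-up data: y = 2x + 1 exactly, except for one gross outlier at the end.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 7, 9, 11, 13, 15, 60]

slope, intercept = theil_sen(x, y)
print(slope, intercept)
```

On this made-up sample the fitted line stays at slope 2 and intercept 1 despite the outlier, whereas an ordinary least-squares fit would be pulled towards the last point.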

Hope this helps

I agree with Terzi about the need to use robust regression and/or non-parametric approaches.

For robust regression, you can start with http://en.wikipedia.org/wiki/Robust_regression.

If I remember correctly, there is a Robust Regression module in Systat 12.

As for outliers, I again agree with Terzi about the need to justify dropping any outlier(s).

I would add that you need a consistent means of flagging a value as an outlier. I found the following book interesting (it covers other issues as well): Wilcox, "Fundamentals of Modern Statistical Methods".

To identify outliers you may use, at least, the following 3 rather simple approaches:

1) Mean-based method: a value is declared an outlier if

|value - mean| > 2*standard deviation

(this is subject to the "masking" effect, because the outliers themselves inflate the mean and the standard deviation)

2) Median-based method: a value is declared an outlier if

|value - median| > 2*(MedianAbsoluteDeviation/0.6745)

The median-based method is less subject to the problem of "masking".

3) InterQuartile method: a value is declared an outlier if

value < 1st Quartile - 1.5*InterQuartile Range

or

value > 3rd Quartile + 1.5*InterQuartile Range
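For anyone who prefers code to a spreadsheet, the three rules can be sketched in Python roughly as follows. This is a minimal sketch using only the standard library (Python 3.8+). The function names and the sample data are made up for illustration, deviations are taken in absolute value so that both low and high values can be flagged, and the quartile convention (`method="inclusive"`) is an assumption, since different packages compute quartiles slightly differently.

```python
from statistics import mean, median, quantiles, stdev


def mean_rule(values, k=2.0):
    """Flag values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]


def median_rule(values, k=2.0):
    """Flag values more than k * (MAD / 0.6745) from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    return [v for v in values if abs(v - med) > k * mad / 0.6745]


def iqr_rule(values, k=1.5):
    """Flag values beyond k * IQR below the 1st or above the 3rd quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Made-up sample with two extreme values (100 and 1000).
data = [10, 11, 12, 12, 13, 13, 14, 15, 100, 1000]

print("mean-based:  ", mean_rule(data))
print("median-based:", median_rule(data))
print("IQR-based:   ", iqr_rule(data))
```

On this made-up sample the mean-based rule flags only 1000, because the extreme values inflate the standard deviation and so "mask" the 100, while the median-based and interquartile rules flag both 100 and 1000.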

I have implemented these 3 methods in an Excel Template. Info at:

http://xoomer.virgilio.it/gianmarco.alberti/index_file/Page395.htm

Regards,

Gm