Which analysis better represents the "worse-ness" of my linear regression?

#1
Hello all,

If I have two sets of data points, for instance:

1 10
2 14
3 17
4 12
5 19

and

1 10
2 14
3 17
4 12
100 112

Judging by R^2, the linear regression is bad for the first set and good for the second, but the second is still a bad line for the first four points. I was told that the standard deviation of the slope could help me, since the second data set seemingly leaves more leeway, so the first 4 points are less well represented by the regression. However, the standard deviation of the slope does not show this.
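For anyone who wants to reproduce the numbers under discussion, here is a minimal NumPy sketch. It fits an ordinary least-squares line to each data set and reports R^2 and the standard error of the slope. The function name `fit_summary` is made up for illustration and is not from any library.

```python
import numpy as np

def fit_summary(x, y):
    """Least-squares line y = a + b*x.

    Returns (slope, intercept, R^2, standard error of the slope).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean()) ** 2)
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    # Sum of squared residuals around the fitted line
    sse = np.sum((y - (intercept + slope * x)) ** 2)
    r2 = 1.0 - sse / syy
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    slope_se = np.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, r2, slope_se

# The two data sets from the post
set1 = ([1, 2, 3, 4, 5], [10, 14, 17, 12, 19])
set2 = ([1, 2, 3, 4, 100], [10, 14, 17, 12, 112])

for name, (x, y) in (("set 1", set1), ("set 2", set2)):
    slope, intercept, r2, slope_se = fit_summary(x, y)
    print(f"{name}: slope={slope:.3f} intercept={intercept:.3f} "
          f"R^2={r2:.3f} SE(slope)={slope_se:.3f}")
```

Both R^2 and the slope's standard error depend on the spread of X, which is why a single far-away point can make them look better.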

I would like to ask whether there is a parameter or a statistical analysis of the data set that can give me an evaluation of the quality of the regression.
I can look at a plot of the residuals, but I am sure there is something more that can help me...

Thank you very much in advance,
Alex.
 

hlsmith

Not a robit
#2
Mean squared error (MSE) is usually a useful measure. Your example seems to imply that there may be outliers. If so, you can look at the leverage and influence of individual observations. You can also find values with potential issues, remove them, and re-examine the fit. It also helps to plot your data and the best-fit line if you have two- or three-dimensional data. It comes down to what your question is.
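As a rough illustration of the leverage and influence checks mentioned above, the sketch below computes the hat-matrix diagonal (leverage) and Cook's distance for the second data set using plain NumPy. The helper name `leverage_and_cooks` is hypothetical, and the formulas assume a simple linear regression with an intercept.

```python
import numpy as np

def leverage_and_cooks(x, y):
    """Leverage (hat-matrix diagonal) and Cook's distance per observation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])      # design matrix with intercept
    p = X.shape[1]                            # number of fitted parameters
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix: fitted = H @ y
    h = np.diag(H)                            # leverage of each point
    resid = y - H @ y
    mse = np.sum(resid ** 2) / (n - p)
    # Cook's distance: squared residual scaled by how much leverage the point has
    cooks = resid ** 2 / (p * mse) * h / (1.0 - h) ** 2
    return h, cooks

x2 = [1, 2, 3, 4, 100]
y2 = [10, 14, 17, 12, 112]
h, cooks = leverage_and_cooks(x2, y2)
for xi, hi, di in zip(x2, h, cooks):
    print(f"x={xi:>3}: leverage={hi:.4f}  Cook's D={di:.3f}")
```

A leverage close to 1 means the fitted line is forced almost exactly through that point, whatever the other points do.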
 
#3
Thanks for the reply,
I see what you mean, but I am not talking about outliers. In my example, I widened the interval between the data points to get a "better" linear fit with a higher R^2. However, this reduces the weight of every point with a lower X value, so those points are represented badly. So if I use the fits from the first and second data sets to predict values for X in the low range (in this case 1, 2, 3, 4...), the second fit will give worse predictions than the first.
I want a mathematical or statistical parameter or value that can actually show that this is the case: the second fit is worse than the first.
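One simple way to test this claim is to score each fitted line only on the X range of interest, rather than on all the points used for fitting. The sketch below does that with a root-mean-square error over the four low-X points. The function name `rmse_in_range` and the choice of RMSE as the score are assumptions made for illustration.

```python
import numpy as np

def rmse_in_range(x_fit, y_fit, x_eval, y_eval):
    """Fit a line on (x_fit, y_fit), then return its RMSE on (x_eval, y_eval)."""
    # np.polyfit with degree 1 returns [slope, intercept]
    slope, intercept = np.polyfit(x_fit, y_fit, 1)
    pred = intercept + slope * np.asarray(x_eval, dtype=float)
    err = np.asarray(y_eval, dtype=float) - pred
    return float(np.sqrt(np.mean(err ** 2)))

# The low-X points shared by both data sets
x_low, y_low = [1, 2, 3, 4], [10, 14, 17, 12]

rmse1 = rmse_in_range([1, 2, 3, 4, 5], [10, 14, 17, 12, 19], x_low, y_low)
rmse2 = rmse_in_range([1, 2, 3, 4, 100], [10, 14, 17, 12, 112], x_low, y_low)
print(f"RMSE on X = 1..4, fit from set 1: {rmse1:.3f}")
print(f"RMSE on X = 1..4, fit from set 2: {rmse2:.3f}")
```

Whichever fit has the smaller value predicts the low range better by this measure. Unlike R^2, the result does not change with how far away the fifth point sits unless the line itself changes.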

Thanks again,
Alex.