Mean of the residuals in Simple Linear Regression *Beginner Alert*

#1
Hey guys,

First of all, I would like to apologize for the question I'm about to ask, given that I am an absolute beginner in statistics.

Currently I'm trying to test some hypotheses in RStudio using simple linear regression. I am aware that before interpreting the results of my model I have to verify that the model's assumptions are met.

Unfortunately, one of the key assumptions is not very clear to me. Could someone please explain it?

"The mean of the residuals should be equal to zero."

I have no clue which of the residuals I should sum and then divide by the number of items. I tried every possibility, but it's never zero (I also tried it with other models). So basically my conclusion is that I'm doing something wrong.

I tried plotting the residuals and they look fine to me.

I would be very thankful for any advice or explanation.

Thank you very much.

 

Dason

Ambassador to the humans
#2
If you take the mean of the residuals for the observations that were used in the model you should get 0. Here is an example using R since that's what you're using.

Code:
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> o <- lm(mpg ~ wt, data = mtcars)
> head(residuals(o))
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive 
       -2.2826106        -0.9197704        -2.0859521         1.2973499 
Hornet Sportabout           Valiant 
       -0.2001440        -0.6932545 
> mean(residuals(o))
[1] 2.024748e-16
Note that the final line might look non-zero, but that's zero. Computers do arithmetic with rounding error, and the value displayed as the mean of the residuals is 0.0000000000000002024748, which is on the order of machine epsilon (about 2.2e-16), the relative precision of double-precision floating point numbers.
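As a rough sketch of how one might check this without reading the exponent by eye, assuming the same `mtcars` fit as above: `all.equal()` compares two numbers with a tolerance, and `.Machine$double.eps` gives R's machine epsilon for scale. The variable name `m` is just for illustration.

```r
# Refit the model from the example above (mtcars ships with base R)
o <- lm(mpg ~ wt, data = mtcars)

# Mean of the residuals for the observations used in the fit
m <- mean(residuals(o))

# Exact comparison with 0 is unreliable for floating point values,
# so compare with a tolerance instead
is_zero <- isTRUE(all.equal(m, 0))
print(is_zero)

# Machine epsilon for doubles, to judge how small m really is
print(.Machine$double.eps)
```

The exact digits of `m` can differ between machines, which is why a tolerance-based check is safer than `m == 0`.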
 
#3
Yes! You're right!

Thank you very much! :)

 

Dason

Ambassador to the humans
#4
Side note: there is never a need to apologize for being a beginner. We all start somewhere. Please let us know if you have other questions!
 

noetsi

Fortran must die
#5
Except dason. He was born (if that is the right word for a bot) with a perfect knowledge of statistics...

I have spent 15 years studying regression and this is the first time I have heard that one raised. Usually you look at the residuals to see if you have heteroskedasticity, non-normality, or non-linearity.
 

ondansetron

TS Contributor
#6
Nonlinearity could imply that the mean error is not 0 at a given x.

True model: Y = B0 + B1X + B2X^2
Fit: yhat = b0 + b1x

Then Y - yhat = residual, which implies
E[(B0 + B1X + B2X^2) - (b0 + b1x)] = E[B2X^2] (treating b0 and b1 as if they recovered B0 and B1), which is not equal to zero for B2 and X not zero. So the error at a given x would be off by about B2X^2 on average, and this could show up as curvature in a plot of residuals vs x, for example.
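A small simulation can illustrate this. It is only a sketch: the coefficients, sample size, noise level and seed are arbitrary choices, not anything from the thread. It fits a straight line to data generated from a quadratic model and then looks at the mean residual overall and within two regions of x.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 200
x <- runif(n, -2, 2)

# Hypothetical true model with a quadratic term (coefficients chosen arbitrarily)
y <- 1 + 2 * x + 1.5 * x^2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x)                   # misspecified fit: straight line only
r <- residuals(fit)

# Overall mean of the residuals: zero up to rounding, because the fit has an intercept
overall_mean <- mean(r)

# Mean residual within regions of x: systematically away from zero
middle_mean <- mean(r[abs(x) < 1])   # middle of the x range
ends_mean <- mean(r[abs(x) >= 1])    # both ends of the x range

print(c(overall = overall_mean, middle = middle_mean, ends = ends_mean))

# plot(x, r) would show the curved pattern directly
```

The overall mean stays at zero, while the residuals are negative on average in the middle of the x range and positive at the ends. That is the curvature a residuals vs x plot would reveal.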
 

noetsi

Fortran must die
#13
And mine too, but generally the idea of plotting residuals vs x is to help see if you're misspecified!
That is true, but different from the original poster's question. In fact, since the mean of the residuals will always be zero (for a least-squares fit that includes an intercept), you don't need to check for that - it has to be true. Unless you have a software issue :p
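As a hedged illustration of why it "has to be true": the zero mean is a by-product of least squares when the model includes an intercept, which `lm()` adds by default. The sketch below reuses the `mtcars` example from earlier in the thread and compares that fit with one where the intercept is dropped (`+ 0` in the formula). The object names are made up for illustration.

```r
# Default fit: lm() includes an intercept
with_int <- lm(mpg ~ wt, data = mtcars)

# Same predictor, but the line is forced through the origin
no_int <- lm(mpg ~ wt + 0, data = mtcars)

mean_with_int <- mean(residuals(with_int))   # zero up to rounding error
mean_no_int <- mean(residuals(no_int))       # no longer forced to be zero

print(c(with_intercept = mean_with_int, no_intercept = mean_no_int))
```

So with the usual `lm(y ~ x)` call there is nothing to check, and a mean that is clearly non-zero points to something else, such as a dropped intercept or residuals averaged over observations that were not used in the fit.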