# Mean of the residuals in Simple Linear Regression *Beginner Alert*

##### New Member
Hey guys,

First of all, I apologize in advance for my question: I am an absolute beginner in statistics.

Currently I'm trying to test some hypotheses in RStudio using simple linear regression. I am aware that before interpreting the results of my model I have to verify that the model's assumptions are met.

Unfortunately, one of the key assumptions is not very clear to me. Could someone please explain it?

"The mean of the residuals should be equal to zero."

I have no clue which residuals I should sum and then divide by the number of items. I tried every possibility, but it's never zero (I also tried it with other models). So basically my conclusion is that I'm doing something wrong.

I plotted the residuals and the plot seems fine to me.

I would be very thankful for any advice or explanation.

Thank you very much.

#### Dason

##### Ambassador to the humans
If you take the mean of the residuals for the observations that were used in the model you should get 0. Here is an example using R since that's what you're using.

Code:
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> o <- lm(mpg ~ wt, data = mtcars)
> head(residuals(o))
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive
       -2.2826106        -0.9197704        -2.0859521         1.2973499
Hornet Sportabout           Valiant
       -0.2001440        -0.6932545
> mean(residuals(o))
[1] 2.024748e-16
Note that the final line might look non-zero, but for practical purposes it is zero. Computers store numbers with finite precision, so there is rounding error, and the value being displayed as the mean of the residuals is 0.0000000000000002024748. That is on the order of machine epsilon (about 2.2e-16), the relative precision of double-precision floating point numbers.
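For anyone who wants to check this without eyeballing the exponent, here is a minimal sketch using the same `mtcars` model. The names `o` and `m` are only illustrative, and the exact value of the mean may differ slightly from machine to machine.

```r
# Refit the model from the example above (mtcars ships with base R).
o <- lm(mpg ~ wt, data = mtcars)
m <- mean(residuals(o))

# Machine epsilon: the relative spacing of double-precision numbers.
.Machine$double.eps          # about 2.22e-16

# Compare with a tolerance instead of testing exact equality.
m == 0                       # usually FALSE because of rounding error
isTRUE(all.equal(m, 0))      # TRUE: zero within numerical tolerance
```

The point of `all.equal()` here is that it allows a small tolerance, which is usually what you want when comparing floating point results to a theoretical value.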

##### New Member
yes! You're right!

Thank you very much!

#### Dason

##### Ambassador to the humans
Side note: there is never a need to apologize for being a beginner. We all start somewhere. Please let us know if you have other questions!

#### noetsi

##### No cake for spunky
Except dason. He was born (if that is the right word for a bot) with a perfect knowledge of statistics...

I have spent 15 years studying regression and this is the first time I have heard that one raised. Usually you look at the residuals to see if you have heteroskedasticity, non-normality, or non-linearity.

#### ondansetron

##### TS Contributor
Nonlinearity could imply that the mean error is not 0.

True model: Y = B0 + B1X + B2X^2
Fit: yhat = b0 + b1x

Then Y - yhat = residual
implies
E[(B0 + B1X + B2X^2) - (b0 + b1x)] = E[B2X^2] (taking b0 and b1 as estimates of B0 and B1), which is not equal to zero when B2 and X are nonzero. So the average error at a given x would be roughly B2x^2, which could show up as curvature in a plot of residuals vs. x, for example.
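A small simulation can make this concrete. This is only a sketch under assumed values: the coefficients 1, 2, 3, the sample size, the seed, and the names `fit`, `e`, and `groups` are all made up for illustration.

```r
set.seed(1)                            # arbitrary seed, for reproducibility
n <- 200
x <- runif(n, -2, 2)
y <- 1 + 2 * x + 3 * x^2 + rnorm(n)    # assumed "true" quadratic model

fit <- lm(y ~ x)                       # misspecified: leaves out the x^2 term
e <- residuals(fit)

mean(e)                                # overall mean: zero up to rounding error

# Mean residual within thirds of the x range. The middle group comes out
# negative and the two outer groups positive, i.e. the residuals are curved.
groups <- cut(x, breaks = c(-2, -2/3, 2/3, 2), include.lowest = TRUE)
tapply(e, groups, mean)
```

Calling `plot(x, e)` afterwards should show the same U shape. So the overall sample mean of the residuals is zero, while the mean at particular values of x is not.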

#### Dason

##### Ambassador to the humans
True. But that doesn't matter because the sample mean of the residuals will be 0 regardless.

#### ondansetron

##### TS Contributor
Cause that's how the line of best fit is defined (partially), yaknow?
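To spell that out, here is a short derivation, assuming ordinary least squares with an intercept. It uses the b0, b1 notation from above and writes e_i for the i-th residual. Setting the derivative of the sum of squares with respect to the intercept to zero forces the residuals to sum to zero:

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2, \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
\;\Longrightarrow\; \sum_{i=1}^{n} e_i = 0
\;\Longrightarrow\; \bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i = 0 .
\end{aligned}
```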

#### ondansetron

##### TS Contributor
I think that is his point
And mine too, but generally the idea of plotting residuals vs x is to help see if you're misspecified!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Yeah, I would have been content with the outputted residual median being E-07. Good points though.

#### noetsi

##### No cake for spunky
And mine too, but generally the idea of plotting residuals vs x is to help see if you're misspecified!
That is true, but it is different from the original poster's question. In fact, since the mean of the residuals will always be zero (for a least-squares fit that includes an intercept), you don't need to check for that: it has to be true, unless you have a software issue.
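One caveat worth making explicit: the guarantee depends on the model having an intercept. Here is a quick sketch with `mtcars`, where `with_int` and `no_int` are just illustrative names and `0 + wt` is R's formula syntax for dropping the intercept.

```r
with_int <- lm(mpg ~ wt, data = mtcars)       # intercept included (the default)
no_int   <- lm(mpg ~ 0 + wt, data = mtcars)   # intercept removed

mean(residuals(with_int))   # zero up to rounding error
mean(residuals(no_int))     # clearly non-zero
```

So if the mean of the residuals is noticeably different from zero, one thing to check is whether the intercept was dropped from the formula.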