Residuals vs. errors

#1
What is the difference between a residual and an error?

Is it wrong to say that an error is the difference between the data points and a fitted line, while a residual is the difference between the data points and the sample mean? Please help!
 

bryangoodrich

Probably A Mammal
#2
The error is a theoretical entity from the true model, like the alpha and beta coefficients.

\(Y = \alpha + \beta X + \epsilon\)

When you do a regression you are estimating these parameters with a model

\(\hat{Y} = a + bX\)

where a and b are estimates of alpha and beta, respectively. The residual (e) is the difference between the data point and the fitted line: \(e = Y - \hat{Y} = Y - (a + bX)\). You will never observe the error, just like you'll never observe the true coefficients. You have estimates, and the residual estimates the error: the variation in the relationship of Y ~ X that is not accounted for in that model. Therefore, your question is analogous to asking "what is the difference between the estimate and the true coefficient?" They are related, but they are not the same entity at all.
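If it helps to see this with numbers, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the "true" values of alpha, beta, and the error standard deviation are made up, and the only reason we can look at the errors at all is that we generated them ourselves. With real data you would only ever have the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical "true" model parameters (never known in practice).
alpha, beta, sigma = 2.0, 0.5, 1.0
n = 50

x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)   # the errors: deviations from the true line
y = alpha + beta * x + eps           # Y = alpha + beta*X + epsilon

# OLS estimates; np.polyfit returns coefficients highest degree first,
# so for deg=1 that is (slope, intercept).
b, a = np.polyfit(x, y, deg=1)

y_hat = a + b * x                    # fitted values
resid = y - y_hat                    # the residuals: deviations from the fitted line

print("true (alpha, beta):", alpha, beta)
print("estimates (a, b):  ", a, b)
print("sum of errors:     ", eps.sum())    # generally not zero
print("sum of residuals:  ", resid.sum())  # zero up to rounding, by construction of OLS
```

The residuals sum to (numerically) zero because OLS with an intercept forces that, while the errors have no such constraint in any one sample. That is one concrete way the two quantities differ even though one estimates the other.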
 
#3
I am somewhat puzzled by the approach you used to explain the link between the residual (estimator) and the error (target parameter).
You defined an estimator instead of the target parameter. Usually we define the target parameter first and then link the two by saying that the expected value of the estimator is equal to the target parameter.
 

bryangoodrich

Probably A Mammal
#4
Okay. I adjusted the equation above.

The true model is that Y is related to X stochastically (i.e., with some statistical error term). The regression model is the expected value of this: \(E[Y] = \alpha + \beta X\). The error term drops out because its expectation is assumed to be 0. We estimate alpha and beta with a and b. Fitting the model at each observed X gives the corresponding fitted values \(\hat{Y}\). The residuals are just \(e = Y - \hat{Y}\). The error \(\epsilon = Y - E[Y]\) is the vertical deviation of Y from the true regression line (the mean of Y at that X), and it is unknown. The residual is the vertical deviation of Y from the fitted (estimated) regression line. So both involve the deviation of Y from some line. The difference is that the error is the deviation of our known data from a line we can't see, namely the expectation of that stochastic relationship, while the residual is a product of our estimation of that line.
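To make the link explicit, you can substitute the true model into the definition of the residual. This is just algebra with the same symbols used in this thread (alpha, beta for the true coefficients; a, b for their estimates):

\(e = Y - \hat{Y} = (\alpha + \beta X + \epsilon) - (a + bX) = \epsilon - (a - \alpha) - (b - \beta)X\)

So the residual is the error minus whatever we got wrong in estimating the line. If a and b happened to equal alpha and beta exactly, the residuals would equal the errors; otherwise they differ by the estimation error in the intercept and slope.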