two auto-correlation questions

#1
Hello,

I'm just wondering if I really understand what auto-correlated error terms are:

Say I'm looking at time series data of the points per game scored by a basketball player.

Say he plays despite being slightly injured during the last 10 games of the season. His points per game during those last 10 games are lower than in the other games he played that season, and they all end up below the regression line/curve.

Would the last 10 data points all have error terms that are auto-correlated with one another?

another question:

Are auto-correlated error-terms always *caused* by the same thing?

Suppose a basketball player injured his hand and then, at exactly the moment his hand was back to 100%, he caught a cold (yet kept playing). Now all (or most) of those data points should be below the regression curve. Is it the case that not all of them are auto-correlated with each other, only those that share the same cause?
 
#2
Autocorrelation is the same as correlation except that it refers to two values of the same variable at different points in time. If you have time-series data, then, for instance, you can have a correlation between your data point at time x and at time y. Like regular correlations, autocorrelations are always between -1 and 1.

In your case, your non-random errors seem to be more of a sign that you have missed a factor in your regression analysis (possibly a binary variable that signals injured/not injured).

Looking at regression diagnostics helps you build a model. The general goal is to model all of the data you have so that you are left with only random errors.
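One common diagnostic for auto-correlated errors is the Durbin-Watson statistic, computed from the residuals in time order. Here is a minimal sketch in plain Python; the residual values are made up purely for illustration, and the function name is my own.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals given in time order.

    Values near 2 suggest no lag-1 autocorrelation, values well below 2
    suggest positive autocorrelation, values well above 2 negative.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals (illustration only, not real data):
# a long run above the line followed by a long run below it ...
run = [1.0] * 10 + [-1.0] * 10
# ... versus residuals that flip sign at every step.
alternating = [1.0, -1.0] * 10

dw_run = durbin_watson(run)                  # far below 2
dw_alternating = durbin_watson(alternating)  # far above 2
```

Long runs of same-signed residuals, like ten games in a row below the regression line, push the statistic toward 0.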

Dave
 
#3
Hi again Dave,

"Autocorrelation is the same as correlation except that it refers to two values of the same variable at different points in time. If you have time-series data, then, for instance, you can have a correlation between your data point at time x and at time y."
I must admit, I don't understand this: if you only look at 2 points in time, isn't there always a correlation between the two? I think I really don't get what a correlation between only two data points is.

In order for it to be a correlation, does there have to be an underlying factor that causes this auto-correlation? And if the data points were the same but there was no underlying factor responsible for the values of those two data points, would it not be called auto-regression? (I guess this one is kind of confusing, sorry.)

I thought that if, say, you have a regression line with a slope and you forget to consider one factor in your regression (for time series data), auto-correlation automatically exists. And thus it exists quite frequently with time series data, because hardly ever does one know all (possible) independent variables...?
 
#4
First, let me say that you are correct, correlation between only two data points doesn't make sense (check the formula for correlation to see why). Generally, when we talk about time series and autocorrelation we talk about points relative to each other. For instance, there might be a correlation between points that are next to each other (points at time t and t+1 for all t).
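To make "points relative to each other" concrete, here is a rough sketch that treats the lag-1 autocorrelation as the ordinary Pearson correlation between the pairs (x_t, x_{t+1}). The function names and the toy series are my own assumptions. Textbook autocorrelation estimators usually use the overall mean and variance of the whole series instead, so the numbers can differ slightly.

```python
from statistics import mean

def pearson(xs, ys):
    """Ordinary Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def lag_autocorrelation(series, lag=1):
    """Correlate the series with a copy of itself shifted by `lag` steps."""
    return pearson(series[:-lag], series[lag:])

# Toy series (illustration only).
steady_climb = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
flip_flop = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

r_climb = lag_autocorrelation(steady_climb)  # close to +1
r_flip = lag_autocorrelation(flip_flop)      # close to -1
```

With a series of only two values there is a single (x_t, x_{t+1}) pair, both variances are zero, and the formula becomes 0/0. In this sketch that shows up as a ZeroDivisionError, which is the sense in which a correlation between only two data points is undefined.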

It is very important to understand the difference between correlation and auto-correlation. Correlation is the relationship between two variables. Auto-correlation is the relationship between values of a single variable at different points in time (the prefix "auto" comes from the Greek for "self", so this term literally means self-correlation). The same goes for auto-regressive models: they are literally models in which you use previous values of the variable to predict future values.
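As a small illustration of "use previous values to predict future values", here is a sketch of the simplest auto-regressive model, AR(1), written as x_t = c + phi * x_{t-1} and fitted by ordinary least squares. The series is generated artificially without noise, so the fit recovers the coefficients almost exactly; real data would not be this clean. All names here are my own.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi

# Artificial, noise-free series built from x_t = 2 + 0.5 * x_{t-1}.
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)

# One-step-ahead forecast from the last observed value.
forecast = c + phi * series[-1]
```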

In the example with the basketball player, your errors are not necessarily auto-correlated, but they are not random noise (which is what should be left after fitting a regression model). This implies that there is an underlying factor that is not being taken into account in your model (in reality you may or may not be able to find this factor, but in the example it is clear: it is whether the player is injured or not).
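Here is a rough sketch of what adding the injured/not injured variable does to the residuals. The points-per-game numbers are invented for illustration, and to keep it short I assume there is no time trend, so the "regression" without the injury variable is just the overall mean. With a single binary regressor, the least-squares fitted values are simply the two group means.

```python
# Hypothetical points per game (made-up numbers):
# 20 healthy games, then 10 games played while injured.
healthy = [24, 26, 25, 23, 27, 25, 24, 26, 25, 25,
           26, 24, 25, 27, 23, 25, 26, 24, 25, 25]
injured = [19, 21, 20, 18, 22, 20, 19, 21, 20, 20]
points = healthy + injured
is_injured = [0] * len(healthy) + [1] * len(injured)

def mean(values):
    return sum(values) / len(values)

# Model A: one overall mean, no injury variable.
overall = mean(points)
resid_a = [p - overall for p in points]

# Model B: intercept plus an injury dummy.
intercept = mean([p for p, d in zip(points, is_injured) if d == 0])
injury_effect = mean([p for p, d in zip(points, is_injured) if d == 1]) - intercept
resid_b = [p - (intercept + injury_effect * d)
           for p, d in zip(points, is_injured)]

# Residuals for the last 10 (injured) games under each model.
last10_a = resid_a[-10:]  # all negative: the model misses the injury
last10_b = resid_b[-10:]  # scattered around zero once the dummy is included
```

Under Model A every one of the last 10 residuals is below zero. Under Model B they scatter on both sides of zero, which is closer to what random noise looks like.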

As for causation, just because variables are correlated doesn't mean that there is a causal relationship between them (causation is nearly impossible to prove in observational studies). For instance, there is a correlation between my height from ages 5 to 15 and the rise in the stock market from 1980 to 1990, but neither one caused the other.

Dave