# Regression with cointegrated time series data

#### marshmugger

##### New Member
Hello,

I am studying a predator-prey relationship using field data collected biannually over 5 years. For example, here is some abundance data:

pred,prey
23.0,0.1
31.5,0.7
18.0,1.4
26.0,2.6
25.5,3.3
36.7,5.0
52.3,20.9
38.7,11.1
47.0,13.9
43.3,13.7

I wish to estimate the numerical response (the slope coefficient of the linear regression predator ~ prey). I am using R for statistical computing.

I checked the cross-correlation, and there is no lag:

Autocorrelations of series predXprey, by lag
-3 -2 -1 0 1 2 3
0.121 0.278 0.599 0.925 0.463 0.468 0.131
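For anyone following along, the cross-correlations above can be approximated in base R from the ten posted pairs. This is only a sketch, not the original script: the data frame name `mydata` is borrowed from the model call further down, the argument order passed to `ccf` is an assumption, and the posted values look rounded, so agreement with the printed output should be expected only to about two or three decimals.

```r
# Posted abundance data (ten biannual samples, apparently rounded)
mydata <- data.frame(
  pred = c(23.0, 31.5, 18.0, 26.0, 25.5, 36.7, 52.3, 38.7, 47.0, 43.3),
  prey = c(0.1, 0.7, 1.4, 2.6, 3.3, 5.0, 20.9, 11.1, 13.9, 13.7)
)

# Cross-correlation at lags -3..3; with ccf(x, y), lag k pairs x[t + k] with y[t]
cc <- ccf(mydata$pred, mydata$prey, lag.max = 3, plot = FALSE)

# The lag-0 value is the ordinary Pearson correlation of the two series
r0 <- cc$acf[cc$lag == 0]
print(round(r0, 3))
```

The peak at lag 0 is what supports the "no lag" reading, although with ten points the neighbouring lags are estimated very imprecisely.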

Both series are autocorrelated and partial acf suggested first-order:

Autocorrelations of series pred, by lag

0 1 2 3 4 5
1.000 0.489 0.396 0.074 -0.288 -0.293

Durbin-Watson test

data: pred ~ 1
DW = 0.8386, p-value = 0.01694
alternative hypothesis: true autocorrelation is greater than 0

Autocorrelations of series prey, by lag

0 1 2 3 4 5
1.000 0.502 0.347 0.146 -0.207 -0.331

Durbin-Watson test

data: prey ~ 1
DW = 0.7899, p-value = 0.01272
alternative hypothesis: true autocorrelation is greater than 0
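The lag-1 autocorrelations and Durbin-Watson statistics above can be checked by hand in base R. The printed test output looks like it comes from `dwtest` in the lmtest package, but the post does not say so, so the sketch below computes only the statistic (not the p-value) from its textbook definition. The helper names `dw_stat` and `acf1` are made up for illustration.

```r
pred <- c(23.0, 31.5, 18.0, 26.0, 25.5, 36.7, 52.3, 38.7, 47.0, 43.3)
prey <- c(0.1, 0.7, 1.4, 2.6, 3.3, 5.0, 20.9, 11.1, 13.9, 13.7)

# Durbin-Watson statistic: squared successive differences of the residuals
# divided by the residual sum of squares (values well below 2 suggest
# positive first-order autocorrelation)
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Lag-1 sample autocorrelation as computed by acf()
acf1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

# For an intercept-only model (pred ~ 1) the residuals are deviations from the mean
dw_pred <- dw_stat(pred - mean(pred))
dw_prey <- dw_stat(prey - mean(prey))

print(round(c(dw_pred = dw_pred, dw_prey = dw_prey), 3))
print(round(c(acf1_pred = acf1(pred), acf1_prey = acf1(prey)), 3))
```

The predator figures reproduce the posted output closely; the prey figures differ slightly in the third decimal, presumably because the posted prey values are rounded.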

Here's the fitted model:

Call:
lm(formula = pred ~ prey, data = mydata)

Residuals:
Min 1Q Median 3Q Max
-7.6614 -1.6837 -0.8583 2.2727 6.8586

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.6214 2.1015 11.240 3.52e-06 ***
prey 1.4571 0.2116 6.885 0.000126 ***

Residual standard error: 4.534 on 8 degrees of freedom
Multiple R-squared: 0.8556, Adjusted R-squared: 0.8376
F-statistic: 47.4 on 1 and 8 DF, p-value: 0.0001264

There was no autocorrelation in the linear model residuals, although it is not easy to estimate autocorrelation from such short time series. I could fit a generalised least squares model with AR(1) errors, but I don't think it is required.

Autocorrelations of series model$residuals, by lag

0 1 2 3 4 5
1.000 -0.435 0.018 -0.239 0.279 -0.067

Durbin-Watson test

data: model
DW = 2.8667, p-value = 0.09089
alternative hypothesis: true autocorrelation is less than 0
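The residual check can be sketched the same way in base R. This refits the model from the posted (rounded) data, so the slope and the residual diagnostics will be close to, but not identical with, the output above. Only the Durbin-Watson statistic is computed; the p-value would need a package such as lmtest.

```r
mydata <- data.frame(
  pred = c(23.0, 31.5, 18.0, 26.0, 25.5, 36.7, 52.3, 38.7, 47.0, 43.3),
  prey = c(0.1, 0.7, 1.4, 2.6, 3.3, 5.0, 20.9, 11.1, 13.9, 13.7)
)

# Ordinary least squares fit of predator abundance on prey abundance
model <- lm(pred ~ prey, data = mydata)
slope <- unname(coef(model)["prey"])

# Residual diagnostics: Durbin-Watson statistic and lag-1 autocorrelation
e <- unname(residuals(model))
dw_resid <- sum(diff(e)^2) / sum(e^2)
r1_resid <- acf(e, lag.max = 1, plot = FALSE)$acf[2]

print(round(c(slope = slope, dw = dw_resid, r1 = r1_resid), 3))
```

A statistic above 2 together with a negative lag-1 autocorrelation points, if anything, towards negative rather than positive residual autocorrelation, which is consistent with the one-sided test reported above.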

I've been reading about autocorrelation and regressions. It seems that pred and prey are cointegrated. The strong autocorrelation in the prey series is driving the strong autocorrelation in the pred series. There is no autocorrelation in the residuals of the pred-prey model, so the regression does not appear to be spurious.

So how do I interpret the Ordinary Least Squares (OLS) regression results?

R-squared is just the square of the Pearson correlation (0.925^2 = 0.856), and I suspect it is misleading here, because observations within an autocorrelated series are not independent.
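In a regression with a single predictor, R-squared equals the square of the Pearson correlation between the two variables. A quick base R check on the posted (rounded) data, offered as a sketch rather than as part of the original analysis:

```r
pred <- c(23.0, 31.5, 18.0, 26.0, 25.5, 36.7, 52.3, 38.7, 47.0, 43.3)
prey <- c(0.1, 0.7, 1.4, 2.6, 3.3, 5.0, 20.9, 11.1, 13.9, 13.7)

r  <- cor(pred, prey)                        # Pearson correlation
r2 <- summary(lm(pred ~ prey))$r.squared     # R-squared from the OLS fit

# The two quantities agree up to floating-point error
print(all.equal(r^2, r2))
```

So R-squared inherits whatever dependence problems the plain correlation has; it adds no protection against autocorrelation.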

I've read that the OLS estimates for cointegrated regressions are unbiased but the t-statistic diverges at rate T^(1/2) for I(1) processes, where T is the time series length. Type I errors can result. A rough corrected t-statistic and P-value for the slope estimate is:

> 6.885/(sqrt(10))
[1] 2.177
> 2*(1-pt(2.177228, 8))
[1] 0.061

But this statistic surely does not precisely follow a t-distribution.
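One way to get a feel for how far the nominal t-test can be off at T = 10 is a small simulation. The sketch below is an illustration under stated assumptions, not a model of the field data: it regresses two independent Gaussian random walks on each other (the spurious-regression case, not the cointegrated one) and counts how often the slope looks significant at the 5% level, with and without the rough sqrt(T) scaling. The seed, the number of simulations, and the variable names are arbitrary choices.

```r
set.seed(1)
n_obs <- 10     # series length, matching the field data
n_sim <- 2000   # number of simulated pairs of series

# t-statistic of the slope when regressing one random walk on another,
# where the two walks are generated independently
t_slope <- replicate(n_sim, {
  x <- cumsum(rnorm(n_obs))
  y <- cumsum(rnorm(n_obs))
  summary(lm(y ~ x))$coefficients["x", "t value"]
})

crit <- qt(0.975, df = n_obs - 2)

# Share of "significant" slopes using the usual t-test
naive_rate <- mean(abs(t_slope) > crit)

# Same, after dividing the t-statistic by sqrt(T) as in the rough correction
scaled_rate <- mean(abs(t_slope) / sqrt(n_obs) > crit)

print(c(naive = naive_rate, scaled = scaled_rate))
```

If the usual test were valid, the naive rate would sit near 0.05. Comparing both rates with 0.05 gives a rough idea of whether the scaled statistic is too liberal or too conservative for series this short.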

Is there a simple way to make inferences from my data? Some limitations and problems I can see:
1) short time series (T = 10)
2) fractional autocorrelation
3) deterministic time trend (at least in the first few years where there is strong prey population growth)
4) cointegration

Most of my references are from econometrics, where they have long time series.

Stephen.


#### noetsi

##### No cake for spunky
I spent a fair amount of time last year reading about time series, although I am anything but an expert in it. Since we rarely have time series experts here (the ones we have come and go), I will make some brief comments. Generally there are two major issues with time series data. One is spurious regression, which arises when the dependent and independent variables both trend over time. Cointegration relates to that issue [if two variables are cointegrated, there are methods to address spurious regression]. The second is autoregressive error, which is addressed variously by regression with autoregressive errors, ARIMA, etc. This influences the standard errors, not the slopes (I believe spurious regression actually biases the parameters, which of course is worse).

Have you actually run a test of cointegration (there are several, of which Johansen is the one I have heard of the most)?

As I understand it there should be a theoretical reason to assume two variables are cointegrated (move together), but I am not sure how seriously this is taken. In any case you need to decide if you have cointegration with the test above.

Running a time series analysis with fewer than 50 points is commonly discouraged, for example for ARIMA, which I think you ran. For one thing, you are unlikely to capture seasonal variation this way (what some call SARIMA models). Patterns can also change over time (structural breaks), and a short period might yield a result that is true of that period but not of a larger time frame.

#### marshmugger

##### New Member
Thanks,

There are various tests for co-integration, unit roots, etc. and they seem to have low power for short time series. My situation is different from time series analysis in econometrics.

Of course, my model is valid only for a short period of time, but it's clear enough that predator numbers closely track prey numbers for this range of abundances (more than 2 orders of magnitude). At some very high prey abundance, which was not realised, I predict interference between predators (too many predators in a fixed area) and my fitted model would break.

I guess I'll have to try to track down the authors of some of the research papers that I have been reading.

#### noetsi

##### No cake for spunky
For unit root tests, one suggestion to address low power is to use one test with a null of stationarity and another with a null of non-stationarity. If the two tests point to the same conclusion (one rejects its null and the other does not), it decreases your doubt about the result. I have not worked enough with the various cointegration tests to know whether some have a null of cointegration and others a null of no cointegration. Even setting that aside, there should be some theory to support cointegration in the specific case tested.
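The idea of pairing two tests with opposite nulls can be written down as a small decision rule. The helper below is hypothetical (the name `joint_verdict` and the default 5% level are illustrative choices), and the p-values in the usage lines are placeholders, not results from the predator-prey data. In practice the p-values might come from functions such as `adf.test` (unit-root null) and `kpss.test` (stationarity null) in the tseries package, which is an assumption about tooling and is not something used in this thread.

```r
# Hypothetical helper: combine a test whose null is a unit root (e.g. ADF)
# with a test whose null is stationarity (e.g. KPSS)
joint_verdict <- function(p_unit_root_null, p_stationary_null, alpha = 0.05) {
  reject_unit_root  <- p_unit_root_null  < alpha
  reject_stationary <- p_stationary_null < alpha
  # The tests agree when exactly one of the two nulls is rejected
  if (reject_unit_root && !reject_stationary) return("stationary")
  if (!reject_unit_root && reject_stationary) return("unit root")
  # Both rejected or neither rejected: the tests do not settle the question
  "inconclusive"
}

# Placeholder p-values for illustration only
print(joint_verdict(0.40, 0.02))   # unit-root null kept, stationarity rejected
print(joint_verdict(0.40, 0.30))   # neither null rejected
```

With very short series the "inconclusive" branch is likely to be common, since neither test has much power to reject its null.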