Regression with autocorrelated residuals

#1
I have two variables, X and Y. I want to build a regression model (Y is the response and X is the regressor) to find the relationship between X and Y. I will then use the fitted relationship to predict Y on other dates (I have the X values for those dates). However, when I check the assumptions, the residuals are autocorrelated (the DW test is significant). Since my time intervals are not equal, how can I fix the autocorrelation problem?

Time X Y
1/1/2015 216.8160 1.7820
1/5/2015 173.9320 1.6818
1/8/2015 141.4420 1.6480
1/10/2015 142.0990 1.1205
1/15/2015 202.0850 1.8014
1/19/2015 139.1050 1.5689
1/22/2015 77.3665 0.9590
1/23/2015 79.2537 1.2165
1/26/2015 94.2502 1.4657
1/29/2015 94.9671 1.0960
2/1/2015 164.8920 2.2441
2/7/2015 92.8841 0.9361
2/9/2015 95.3771 1.0646
2/12/2015 190.7650 1.7913
2/16/2015 190.8410 2.1200
2/19/2015 223.5520 2.3255
2/22/2015 229.6450 2.6472
2/23/2015 232.7760 2.5560
2/28/2015 219.4150 1.8659
3/1/2015 199.8310 2.2401
3/3/2015 269.8340 3.1491
3/4/2015 269.6200 2.8203
3/7/2015 193.1360 2.1562
3/11/2015 171.4820 2.0335
3/13/2015 188.1430 1.9166
3/14/2015 195.8700 1.7747
3/17/2015 189.3370 2.2283
3/20/2015 237.6840 1.9799
3/24/2015 103.8340 1.4352
3/27/2015 149.0290 1.4497
3/28/2015 128.2730 1.2838
3/30/2015 144.5200 1.3755
4/4/2015 158.3590 1.4340
4/6/2015 172.1230 1.6321
4/11/2015 185.7660 1.9489
4/13/2015 111.8360 1.4299
4/17/2015 173.1580 1.6974
4/24/2015 178.2280 1.9009
4/27/2015 178.5110 1.9542
5/1/2015 220.4470 2.0083
5/3/2015 255.5630 2.8687
5/7/2015 285.2660 2.9606
5/10/2015 279.3760 3.1313
5/14/2015 191.5120 2.3165
5/17/2015 244.7750 2.6072
5/21/2015 211.1450 2.2844
5/24/2015 245.5010 2.5445
5/27/2015 196.5840 2.4106
5/31/2015 210.8370 2.4096
6/3/2015 216.5690 2.1885
6/10/2015 230.0750 2.5330
6/13/2015 223.1900 2.3103
6/17/2015 237.6550 2.6229
6/20/2015 185.1380 2.1995
6/24/2015 240.9690 2.4780
6/27/2015 302.6860 3.4597
7/1/2015 291.2400 2.9420
7/4/2015 281.0320 2.8025
7/9/2015 262.3420 2.8689
7/12/2015 270.4160 2.7998
7/16/2015 270.1910 3.2023
7/19/2015 264.7230 3.5509
7/23/2015 208.0600 2.2068
7/26/2015 180.6280 2.6697
7/30/2015 173.5710 2.4105
8/2/2015 153.4680 2.0625
8/6/2015 80.3503 1.7938
8/9/2015 98.3243 1.9198
8/12/2015 92.3371 1.9848
8/16/2015 100.4340 1.7312
8/19/2015 89.6807 1.7338
8/14/2015 87.7287 1.6029
8/18/2015 89.8975 1.7404
8/20/2015 86.2575 1.4491
8/24/2015 72.0325 1.8491
8/27/2015 75.3284 1.9924
9/1/2015 70.2810 1.3430
9/4/2015 72.0742 1.2466
9/7/2015 88.4218 1.6554
9/10/2015 78.3439 1.0553
9/15/2015 81.8243 1.2442
9/17/2015 74.1898 1.0714
9/21/2015 84.2758 1.4913
9/22/2015 97.1235 1.7594
9/24/2015 65.7313 1.4645
9/27/2015 107.0440 1.8621
9/30/2015 87.4581 1.3534
10/3/2015 75.7782 2.6487
10/13/2015 92.1821 1.1669
10/18/2015 173.7270 1.8870
10/21/2015 127.9350 1.5006
10/24/2015 83.9097 1.3140
10/27/2015 74.4047 1.2053
10/30/2015 81.5380 1.1491
11/2/2015 78.1676 1.3069
11/6/2015 127.1860 2.6453
11/9/2015 123.4270 1.5429
11/14/2015 139.0320 1.5773
11/16/2015 136.1360 1.4705
11/20/2015 147.5160 1.6545
11/24/2015 130.9240 2.9669
11/27/2015 115.3600 1.4209
11/30/2015 160.7920 1.5144
12/4/2015 148.2550 1.4065
12/8/2015 117.0590 1.4141
12/21/2015 140.8640 1.6739
12/22/2015 107.2530 1.4851
12/24/2015 124.3800 1.5657
12/26/2015 127.9850 1.5632
12/29/2015 120.9600 1.5780
1/3/2016 106.0270 1.1630
1/4/2016 107.2760 1.4825
1/7/2016 128.3630 1.2191
1/8/2016 123.9610 1.3272


hlsmith

Less is more. Stay pure. Stay poor.
#3
Not my area, but can you thin the data to find a consistent time gap between measures? Or use some type of unstructured covariance structure?
 

katxt

Active Member
#4
If this is a practical problem, how much difference will it make if you ignore the autocorrelation, simply regress y on x, and then use the fitted formula and the residual standard error?
 
#5
Not my area, but can you thin the data to find a consistent time gap between measures? Or use some type of unstructured covariance structure?

Thanks for your advice. Could you give more detail on using an unstructured covariance structure? I don't quite understand it.
 
#6
DW is for lag 1, but you have lags of 1 through 5 days. I'm not sure what this means in practice.

The full data set has a consistent time gap of 1 hour. I have an X value for every data point, but I only have Y values for the times listed above. I want to find the relationship between X and Y using the data above and then use it to predict Y for the times where Y is missing. That's why I don't have a consistent time gap here.
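
The uneven spacing is easy to check directly. The sketch below is only an illustration: it takes the first eight dates from the table in post #1 and computes the number of days between consecutive observations. The variable names are made up for this example.

```python
from datetime import datetime

# First eight observation dates from the table in post #1 (month/day/year).
dates = ["1/1/2015", "1/5/2015", "1/8/2015", "1/10/2015",
         "1/15/2015", "1/19/2015", "1/22/2015", "1/23/2015"]

parsed = [datetime.strptime(d, "%m/%d/%Y") for d in dates]

# Days elapsed between each observation and the one before it.
gaps_days = [(b - a).days for a, b in zip(parsed, parsed[1:])]

print(gaps_days)                       # [4, 3, 2, 5, 4, 3, 1]
print(min(gaps_days), max(gaps_days))  # 1 5
```

So "lag 1" in observation order corresponds to anything from 1 to 5 days in calendar time for these rows, which is why a lag-1 statistic is hard to interpret here.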
 
#7
If this is a practical problem, how much difference will it make if you ignore the autocorrelation, simply regress y on x, and then use the fitted formula and the residual standard error?

I don't know how large the difference will be, but autocorrelated residuals violate the regression assumptions. I am not sure how to handle this.
 

katxt

Active Member
#8
The violations are usually dealt with by a time series analysis that includes past values in the regression, but you really need consistent time intervals for that.
If what you want to do is predict the unknown y's then perhaps the best you can do is ignore the autocorrelation, and the predictions you get may well be good enough for your needs. Even if you find some way to allow for the autocorrelation, the predictions probably won't be noticeably more reliable.
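
A minimal sketch of the "ignore the autocorrelation" route, assuming a plain straight-line fit is adequate. It uses only the first ten (X, Y) rows from post #1 to stay short; the new X values and the helper name `predict` are hypothetical and only there for illustration.

```python
import numpy as np

# First ten (X, Y) pairs from the table in post #1.
x = np.array([216.8160, 173.9320, 141.4420, 142.0990, 202.0850,
              139.1050, 77.3665, 79.2537, 94.2502, 94.9671])
y = np.array([1.7820, 1.6818, 1.6480, 1.1205, 1.8014,
              1.5689, 0.9590, 1.2165, 1.4657, 1.0960])

# Design matrix with an intercept column, then ordinary least squares.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = beta

# Residual standard error (n observations, p fitted coefficients).
resid = y - A @ beta
n, p = A.shape
resid_se = np.sqrt(resid @ resid / (n - p))

def predict(x_new):
    """Point prediction of Y for new X values from the fitted line."""
    return intercept + slope * np.asarray(x_new, dtype=float)

x_new = [100.0, 200.0]  # hypothetical X values at times where Y is missing
print("intercept, slope:", intercept, slope)
print("residual SE:", resid_se)
print("predicted Y:", predict(x_new))
```

Autocorrelation in the residuals does not change these point predictions. What it affects is how far the usual standard errors and intervals can be trusted.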
 

noetsi

Fortran must die
#9
I am not sure what you are trying to do. There are many solutions to autocorrelation. The simplest is HAC (heteroskedasticity- and autocorrelation-consistent) standard errors, though they may not solve your problem, depending on what you are trying to do:

https://www.econometrics-with-r.org/15-4-hac-standard-errors.html
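
For anyone not using R, here is a rough NumPy sketch of what HAC standard errors do, using the Newey-West estimator with Bartlett weights and no small-sample correction. The function name `ols_hac` and the choice `max_lag=2` are assumptions for illustration, and the data are just the first ten rows from post #1. Note that the lags here are counted in observation order, so with unequal time gaps this is only an approximation.

```python
import numpy as np

def ols_hac(x, y, max_lag):
    """OLS of y on x (with intercept) plus Newey-West standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                      # OLS residuals
    Xu = X * u[:, None]                   # row t is x_t * u_t
    S = Xu.T @ Xu                         # lag-0 (White) term
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)   # Bartlett weight
        G = Xu[lag:].T @ Xu[:-lag]        # lag-`lag` cross products
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv           # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

# First ten (X, Y) pairs from the table in post #1.
x = np.array([216.8160, 173.9320, 141.4420, 142.0990, 202.0850,
              139.1050, 77.3665, 79.2537, 94.2502, 94.9671])
y = np.array([1.7820, 1.6818, 1.6480, 1.1205, 1.8014,
              1.5689, 0.9590, 1.2165, 1.4657, 1.0960])

beta_hat, hac_se = ols_hac(x, y, max_lag=2)
print("coefficients (intercept, slope):", beta_hat)
print("HAC standard errors:", hac_se)
```

The coefficients are the ordinary OLS ones. Only the standard errors change, so this helps with inference about the slope but leaves the predicted Y values untouched.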

ARIMA would work as well, but unequal intervals are a major issue in time series, much more so than serial correlation.

Incidentally, the DW test is not ideal. It only catches first-order autocorrelation and has a variety of other problems. There are many better tests.
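
To make the first-order point concrete, here is a small sketch that computes the Durbin-Watson statistic next to a Ljung-Box Q statistic, which pools several lags. Ljung-Box is only one example of a multi-lag check (Breusch-Godfrey is another common one for regression residuals). The helper names are made up, the data are the first ten rows from post #1, and lags are counted in observation order, so the unequal spacing caveat still applies.

```python
import numpy as np

def durbin_watson(e):
    """DW statistic: near 2 suggests no first-order autocorrelation."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def ljung_box_q(e, max_lag):
    """Ljung-Box Q statistic pooling autocorrelations at lags 1..max_lag."""
    e = np.asarray(e, dtype=float)
    n = e.size
    d = e - e.mean()
    denom = np.sum(d ** 2)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = np.sum(d[k:] * d[:-k]) / denom   # sample autocorrelation at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# First ten (X, Y) pairs from the table in post #1.
x = np.array([216.8160, 173.9320, 141.4420, 142.0990, 202.0850,
              139.1050, 77.3665, 79.2537, 94.2502, 94.9671])
y = np.array([1.7820, 1.6818, 1.6480, 1.1205, 1.8014,
              1.5689, 0.9590, 1.2165, 1.4657, 1.0960])

# Straight-line fit; np.polyfit returns the slope first for degree 1.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

dw = durbin_watson(resid)
q3 = ljung_box_q(resid, max_lag=3)
print("Durbin-Watson:", dw)
print("Ljung-Box Q (3 lags):", q3)
```

Under the null of no autocorrelation, Q is usually compared with a chi-square distribution with `max_lag` degrees of freedom. For regression residuals that reference distribution is only approximate.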
 