Metrics to compare series

hlsmith

Less is more. Stay pure. Stay poor.
#1
Say I have two related series (fictitious example: number of marriages and number of divorces), so the counts in the second series are a subset of the first. Now an exogenous shock occurred that changed the counts in the first series (dampened the accumulation), and this is now seen in the second series as well.

I want to be able to say how many days the second series lags behind the first. Both series show the same increase in cumulative counts and the same influence of the intervention. I was thinking this could be examined with the cross-correlation function (CCF), but I must not know what to look for. I am not that experienced with time series data. Any suggestions? I will likely be using R or SAS.
 

noetsi

Fortran must die
#2
I don't think any normal time series method tells you that. It will tell you how levels of the second variable change with the first, but not when that change occurs, that is, the delay in impact. You can run the analysis at specific lags and see if there is a significant impact (although it's difficult to run a lot of lags). You have to have a theory of when the lag will occur to do this. The analysis is not going to tell you.

Well no time series method I know will.
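To make the "run the analysis at specific lags" idea concrete, here is a minimal R sketch on simulated data. Everything in it is an assumption for illustration: the toy series, the hypothetical delay `k_true`, and the helper `fit_at_lag` are not from the thread. It regresses `y` on `x` shifted by each candidate lag and compares the fits; with many candidate lags the p-values would need a multiple-testing adjustment.

```r
set.seed(42)
n <- 150
k_true <- 7                               # hypothetical delay, used only to build the toy data
x_full <- rnorm(n + k_true)
x <- x_full[(k_true + 1):(n + k_true)]    # leading series
y <- 2 * x_full[1:n] + rnorm(n)           # y[t] = 2 * x[t - k_true] + noise

candidate_lags <- 0:14                    # lags suggested by theory

# Regress y[t] on x[t - k] for one candidate lag k (hypothetical helper)
fit_at_lag <- function(k) {
  idx <- (k + 1):n
  fit <- summary(lm(y[idx] ~ x[idx - k]))
  c(lag       = k,
    slope     = fit$coefficients[2, "Estimate"],
    p_value   = fit$coefficients[2, "Pr(>|t|)"],
    r_squared = fit$r.squared)
}

results <- as.data.frame(t(sapply(candidate_lags, fit_at_lag)))
best_lag <- results$lag[which.max(results$r_squared)]
print(results)
print(best_lag)
```

The simulated `x` is white noise, so only the true lag fits well; on trending or autocorrelated series many neighbouring lags can look significant, which is where a prior theory of the lag matters.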
 

Miner

TS Contributor
#4
This is a cross correlation of a specific product's sales against the Dow Jones that I did using Minitab. Look for the peak CCF and the number of lags before or after lag 0. In your example, you should see the peak to the right of lag 0.

1588099761658.png
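For R users, "look for the peak CCF" could be sketched as below. The data are simulated with a hypothetical delay `true_lag`, so nothing here comes from the sales example. One thing to watch: as documented for `stats::ccf`, the lag `k` value of `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`, so when the first argument leads, the peak falls at a negative lag (to the left of 0). Other packages may use the opposite sign convention.

```r
set.seed(1)
n <- 200
true_lag <- 5                                  # hypothetical delay, in days
x_full <- rnorm(n + true_lag)
x <- x_full[(true_lag + 1):(n + true_lag)]     # leading series
y <- x_full[1:n] + rnorm(n, sd = 0.3)          # y[t] = x[t - true_lag] + noise

# Compute the CCF without plotting, then locate the largest absolute value
cc <- ccf(x, y, lag.max = 20, plot = FALSE)
lags <- as.vector(cc$lag)
vals <- as.vector(cc$acf)
peak_lag <- lags[which.max(abs(vals))]
print(peak_lag)
```

Here `peak_lag` comes out as `-true_lag`, meaning `y` follows `x` by that many days under R's convention.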
 

hlsmith

Less is more. Stay pure. Stay poor.
#5
Well I didn't have much time before bed, but I quickly churned these out.
1588131145315.png
Above is the CCF and below are the series after logging them. To the naked eye they seem lagged, and there was an exogenous shock which dampened both series (potentially realized at day 20 and day 30, respectively). Any thoughts? Do the series need to be differenced, etc. before creating the CCF?
1588131235749.png
Code:
library(tseries)
X1 <- c(                                   3,  8, 13,
                                           14, 16, 17, 18, 22, 23, 27, 38, 44, 45,
                                           68, 90, 105, 124, 145, 179, 235, 298, 336, 424, 497, 549,
                                           614, 699, 786, 868, 946, 1048, 1145, 1270, 1388, 1510,
                                           1587, 1710, 1899, 1995, 2141, 2332, 2513, 2902, 3159, 3641, 3748,
                                           3924, 4445, 5092, 5476, 5868, 6376)
X2    <- c(                                0.5,  0.5,  0.5,
                                           0.5,  0.5,  0.5,  0.5,  0.5,  0.5, 0.5,  0.5,  0.5,  0.5,
                                           0.5,  0.5, 0.5,  1,  1,  1,  3,  3, 4, 6, 7,
                                           9, 11, 11, 14, 22, 25, 26, 27, 29, 31,
                                           34, 41, 43, 49, 53, 60, 64, 74, 75, 79,
                                           83, 90, 96, 107, 112, 118, 127, 136)

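# ccf() lives in base R's stats package; its arguments are lag.max, type, and plot
ccf(X1, X2, lag.max = 50, type = "correlation", plot = TRUE)
days <- seq_along(X1)   # 1..52
plot(days, log(X1), type = 'l')
lines(days, log(X2))
# Log-linear growth fits on the two series (matches the output shown below)
mod1 <- lm(log(X1) ~ days)
mod1
mod2 <- lm(log(X2) ~ days)
mod2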
Output from the basic linear models using logged data, resembling the graphs:

> mod1

Call:
lm(formula = log(X1) ~ days)

Coefficients:
(Intercept)         days
     2.3026       0.1362

> mod2

Call:
lm(formula = log(X2) ~ days)

Coefficients:
(Intercept)         days
    -1.6928       0.1387

Any thoughts?
 

Miner

TS Contributor
#6
That appears to be the autocorrelation function (ACF), which is done with a single time series. Try the cross correlation function (CCF), which is done between two time series.
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
I keep staring at this, and as far as I can tell I did it right given all the R examples for the function I can find; "ACF" is just what it calls the y-axis. It still feels visually wrong, though. Below is the output by lag value; if you run a basic cor() on the two variables it returns 0.9925. I also provided a scatterplot.

Autocorrelations of series ‘X’, by lag

   -14    -13    -12    -11    -10     -9     -8     -7     -6     -5     -4     -3     -2
 0.130  0.168  0.207  0.249  0.292  0.343  0.397  0.460  0.520  0.580  0.648  0.729  0.813
    -1      0      1      2      3      4      5      6      7      8      9     10     11
 0.900  0.993  0.918  0.847  0.780  0.712  0.644  0.584  0.526  0.471  0.415  0.359  0.298
    12     13     14
 0.246  0.194  0.148

1588171581611.png

@Miner any thoughts? Looking at the plot in the prior post, I just wanted to be able to say they were related series lagged by ~20 days (which makes physiological sense given the phenomenon).
 

Miner

TS Contributor
#8
I don't think there is any question that they are related time series. However, I cannot find any evidence of a lag between them. The cross correlation function peaks at lag 0, so they appear to be in sync with each other.
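One possible reason a raw CCF can peak at lag 0 even when a delay exists, offered as a sketch and not as a diagnosis of the data above: for a pure exponential curve, shifting it in time is the same as rescaling it, so the delayed series is perfectly correlated with the original at lag 0. The growth rate 0.13 and the hypothetical delay `shift` below are assumptions chosen only for illustration.

```r
days <- 1:52
shift <- 10                          # hypothetical true delay, in days
x <- exp(0.13 * days)                # leading series: pure exponential growth
y <- exp(0.13 * (days - shift))      # the same curve, delayed by `shift` days

# y equals exp(-0.13 * shift) * x, so the two are proportional at every time point
cc <- ccf(x, y, lag.max = 14, plot = FALSE)
lags <- as.vector(cc$lag)
vals <- as.vector(cc$acf)
peak_raw <- lags[which.max(vals)]
print(peak_raw)
```

In this toy case the CCF of the levels peaks at lag 0 despite the built-in delay, so any information about the delay has to come from departures from the shared trend (such as the timing of a bend in each curve), not from the trend itself.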
 

noetsi

Fortran must die
#9
At least in the treatments of this I have seen in ARIMA, to compute a CCF for two time series you first have to pre-whiten both, including differencing.
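A minimal sketch of pre-whitening in R, on simulated stationary data with a hypothetical delay `k_true`. It uses one common variant, which is an assumption here and not necessarily the treatment referred to above: fit an AR model to the input series, apply that same filter to both series, then take the CCF of the filtered series. For trending counts like those earlier in the thread, a step such as `diff(log(X1))` would presumably come first.

```r
set.seed(7)
n <- 300
k_true <- 4                                        # hypothetical delay, in days
x_full <- as.numeric(arima.sim(list(ar = 0.7), n = n + k_true))
x <- x_full[(k_true + 1):(n + k_true)]             # autocorrelated leading series
y <- 0.8 * x_full[1:n] + rnorm(n, sd = 0.5)        # y[t] = 0.8 * x[t - k_true] + noise

# Fit an AR model to x (order chosen by AIC), then filter both series with it
ar_fit <- ar(x, order.max = 5, aic = TRUE)
ar_filter <- c(1, -ar_fit$ar)                      # (1 - phi1*B - phi2*B^2 - ...)
x_pw <- stats::filter(x - mean(x), ar_filter, method = "convolution", sides = 1)
y_pw <- stats::filter(y - mean(y), ar_filter, method = "convolution", sides = 1)

# Drop the leading NAs created by the one-sided filter
keep <- complete.cases(as.numeric(x_pw), as.numeric(y_pw))
cc <- ccf(as.numeric(x_pw)[keep], as.numeric(y_pw)[keep],
          lag.max = 15, plot = FALSE)
peak_lag <- as.vector(cc$lag)[which.max(abs(as.vector(cc$acf)))]
print(peak_lag)
```

With R's sign convention for `ccf(x, y)`, the peak at `-k_true` indicates that `y` follows `x` by `k_true` days. Filtering removes the autocorrelation in `x` that would otherwise smear the peak across neighbouring lags.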