# Metrics to compare series

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Say I have two related series; a fictitious example: number of marriages and number of divorces. The counts in the second series are a subset of those in the first. Now an exogenous shock has changed the counts (dampened the accumulation) in the first series, and this is now seen in the second series as well.

I want to be able to say how many days behind (lagged?) the second series is compared to the first. In both series you can see the same increase in cumulative counts and the influence of the intervention. I was thinking this could be examined with the cross-correlation function (CCF), but I don't know what to look for. I am not that experienced with time series data. Any suggestions? I will likely be using R or SAS.

#### noetsi

##### Fortran must die
I don't think any standard time series method tells you that. It will tell you how levels of the second variable change with the first, but not when that change occurs, that is, the delay in impact. You can run the analysis at specific lags and see if there is a significant impact (although it's difficult to run a lot of lags). You have to have a theory of when the lag will occur to do this; the method is not going to tell you.

Well, no time series method I know of will.
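A minimal R sketch of the "pick a lag from theory, then test it" approach: the hypothetical helper `lagged_fit()` regresses `y[t]` on `x[t - k]` for one lag `k`. The data are simulated with an assumed 7-day delay, and the candidate lags (0, 7, 14) merely stand in for whatever subject-matter theory would suggest.

```r
# Hypothetical helper: regress y[t] on x[t - k] for a single chosen lag k
lagged_fit <- function(x, y, k) {
  n <- length(y)
  resp <- y[(k + 1):n]   # y[t] for t = k + 1, ..., n
  pred <- x[1:(n - k)]   # x[t - k], aligned with resp
  lm(resp ~ pred)
}

# Simulated data (illustration only): y[t] = 2 * x[t - 7] + noise
set.seed(3)
n <- 150
x_full <- rnorm(n + 7)
x <- x_full[8:(n + 7)]
y <- 2 * x_full[1:n] + rnorm(n)

# Candidate lags are assumed to come from theory, not from the data
candidate_lags <- c(0, 7, 14)
fits <- lapply(candidate_lags, function(k) lagged_fit(x, y, k))
names(fits) <- candidate_lags

# Slope estimate at each candidate lag; only the true lag should be near 2
sapply(fits, function(f) coef(f)[["pred"]])
```

Each fit uses a slightly different number of observations, and testing many lags this way invites multiple-comparison problems, which is one reason to keep the candidate list short and theory-driven.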

#### Miner

##### TS Contributor
This is a cross-correlation of a specific product's sales against the Dow Jones that I did in Minitab. Look for where the CCF peaks and how many lags before or after lag 0 that peak sits. In your example, you should see the peak to the right of lag 0.
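A minimal R sketch of reading the peak lag off `ccf()` output, using simulated data with an assumed 5-step delay (this is not Miner's Minitab analysis). One convention detail matters: R's `?acf` help page states that the lag k value returned by `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`. So in R, when the second argument trails the first by d steps, the peak appears at lag -d (left of 0); other packages may use the opposite sign convention.

```r
# Simulated data, purely illustrative
set.seed(1)
n <- 200
d <- 5                              # assumed true delay
x <- rnorm(n + d)                   # leading series (with d extra values)
y <- x[1:n] + rnorm(n, sd = 0.5)    # y[t] = x[t - d] + noise after alignment
x <- x[(d + 1):(n + d)]             # align so both series have length n

# Cross-correlations without plotting, then locate the largest one
cc <- ccf(x, y, lag.max = 20, plot = FALSE)
lags <- drop(cc$lag)
r <- drop(cc$acf)
peak_lag <- lags[which.max(r)]
peak_lag
```

Here `peak_lag` is negative because `y` is the lagging series and is passed second; swapping the arguments flips the sign.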

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Well, I didn't have much time before bed, but I quickly churned these out.

Above is the CCF and below are the series after logging them. To the naked eye they seem lagged, and there was an exogenous shock which dampened both series (potentially realized at day 20 and day 30, respectively). Any thoughts? Do the series need to be differenced, etc., before creating the CCF?

Code:
# ccf() is in base R's stats package, so no extra library is needed
X1 <- c(3, 8, 13,
  14, 16, 17, 18, 22, 23, 27, 38, 44, 45,
  68, 90, 105, 124, 145, 179, 235, 298, 336, 424, 497, 549,
  614, 699, 786, 868, 946, 1048, 1145, 1270, 1388, 1510,
  1587, 1710, 1899, 1995, 2141, 2332, 2513, 2902, 3159, 3641, 3748,
  3924, 4445, 5092, 5476, 5868, 6376)
X2 <- c(0.5, 0.5, 0.5,
  0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
  0.5, 0.5, 0.5, 1, 1, 1, 3, 3, 4, 6, 7,
  9, 11, 11, 14, 22, 25, 26, 27, 29, 31,
  34, 41, 43, 49, 53, 60, 64, 74, 75, 79,
  83, 90, 96, 107, 112, 118, 127, 136)

# Cross-correlation of the two series (lag.max must be below the length, 52)
ccf(X1, X2, lag.max = 50, type = "correlation", plot = TRUE)
days <- 1:52
# ylim covers both logged series so the second one is not clipped
plot(days, log(X1), type = 'l', ylim = range(log(c(X1, X2))))
lines(days, log(X2), lty = 2)
# Log-linear trend models for each series
mod1 <- lm(log(X1) ~ days)
mod1
mod2 <- lm(log(X2) ~ days)
mod2
Output from the basic linear models using logged data, resembling the graphs:

> mod1

Call:
lm(formula = log(X1) ~ days)

Coefficients:
(Intercept) days
2.3026 0.1362

> mod2

Call:
lm(formula = log(X2) ~ days)

Coefficients:
(Intercept) days
-1.6928 0.1387

Any thoughts?
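On the differencing question: X1 and X2 look like running totals, and two series that both climb steadily can correlate strongly at lag 0 mostly because of the shared trend. A common step is to difference the totals with `diff()`, which turns them back into daily counts, and compute the CCF on those. The sketch below uses simulated data, not the thread's series: hypothetical daily counts where series 2 keeps about half of series 1's count from an assumed 10 days earlier.

```r
# Simulated daily counts (illustration only)
set.seed(42)
n <- 200
d <- 10                                          # assumed delay of series 2
daily1_full <- rpois(n + d, lambda = 50)         # daily counts, series 1
# Series 2 keeps ~50% of series 1's count from d days earlier
daily2 <- rbinom(n, size = daily1_full[1:n], prob = 0.5)
daily1 <- daily1_full[(d + 1):(n + d)]           # align both to the same n days

# What would actually be recorded: cumulative totals
cum1 <- cumsum(daily1)
cum2 <- cumsum(daily2)

# Difference the totals back to daily counts, then take the CCF
cc <- ccf(diff(cum1), diff(cum2), lag.max = 20, plot = FALSE)
lag_at_peak <- drop(cc$lag)[which.max(drop(cc$acf))]
lag_at_peak
```

With the posted data the analogous calls would be `ccf(diff(X1), diff(X2))` or, on the log scale, `ccf(diff(log(X1)), diff(log(X2)))`; they have not been run here, so no claim is made about where they peak.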

#### Attachments

• 11 KB Views: 1
• 12.6 KB Views: 1

#### Miner

##### TS Contributor
That appears to be the autocorrelation function (ACF), which is done with a single time series. Try the cross-correlation function (CCF), which is done between two time series.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I keep staring at this, and I did it right according to all the R examples for the function I can find; that is just what R labels the y-axis. It still feels visually wrong, though. Below is the output by lag, and if you run a basic cor() on the two variables it gives 0.9925. I also provided a scatterplot.

Autocorrelations of series ‘X’, by lag

-14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2
0.130 0.168 0.207 0.249 0.292 0.343 0.397 0.460 0.520 0.580 0.648 0.729 0.813
-1 0 1 2 3 4 5 6 7 8 9 10 11
0.900 0.993 0.918 0.847 0.780 0.712 0.644 0.584 0.526 0.471 0.415 0.359 0.298
12 13 14
0.246 0.194 0.148

@Miner, any thoughts? From the plot in my prior post, I just wanted to be able to say they were related series lagged by ~20 days (which makes physiological sense given the phenomenon).

#### Attachments

• 19.6 KB Views: 1

#### Miner

##### TS Contributor
I don't think there is any question that they are related time series. However, I cannot find any evidence of a lag between them. The cross correlation function peaks at lag 0, so they appear to be in sync with each other.

#### noetsi

##### Fortran must die
> That appears to be the autocorrelation function (ACF), which is done with a single time series. Try the cross-correlation function (CCF), which is done between two time series.

At least in the treatments of this I have seen in ARIMA modeling, to compute a CCF between two time series you first have to prewhiten both, including differencing.
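A minimal R sketch of one common form of prewhitening: fit an AR model to the leading series, then pass both series through that same filter before taking the CCF. It uses simulated stationary data with an assumed 4-step delay; for trending series like the ones in this thread you would difference first (e.g. with `diff()`) and prewhiten the differenced series. Letting `ar()` pick the order by AIC, and the hypothetical helper `prewhiten()`, are choices made for illustration.

```r
# Simulated autocorrelated data (illustration only)
set.seed(7)
n <- 300
d <- 4                                                  # assumed delay
x_full <- arima.sim(model = list(ar = 0.8), n = n + d)  # AR(1) leading series
x <- as.numeric(x_full[(d + 1):(n + d)])
y <- as.numeric(x_full[1:n]) + rnorm(n)                 # y[t] = x[t - d] + noise

# Step 1: fit an AR model to the leading series (order chosen by AIC)
fit <- ar(x, order.max = 10, aic = TRUE)
phi <- fit$ar
p <- fit$order

# Step 2: apply the SAME filter to both series:
#   e[t] = z[t] - phi[1] * z[t - 1] - ... - phi[p] * z[t - p]
prewhiten <- function(z) {
  e <- stats::filter(z, c(1, -phi), method = "convolution", sides = 1)
  as.numeric(e)[(p + 1):length(z)]   # drop the p leading NA values
}
ex <- prewhiten(x)
ey <- prewhiten(y)

# Step 3: CCF of the filtered series; the spike should sit at lag -d
cc <- ccf(ex, ey, lag.max = 15, plot = FALSE)
lag_at_peak <- drop(cc$lag)[which.max(drop(cc$acf))]
lag_at_peak
```

Filtering `x` toward white noise keeps its own autocorrelation from smearing the cross-correlation across neighboring lags, so the peak is easier to read than on the raw series.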