Time Series Problem

#1
Hi Everyone,

I have a time series that I am trying to analyze. Two sequences in the series appear to be very similar, and I want to determine whether that similarity can be rigorously established and, if so, what it tells me about the series.

It is my understanding that one must be very careful about assuming patterns in time series are not random, and that statistics provides a number of tools to rigorously make such determinations. I hope to enlist the expertise of the pros on this forum to help guide me through this process.

Since my understanding of statistics is very limited, I would like to work through this analysis in a methodical manner so that I can assimilate each step, and I would greatly appreciate your indulgence if my progress seems too plodding.

The series I am studying has over 30k points. It does not appear to be stationary -- having both trend and possible cyclic characteristics -- and it is not normally distributed.

The two sequences of the series I am interested in are only a few hundred points in length each.

[Image: 1588005681766.png]

[Image: 1588005710515.png]

The cumulative percent change from point zero (the peak) of the overlaid sequences looks like this:

1588006047678.png
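
For readers following along, the cumulative percent change from point zero can be sketched as below. This is only an illustration: the function name and the sample numbers are made up, not taken from the actual series.

```python
def cumulative_pct_change(values):
    """Percent change of every point relative to the first point (point zero)."""
    base = values[0]
    if base == 0:
        raise ValueError("point zero must be non-zero to compute a percent change")
    return [100.0 * (v - base) / base for v in values]


# Hypothetical sequence starting at its peak (illustrative values only).
seq = [200.0, 190.0, 180.0, 195.0]
print(cumulative_pct_change(seq))
```

Overlaying two sequences after this transformation puts both on a common scale that starts at zero, which is what makes the visual comparison possible.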

Is there a correct or best way to quantify the similarities between these sequences?
 

noetsi

Fortran must die
#3
"It is my understanding that one must be very careful about assuming patterns in time series are not random, and that statistics provides a number of tools to rigorously make such determinations. I hope to enlist the expertise of the pros on this forum to help guide me through this process."

I have spent many years studying time series, although I am not an expert, and I have never heard anyone say that. :p I am not really sure what you are saying. As far as I am aware, statistics does not determine whether something is random one way or the other, nor whether something is repeating in a univariate time series.

There are tests for autocorrelation in your data (although not for MA patterns, which can also occur). The most common, but not easy, way to determine whether AR and MA patterns are present is to look at the ACF and PACF graphs. These are not formal tests: they are graphical in nature and rely on judgement. But this is not for the faint of heart and probably not ideal if you are new to statistics.
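
As a rough illustration of what an ACF graph plots, here is a minimal sketch of the sample autocorrelation function in plain Python. The function name and the toy series are assumptions for illustration; the PACF is not shown, and in practice a statistics package would be used instead.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Lag-k autocovariance divided by the lag-0 autocovariance.
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical trending series: its ACF decays very slowly,
# which is the usual visual hint of nonstationarity.
trend = [float(t) for t in range(50)]
acf = sample_acf(trend, 5)
print([round(r, 3) for r in acf])
```

Values that stay close to 1 over many lags, as they do for a trending series like this one, are read as a sign that the series should be differenced before the ACF is interpreted further.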
 

vinux

Dark Knight
#4
The characteristics of the time series variable are important. They tell you whether deterministic components, such as trend, seasonality, and regime changes, are justified.

If you don't have any information about the data, the graph suggests nonstationarity (integrated, most likely with d=1). The graphs also suggest the possibility of cointegration.
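
To make "integrated with d=1" concrete: a series is integrated of order d if differencing it d times gives a stationary series. A minimal sketch of differencing, with a made-up function name and toy data:

```python
def difference(x, d=1):
    """Apply first differencing d times: x[t] - x[t-1]."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


# Toy example (illustrative values only).
levels = [1, 3, 6, 10]
print(difference(levels))       # first differences
print(difference(levels, d=2))  # second differences
```

Each round of differencing shortens the series by one point, so a series of n points leaves n - d points to analyze.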

If you are using the forecast package in R, you could try auto.arima.
 

noetsi

Fortran must die
#5
Vinux, given that he is new to statistics, ARIMA is a rough road :) It is a rough road even for those who have studied it for years.

Is it possible to have cointegration in a univariate time series (this is one time series split into two pieces)?

One possibility, although the test was not intended for what the original poster wants, is to use a test for a structural break and see whether your data after a certain point in time are significantly different from your data before it. I admit I have never seen it used for this purpose; I just made it up for this...

A Chow test:
https://en.wikipedia.org/wiki/Structural_break
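
A minimal sketch of the Chow F statistic, assuming a simple linear trend model (intercept plus slope on the time index) and a break point known in advance. The function names and the toy data are made up for illustration. The statistic would be compared against an F distribution with (2, n - 4) degrees of freedom, and the test assumes independent errors with equal variance, which may well not hold for a series like this one.

```python
def ols_rss(t, y):
    """Residual sum of squares from fitting y = a + b*t by least squares."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    sxy = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = my - slope * mt
    return sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))


def chow_f(y, split):
    """Chow F statistic for a break at index `split` (2 parameters per fit)."""
    k = 2  # intercept and slope
    t = list(range(len(y)))
    rss_pooled = ols_rss(t, y)
    rss_split = ols_rss(t[:split], y[:split]) + ols_rss(t[split:], y[split:])
    dof = len(y) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)


# Toy data: a small alternating wiggle stands in for noise.
wiggle = [0.5 if t % 2 == 0 else -0.5 for t in range(40)]
no_break = [t + wiggle[t] for t in range(40)]
with_break = [(t if t < 20 else 40 - t) + wiggle[t] for t in range(40)]

print(chow_f(no_break, 20))    # small: one line fits both halves
print(chow_f(with_break, 20))  # large: the trend reverses at t = 20
```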
 
#6
Thanks for the replies everyone,

I did look at cross-correlation a while back (I believe that in this case, since it is a cross-correlation of the signal with itself, it is autocorrelation). As I understood it, the purpose of autocorrelation is to identify repeating patterns or periodic signals in a series. In this case I believe that is unnecessary, since I have other very strong evidence that this pair of sequences only occurs twice in the series -- I will get to that later (although maybe I should have started there).

The problem I am trying to address here is: what is the best way to quantify the similarity or correlation between these two sequences, given that they come from a nonstationary, non-normally distributed series?

Perhaps my question was so simple that you couldn’t believe I am so naïve.

If I use the correlation function in Excel, comparing sequence 1 and sequence 2, I get a result of 0.91259. Is that a reliable result, given that the series is nonstationary and not normally distributed?
 

vinux

Dark Knight
#7
Cross-correlation of nonstationary series is tricky. You will find a high correlation at lag zero itself. What is your objective? Is it prediction, or understanding a causal relationship?
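
A small sketch of why the lag-zero correlation is hard to interpret here. The two synthetic series below are made up for illustration (they are not the poster's data): both trend upward, but their short-term wiggles are unrelated. The Pearson correlation of the levels is very high, while the correlation of the first differences is essentially zero.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def first_diff(x):
    """Point-to-point changes x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]


n = 41
wiggle_a = [1.0 if t % 2 == 0 else -1.0 for t in range(n)]  # period 2
wiggle_b = [1.0 if t % 4 < 2 else -1.0 for t in range(n)]   # period 4
a = [t + wiggle_a[t] for t in range(n)]      # trend with slope 1
b = [2 * t + wiggle_b[t] for t in range(n)]  # trend with slope 2

r_levels = pearson(a, b)
r_diffs = pearson(first_diff(a), first_diff(b))
print(round(r_levels, 3), round(r_diffs, 3))
```

Comparing the two numbers is one way to see how much of a high correlation comes from a shared trend rather than from matching point-to-point movements.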

Give a short description of the series if possible. It looks like a calibration/standardisation is applied at time 0.