PROC ARIMA

noetsi

Fortran must die
#1
My data is clearly non-stationary (by both graphical and formal tests), so I ran code to difference it. After differencing, you should retest the differenced series to see whether it needs to be differenced again. I can't find in the documentation how to do that. I assume you test the residuals.

This is the code I ran to difference. I left out the code requesting the formal tests that showed the original series was non-stationary.

ods graphics on;
proc arima data= tsdata;
identify var=spend(1) ;
run;
ods graphics off;

I am also confused about how you take a second difference if the first difference does not work (ignoring seasonal differencing). I believe it is identify var=spend(1,1);
 

Dason

Ambassador to the humans
#2
VAR=variable
VAR= variable ( d1, d2, …, dk )
names the variable that contains the time series to analyze. The VAR= option is required.

A list of differencing lags can be placed in parentheses after the variable name to request that the series be differenced at these lags. For example, VAR=X(1) takes the first differences of X. VAR=X(1,1) requests that X be differenced twice, both times with lag 1, producing a second difference series, which is

[X(t) - X(t-1)] - [X(t-1) - X(t-2)] = X(t) - 2X(t-1) + X(t-2)

VAR=X(2) differences X once at lag two.

If differencing is specified, it is the differenced series that is processed by any subsequent ESTIMATE statement.
https://documentation.sas.com/?docs...ima_syntax04.htm&docsetVersion=15.1&locale=en
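
As a quick sanity check on the formula above, the sketch below computes the second difference two ways in a DATA step and compares them. It assumes the tsdata data set and spend variable from the first post; diffcheck, d2_nested, d2_formula, gap and d_lag2 are names made up for the illustration.

```sas
/* Sketch only: tsdata and spend come from post #1; all other names are hypothetical. */
data diffcheck;
   set tsdata;
   /* Second difference as two successive lag-1 differences, like VAR=spend(1,1) */
   d2_nested  = dif(dif(spend));
   /* The same quantity from the expanded formula X(t) - 2X(t-1) + X(t-2) */
   d2_formula = spend - 2*lag(spend) + lag2(spend);
   /* Should be zero (up to rounding) wherever both values are non-missing */
   gap = d2_nested - d2_formula;
   /* For contrast, VAR=spend(2) is a single difference at lag two: X(t) - X(t-2) */
   d_lag2 = dif2(spend);
run;

proc means data=diffcheck n min max;
   var gap;
run;
```

The first two rows are missing for both versions because a second difference needs two earlier observations, so PROC MEANS should report two fewer values than the series length.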
 

noetsi

Fortran must die
#3
I understand that, Dason. What I am not sure of is how you test whether first differencing has removed the non-stationarity. I assume you have to run a test like ADF or KPSS on the differenced series. That is what I can't figure out how to do.
 

Miner

TS Contributor
#4
I am far from an expert in ARIMA, but my understanding is that you run the ACF on the differenced data. If the lag-1 autocorrelation is <= 0, the series does not need more differencing. Some guidelines: 1) stationary series - no differencing; 2) linear trend - one difference; 3) quadratic, exponential, or S-curve trend - two differences.
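
A minimal sketch of how the lag-1 check could be done in SAS, assuming the tsdata data set and spend variable from the first post. The output data set name acf_d1 is hypothetical, and the CORR, STDERR and LAG variable names are what I believe the OUTCOV= data set contains, so check them against your own output.

```sas
/* Sketch only: acf_d1 is a hypothetical output data set name. */
proc arima data=tsdata;
   /* ACF of the first-differenced series, saved to a data set */
   identify var=spend(1) nlag=12 outcov=acf_d1;
run;
quit;

/* CORR should hold the autocorrelation at each lag; look at lag 1 */
proc print data=acf_d1 noobs;
   where lag = 1;
   var lag corr stderr;
run;
```

Under the rule of thumb above, a lag-1 value at or below zero would suggest that no further differencing is needed.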
 

noetsi

Fortran must die
#5
Thanks, Miner. I actually have some experience with that (I prefer the formal tests like KPSS and ADF). I am asking a purely coding question: if you difference in SAS, how do you request a test of the differenced series rather than the original, undifferenced series?

I think the answer is that SAS runs the test on whatever series you request, not on the original data. For example, if you request a difference as in the code below, the ADF test is done on the differenced series, not the original series. But I cannot find confirmation of this in the documentation.

ods graphics on;
proc arima data= tsdata;
identify var=spend(1) stationarity =(adf =11);
run;
ods graphics off;
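
One way to settle this empirically is to difference the series by hand and run the same test on it. This is a sketch under the assumption that spend has no missing values in the middle of the series; tsdiff and dspend are hypothetical names.

```sas
/* Sketch only: tsdiff and dspend are hypothetical names. */
data tsdiff;
   set tsdata;
   dspend = dif(spend);        /* first difference computed by hand */
   if dspend ne .;             /* drop rows where the difference is undefined */
run;

proc arima data=tsdata;
   /* ADF requested on a series differenced inside PROC ARIMA */
   identify var=spend(1) stationarity=(adf=(11));
run;
quit;

proc arima data=tsdiff;
   /* ADF on the pre-differenced series, with no differencing in VAR= */
   identify var=dspend stationarity=(adf=(11));
run;
quit;
```

If the two ADF tables match, the test in the first call was run on the differenced series rather than on the original one.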
 

Dason

Ambassador to the humans
#6
I mean... Wouldn't that be easy to test? Run it one way and see what the test says. Then use a different order of differencing and see if the test changes...
 

noetsi

Fortran must die
#7
The test does change, Dason, which is why I think what I said above is right. But I was hoping someone here would know for sure.

Here are the results for two differences (the first example) and for one difference. To me, the first difference does not support stationarity and the second difference does. But I thought I would ask.
 

Attachments

noetsi

Fortran must die
#8
To add to the strangeness, it is not (in EG) printing all of the graphs I requested. The code below produces only the default graphs when it should produce many more.

ods graphics on;
proc arima data= tsdata plots=all;
identify var=spend(1,1) stationarity =(adf =11) ;
run;
quit;
ods graphics off;
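
One possible explanation, offered as a guess rather than a confirmed diagnosis: PROC ARIMA produces its plots per statement, so with only an IDENTIFY statement the residual and forecast plots have nothing to draw from. The sketch below adds ESTIMATE and FORECAST statements so those stages exist. The order q=2, the horizon lead=12 and the output data set fcst are placeholders chosen for illustration.

```sas
ods graphics on;
proc arima data=tsdata plots=all;
   identify var=spend(1,1) stationarity=(adf=(11));
   /* q=2 is only a placeholder order so the estimation-stage plots have a model */
   estimate q=2 method=ml;
   /* lead=12 is an arbitrary horizon; fcst is a hypothetical output data set */
   forecast lead=12 out=fcst;
run;
quit;
ods graphics off;
```

If this still shows only the identification panel, the cause is more likely an Enterprise Guide results setting than the PROC ARIMA code.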
 

noetsi

Fortran must die
#11
Ok, this is not really a coding issue, but I did not want to open another thread.

I had data that the ADF and KPSS tests strongly suggest is non-stationary. So I differenced it twice and it's arguably stationary now (the p-value for the ADF tau test is actually .088 when I would have preferred it to be below .05, but differencing more than twice is not recommended).

The ESACF and SCAN methods suggested p=0 and q equal to 1 or 2. I tried both models, and the Ljung-Box test suggests the residuals still show serial correlation (SAS calls this the autocorrelation check of residuals). I tried adding an AR term, but the model fails to converge when I do so (something I have never run into with ARIMA). I post the diagnostics below for p=0, q=2 (this is a (0,2,2) model). Our data is seasonal, but nothing in the PACF or ACF suggests this to me.

[Image: 1584574378603.png]
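
Two things that might be worth trying, shown only as a sketch: a different estimation method with a higher iteration cap for the model that fails to converge, and a seasonal MA factor in case the leftover autocorrelation is seasonal. The seasonal lag of 12 assumes monthly data, and the value 200 for MAXITER= is an arbitrary choice.

```sas
/* Sketch only: orders and option values are illustrative, not recommendations. */
proc arima data=tsdata;
   identify var=spend(1,1);
   /* ARMA(1,2) on the twice-differenced series, ML with a higher iteration cap */
   estimate p=1 q=2 method=ml maxiter=200;
   /* The (0,2,2) model plus a multiplicative seasonal MA factor at lag 12 */
   estimate q=(1,2)(12) method=ml;
run;
quit;
```

Each ESTIMATE statement prints its own autocorrelation check of residuals, so the Ljung-Box results for the two fits can be compared side by side.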
 

noetsi

Fortran must die
#12
It is strange. The model will converge when I set p to 2 (two AR terms) and q to 2 (two MA terms), but it won't when I set p to 1 and q to 2.

None of these models fixes the problem with the Ljung-Box test noted above.
 

hlsmith

Less is more. Stay pure. Stay poor.
#13
I was thinking visualizations are always better. Can you plot the series after differencing? There is a function in R, auto.arima, that tries to find the best model. I am sure purists may think it functions like a stepwise regression, where not enough content knowledge is being incorporated, but it is a good start sometimes.
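
For the plot, a minimal SAS sketch could look like the following. It assumes the tsdata data set and spend variable from the first post and uses the observation number as the time axis, since the thread never names a date variable; tsplot, t, d1 and d2 are hypothetical names.

```sas
/* Sketch only: tsplot, t, d1 and d2 are hypothetical names. */
data tsplot;
   set tsdata;
   t  = _n_;            /* observation index standing in for a date variable */
   d1 = dif(spend);     /* first difference */
   d2 = dif(d1);        /* second difference */
run;

proc sgplot data=tsplot;
   series x=t y=d1;
   series x=t y=d2;
   refline 0 / axis=y;  /* a stationary series should hover around this line */
run;
```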
 

noetsi

Fortran must die
#14
SCAN and ESACF are the older SAS equivalents of auto.arima. My problem is that I am unable to find any ARIMA specification that works. I can get rid of non-stationarity. I can't get rid of serial correlation as reflected by the Ljung-Box results.
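
For readers who have not used them, the tentative order selection tools are requested on the IDENTIFY statement. A sketch, assuming the twice-differenced spend series from the earlier posts; the 0 to 5 search ranges are arbitrary choices.

```sas
/* Sketch only: the p and q search ranges are arbitrary. */
proc arima data=tsdata;
   /* SCAN and ESACF tables plus the MINIC information-criterion table */
   identify var=spend(1,1) scan esacf minic p=(0:5) q=(0:5);
run;
quit;
```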
 

noetsi

Fortran must die
#16
George Box also said "All models are wrong, but some are useful." Even if your model is not technically perfect, does it provide useful forecasts?
It provides me, hopefully, with one more tool of the many I use to predict future results. My concern is that the model that works best, given my sense of how our data works, is non-stationary. It is also, interestingly, the one that does the best on the Ljung-Box test. So it's the one I went with. I just get nervous predicting with a non-stationary model (although I use ESM and that is commonly non-stationary and generates pretty good results for us). :p
 

noetsi

Fortran must die
#17
Ok I found a SAS comment that completely threw me.

Say you think there is a seasonal MA(1) term (a seasonal MA term, not seasonal differencing) that is multiplicative, for monthly data. Is the ESTIMATE statement for that

Q=(12,13) or Q=(12)(13)

The note I found says the multiplicative one would be Q=(12)(13), and Q=(12,13) would be additive seasonality.

But this is the only comment I have run into that says this.
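
As I understand the PROC ARIMA syntax, each parenthesised list is one factor of the MA polynomial, and separate lists are multiplied together. On that reading, the usual multiplicative form for monthly data is Q=(1)(12), and the lag-13 term appears only when the two factors are multiplied out. The sketch below contrasts the two forms; the spend(1,12) differencing is assumed purely for illustration.

```sas
/* Sketch only: the differencing and estimation method are illustrative choices. */
proc arima data=tsdata;
   /* One regular and one seasonal difference, assumed here for illustration */
   identify var=spend(1,12);

   /* Multiplicative (factored): (1 - theta1*B)(1 - Theta12*B**12).
      Multiplying out gives terms at lags 1, 12 and 13, with the lag-13
      coefficient tied to the product theta1*Theta12. Two free parameters. */
   estimate q=(1)(12) method=ml;

   /* Additive (subset): 1 - theta1*B - theta12*B**12 - theta13*B**13.
      Three free parameters, with no product constraint on lag 13. */
   estimate q=(1,12,13) method=ml;
run;
quit;
```

The number of MA estimates printed for each fit (two for the factored form, three for the subset form) is an easy way to confirm which model SAS actually estimated.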