# ARIMA models

#### noetsi

##### Fortran must die
I have two models. One applies a seasonal difference; the other applies both a seasonal and a non-seasonal difference. The latter is stationary according to the ADF test; the former is not. But the former passes the Box-Ljung test, meaning no significant serial correlation was detected, and the latter (the one that is stationary) does not, and it's not close. Also, the stationary one generates numbers that are completely impossible for our organization (absurdly so, given past spending).

I have been unable to find any model that is stationary according to the ADF test and that also generates realistic numbers or passes the Box-Ljung test for no serial correlation.

#### staassis

##### Active Member
Could you try [Model 3] = [Model 1] + [deterministic polynomial trend]? If these were the true dynamics and [Model 1] were non-stationary, [Model 3] would still be estimated consistently.

Regarding [Model 2]: when there is too much differencing, forecasts of the original process are oftentimes weird. But you probably know this already.
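A small simulation can illustrate one side effect of over-differencing. This is only a sketch in Python with NumPy, using simulated white noise rather than the spending data discussed in this thread; `lag1_autocorr` and `ljung_box_q` are hypothetical helper names, and the Ljung-Box statistic is computed by hand from its textbook formula. Differencing a series that is already free of serial correlation induces a lag-1 autocorrelation of about -0.5, which the Ljung-Box test then flags. That may be part of why an over-differenced model fails the test.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x ** 2))

def ljung_box_q(x, lags=10):
    """Ljung-Box Q = n(n+2) * sum_k r_k^2 / (n-k) over k = 1..lags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.sum(x ** 2)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(x[k:] * x[:-k]) / denom
        total += r_k ** 2 / (n - k)
    return float(n * (n + 2) * total)

# Assumption: simulated data, not the series from the thread.
rng = np.random.default_rng(42)
noise = rng.normal(size=500)   # already stationary, no serial correlation
over_diff = np.diff(noise)     # one difference too many

# 5% critical value of a chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.307

print("lag-1 autocorrelation, original:", lag1_autocorr(noise))
print("lag-1 autocorrelation, over-differenced:", lag1_autocorr(over_diff))
print("Ljung-Box Q, original:", ljung_box_q(noise))
print("Ljung-Box Q, over-differenced:", ljung_box_q(over_diff))
print("critical value (10 lags, 5%):", CHI2_95_DF10)
```

When the statistic is computed on the residuals of a fitted ARMA model, the degrees of freedom are usually reduced by the number of estimated ARMA parameters, so the critical value above applies to a raw series only.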

How much data do you have this time? What do AIC & BIC say?

#### noetsi

##### Fortran must die
I did not know that differencing caused problems for prediction, although that makes sense.

I am not sure what the deterministic trend would be. Just a square? A cubic?

#### staassis

##### Active Member
Depends on how much data you've got. A cubic trend might work. In general, the order of the polynomial could be determined by AIC or BIC.
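As a rough illustration of picking the polynomial order by AIC or BIC, here is a minimal sketch in Python with NumPy. It fits the trend alone by least squares and assumes independent Gaussian errors, which will not hold for a series that needs an ARIMA model, so in practice the information criteria of the full models would be compared instead. The function name `trend_order_by_ic` and the simulated data are hypothetical.

```python
import numpy as np

def trend_order_by_ic(y, max_order=4):
    """Fit polynomial trends of order 0..max_order and compare AIC and BIC.

    Returns (best_order_by_aic, best_order_by_bic, table), where table holds
    one (order, aic, bic) tuple per candidate order.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.linspace(-1.0, 1.0, n)  # rescaled time keeps the fit well conditioned
    table = []
    for order in range(max_order + 1):
        coefs = np.polyfit(t, y, order)
        resid = y - np.polyval(coefs, t)
        sigma2 = np.mean(resid ** 2)
        k = order + 2  # polynomial coefficients plus the error variance
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        aic = 2 * k - 2 * loglik
        bic = k * np.log(n) - 2 * loglik
        table.append((order, float(aic), float(bic)))
    best_aic = min(table, key=lambda row: row[1])[0]
    best_bic = min(table, key=lambda row: row[2])[0]
    return best_aic, best_bic, table

# Assumption: simulated series with a quadratic trend plus noise.
rng = np.random.default_rng(0)
t = np.arange(60)
y = 5 + 0.3 * t + 0.02 * t ** 2 + rng.normal(scale=2.0, size=60)

best_aic, best_bic, table = trend_order_by_ic(y)
for order, aic, bic in table:
    print(f"order {order}: AIC = {aic:.1f}, BIC = {bic:.1f}")
print("order chosen by AIC:", best_aic, "| order chosen by BIC:", best_bic)
```

BIC penalizes each extra parameter by log(n) instead of 2, so with more than about 8 observations it never picks a higher order than AIC does in this nested setup.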

#### staassis

##### Active Member
This is too little for double differencing or any big model. A rule of thumb says: at least 10-15 observations per estimated parameter, depending on the noise conditions and the framework. So all those estimates + Box-Ljung tests + ADF tests are not terribly accurate.

I guess you could try single differencing + AR(1) + a quadratic deterministic trend...
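One way to read that suggestion, sketched in Python with NumPy under stated assumptions: a quadratic trend in the levels, y_t = a + b·t + c·t² + u_t, becomes a linear trend after one difference, Δy_t = (b − c) + 2c·t + Δu_t. So the differenced series can be regressed on a constant, a time index, and its own first lag (conditional least squares). The function names and the simulated data are hypothetical, and a dedicated ARIMA routine would estimate the same kind of model by maximum likelihood instead.

```python
import numpy as np

def fit_diff_ar1_trend(y):
    """Regress the first difference on a constant, time, and its own lag."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                    # single (non-seasonal) difference
    t = np.arange(1, len(dy))          # time index for dy[1:]
    X = np.column_stack([np.ones(len(t)), t, dy[:-1]])
    target = dy[1:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return {
        "const": float(beta[0]),
        "slope": float(beta[1]),
        "phi": float(beta[2]),         # AR(1) coefficient on the differences
        "resid": target - X @ beta,
        "n_obs": len(target),
        "n_params": X.shape[1],
    }

def forecast_next(y, fit):
    """One-step-ahead forecast of the level from the fitted regression."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    t_next = len(dy)
    dy_next = fit["const"] + fit["slope"] * t_next + fit["phi"] * dy[-1]
    return float(y[-1] + dy_next)

# Assumption: simulated series, not the data from the thread.
rng = np.random.default_rng(3)
dy_sim = np.zeros(40)
for i in range(1, 40):
    dy_sim[i] = 1.0 + 0.05 * i + 0.5 * dy_sim[i - 1] + rng.normal()
y_sim = np.concatenate([[100.0], 100.0 + np.cumsum(dy_sim)])

fit = fit_diff_ar1_trend(y_sim)
print("estimated AR(1) coefficient:", fit["phi"])
print("observations per parameter:", fit["n_obs"] / fit["n_params"])
print("one-step forecast:", forecast_next(y_sim, fit))
```

The observations-per-parameter figure printed at the end gives a quick check against the 10-15 rule of thumb mentioned above.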

#### noetsi

##### Fortran must die
I found this comment fascinating since I spent a lot of time learning the classical way to identify the p, d, q components. For one thing, I never realized these were only theoretical (I knew they broke down with both MA and AR components).

"This hazard is revealed by sampling experiments. When the data come from the real world, the notion that there is an underlying ARMA process is a fiction, and the business of model identification becomes more doubtful. Then there may be no such thing as the correct model; and the choice amongst alternative models must be made partly with a view to their intended uses."

https://www.le.ac.uk/users/dsgp1/COURSES/THIRDMET/MYLECTURES/4XIDNTIFY.pdf

#### staassis

##### Active Member
Like with everything, the model is just a model. Whatever parametric paradigm one may choose, the truth may not belong there.

The issue is known as model bias. Like with any type of bias, we may even want to introduce it intentionally if the resulting estimation procedure has a much smaller variance and the mean-square error (MSE) decreases as a result. Say we know that the truth is a spline with a very high degree of wiggliness. "Who cares?" we say to ourselves. "With only 100 data points at hand, a cubic spline is the best we can do." And we are right. A cubic spline will deliver a lower MSE than a spline of order 10.
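The trade-off can be checked with a small simulation. This is a sketch in Python with NumPy that swaps in ordinary polynomials for splines, since NumPy has no spline fitter. The true curve, the noise level, and the helper name `fit_mse` are all assumptions chosen for illustration: the truth is a degree-10 polynomial whose high-order wiggle is small relative to the noise, observed at 100 points.

```python
import numpy as np

def fit_mse(degree, x, truth, sigma, n_rep, rng):
    """Average squared error of a fitted polynomial against the known truth."""
    errors = []
    for _ in range(n_rep):
        y = truth + rng.normal(scale=sigma, size=len(x))
        fitted = np.polyval(np.polyfit(x, y, degree), x)
        errors.append(np.mean((fitted - truth) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)

# Assumption: cubic shape plus a small degree-10 wiggle (Chebyshev T10).
truth = x ** 3 - x + 0.05 * np.cos(10 * np.arccos(x))

mse_cubic = fit_mse(3, x, truth, sigma=1.0, n_rep=200, rng=rng)
mse_deg10 = fit_mse(10, x, truth, sigma=1.0, n_rep=200, rng=rng)

print("MSE of the biased cubic fit:", mse_cubic)
print("MSE of the unbiased degree-10 fit:", mse_deg10)
```

The cubic is the wrong model here, yet it estimates 4 coefficients instead of 11, and the variance it saves outweighs the bias it introduces. With much more data or much less noise the ranking would flip.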


#### noetsi

##### Fortran must die
It's frustrating to spend a lot of time learning this only to find out that it commonly does not work with real-world data...