I ran into this in the context of ARIMA. I thought you could always use AIC, whether or not the models are nested; I understood that to be the advantage of AIC.

"AIC is used to evaluate ARIMA commonly. AIC criteria are only used to compare nested models. This means that the smaller model (e.g., 0, 1, 0) must be a subset of the larger model (1, 1, 0). The tests are not legitimately used to compare, for example, (1, 0, 1) and (0, 1, 0). And they could not be used to compare models in which a different value of d is generated by coding the IV differently."

I do not have a link (I gathered this many years ago).


Ambassador to the humans
But if you're using different values for d then you're fundamentally changing what you're modeling and comparing doesn't make sense.
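
To spell out why d matters, here is a rough sketch in standard notation (the symbols are my own shorthand, not from the quoted text). AIC is built from the maximized likelihood, and the likelihood is defined over whatever series the model is actually fitted to:

\[
\mathrm{AIC} = -2 \ln \hat{L} + 2k
\]

\[
d = 0:\ \hat{L} \text{ is the likelihood of } (y_1, \dots, y_n),
\qquad
d = 1:\ \hat{L} \text{ is the likelihood of } (\Delta y_2, \dots, \Delta y_n),\ \ \Delta y_t = y_t - y_{t-1}
\]

Here k is the number of estimated parameters. With different d, the two likelihoods describe different response series (and a different number of observations), so the two AIC values are not on a common scale.
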
So in comparing a (1,0,0) model to a (0,0,1) model (which are not nested, as far as I understand), can you use AIC or not? It seems like there are conflicting answers on that (but I am not sure). :)
Here is an actual federal study. How is a (1,0,0) model nested inside a (0,0,1) model? Yet they use AIC to compare these:

A series of statistical tests showed that six models, including ARIMA (0, 0, 1), ARIMA (1, 0, 0), ARIMA (1, 0, 1), ARIMA (2, 0, 1), ARIMA (3, 0, 1) and ARIMA (5, 0, 1), were candidate models. The model that gave the minimum Akaike information criterion and Schwarz Bayesian criterion and followed the assumptions of residual independence was selected as the adequate model.


Ambassador to the humans
Where does it say they are nested? They say exactly what they're doing and all the models use the same value for d. They don't say they're running an F test - just that they choose the model with the best AIC.
They are using AIC to choose the best model. Per my original post, you can only use AIC to compare models if the models being compared are nested, which they are not in this case, unless a (1,0,0) is nested in a (0,0,1). This has nothing to do with the differencing; I am not sure why the differencing is pertinent here at all. What matters is whether they are nested or not.


Ambassador to the humans
As I've said in the chatroom a few times - AIC doesn't require nested models if you just use it as a comparison value. The differencing matters because it fundamentally changes what you're modeling - not just what parameters you're using and how many.

A quote from Rob Hyndman (link posted below):
The AIC does not require nested models. One of the neat things about the AIC is that you can compare very different models. However, make sure the likelihoods are computed on the same data.
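
To make that concrete, here is a minimal sketch comparing a non-nested AR(1) and MA(1) by AIC on the same simulated series. Everything in it is an assumption for illustration: the function names are hypothetical, the data are simulated, the models have no constant, and the fit is a crude grid search on the conditional sum of squares rather than the exact maximum likelihood that ARIMA software would normally use. Both models are scored on the same observations (t = 2..n), which is the "same data" condition from the quote.

```python
import math
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate a zero-mean AR(1) series with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def ar1_residuals(y, phi):
    """AR(1) one-step errors e_t = y_t - phi * y_{t-1}, for t = 2..n."""
    return [y[t] - phi * y[t - 1] for t in range(1, len(y))]


def ma1_residuals(y, theta):
    """MA(1) one-step errors e_t = y_t - theta * e_{t-1}, starting from e_0 = 0.

    The first error is dropped so that both models are scored on t = 2..n.
    """
    e = [y[0]]
    for t in range(1, len(y)):
        e.append(y[t] - theta * e[-1])
    return e[1:]


def gaussian_aic(resid, k):
    """AIC = -2 * loglik + 2k, using the Gaussian likelihood of the residuals."""
    n = len(resid)
    sigma2 = sum(r * r for r in resid) / n
    loglik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return -2.0 * loglik + 2.0 * k


def fit_by_grid(y, residual_fn):
    """Pick the coefficient in (-1, 1) that minimizes the residual sum of squares."""
    grid = [i / 100.0 for i in range(-95, 96)]
    best = min(grid, key=lambda p: sum(r * r for r in residual_fn(y, p)))
    # k = 2: one coefficient plus the noise variance.
    return best, gaussian_aic(residual_fn(y, best), k=2)


y = simulate_ar1(n=500, phi=0.7)
phi_hat, aic_ar = fit_by_grid(y, ar1_residuals)
theta_hat, aic_ma = fit_by_grid(y, ma1_residuals)
print("AR(1): phi =", phi_hat, " AIC =", round(aic_ar, 1))
print("MA(1): theta =", theta_hat, " AIC =", round(aic_ma, 1))
```

Neither model is a special case of the other, yet the two AIC values are directly comparable because they come from the same series with the same d. Since the series was simulated from an AR(1), the AR(1) fit should come out with the lower AIC.
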
I decided to reuse this for a different ARIMA question.

I have a series that showed no serial correlation once I applied seasonal differencing (but did before). Moreover, the model (0,0,0)(0,1,0)12 had a better AIC than the one with the AR and MA terms suggested by the software, (1,0,1)(0,1,0)12. Neither the AR nor the MA term was statistically significant in this second model.

But that seems really strange: a series that has serial correlation before differencing, none after it, and no useful AR or MA terms. Can real data be like that? I have never seen an article or book on ARIMA that suggested this.
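
One process that would behave this way, as far as I can tell, is a seasonal random walk, where each value equals the value 12 periods earlier plus noise. The sketch below is on simulated data only, not the series described above; the function names, the sine-shaped starting year, and the seed are all assumptions chosen for illustration. It compares the sample autocorrelation before and after seasonal differencing.

```python
import math
import random


def simulate_seasonal_random_walk(years, period=12, seed=1):
    """y_t = y_{t-period} + e_t, started from an assumed sine-shaped first year."""
    rng = random.Random(seed)
    y = [20.0 * math.sin(2.0 * math.pi * m / period) for m in range(period)]
    for t in range(period, years * period):
        y.append(y[t - period] + rng.gauss(0.0, 1.0))
    return y


def seasonal_difference(y, period=12):
    """Return y_t - y_{t-period}; the first `period` values are lost."""
    return [y[t] - y[t - period] for t in range(period, len(y))]


def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


y = simulate_seasonal_random_walk(years=50)
dy = seasonal_difference(y)
for lag in (1, 12):
    print("lag", lag, "raw:", round(acf(y, lag), 2), "differenced:", round(acf(dy, lag), 2))
```

In this setup the differenced series is just the noise term, so there is nothing left for an AR or MA term to pick up, and a (0,0,0)(0,1,0)12 model is the natural fit.
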