time series analysis

#1
There is an aggregated measure, represented by a variable A, that is modeled as a time series. There was a need to forecast A and also to find out how much history of A best reflects its future values (as there was a data storage capacity issue). Using a combination of a sliding window regression technique and ARIMA, several window sizes were tried, and a window of 100 gave a lower MAPE than the others. So, among the sizes tried, the past 100 values of A are the best reflector of the future. This forecast was successful.

A is an aggregate of B and C such that A = B + C, and there is a need to predict these variables as percentages of A. This is done to check whether an increase in the forecast of A is due to an increase in B or an increase in C. B and C are also modeled as time series.

Questions:
1. Can the same window size (100 as determined in the previous step) be used to predict B and C as a percentage of A using ARIMA?
2. Since B and C are expressed as percentages of A, is it OK to predict B and then calculate C as (100 - B), since the time series B and C will mirror each other, meaning the same ARIMA parameters will hold?

Any thoughts and recommendations?
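
To make question 2 concrete, here is a small sketch with made-up numbers (the values and the helper name `derive_c_share` are assumptions for illustration only). It checks just the arithmetic: the two shares sum to 100, and a change in one share is the exact negative of the change in the other. It does not by itself show that a given ARIMA specification suits the share series.

```python
# Made-up values for illustration only; A = B + C by construction.
B = [30.0, 35.0, 33.0, 40.0, 42.0]
C = [70.0, 75.0, 80.0, 78.0, 90.0]
A = [b + c for b, c in zip(B, C)]

# B and C expressed as percentages of A.
share_b = [100.0 * b / a for b, a in zip(B, A)]
share_c = [100.0 * c / a for c, a in zip(C, A)]

# The shares sum to 100 at every time step (up to floating-point rounding).
max_gap = max(abs(sb + sc - 100.0) for sb, sc in zip(share_b, share_c))

# "Mirrored" series: each step change in C's share is minus the change in B's share.
diff_b = [y - x for x, y in zip(share_b, share_b[1:])]
diff_c = [y - x for x, y in zip(share_c, share_c[1:])]
max_mirror_gap = max(abs(db + dc) for db, dc in zip(diff_b, diff_c))

def derive_c_share(b_share_forecast):
    """Hypothetical helper: C's share forecast implied by a forecast of B's share."""
    return [100.0 - f for f in b_share_forecast]

print(max_gap, max_mirror_gap)
print(derive_c_share([30.0, 25.5]))
```

Deriving C's share from B's this way guarantees that the two forecasts always add up to 100, which separate models for the two shares would not guarantee.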
 
#2
It is not clear how you determined the window size, among other things. Also, what's optimal for the aggregate measure does not have to be optimal for the constituents. The dynamics of an aggregate measure are typically better behaved than those of its constituents.
 
#3
Hello, thanks for your response. I chose the window size based on an error metric: I calculated the MAPE over the training data set and compared it across different window sizes.

I agree that the dynamics of the aggregated measure typically behave better than those of its constituents. But I couldn't think of a logical way to explain a situation where B and C have different window sizes and A has a completely different one. That's the reason I converted B and C to percentages of A and used the same window size. Is my claim completely flawed?
 
#4
Again, "it is not clear how you determined the window size". Also, you were saying: "I couldn't logically think a way...". Nobody is asking for an abstract theoretical argument here. The optimal model for any stochastic process (A, B or C) must be dictated by data.
 
#5
Thank you for your comment.

The logic for calculating the optimum window size is as follows. First, we divide the data into training and testing sets. Then we try different window sizes: 10, 20, 30, etc. For example:

Window size 10: t1,....,t10 for forecast 1, t2,....,t11 for forecast 2, etc.
Window size 20: t1,....,t20 for forecast 1, t2,....,t21 for forecast 2, etc.

We do this over the testing data set and calculate the MAPE. Then we compare the MAPE values to see which window gives the lowest error.

- This is how it is implemented for A to find the window size that best predicts its future values
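
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, and `window_forecast` is a hypothetical stand-in that forecasts with the window mean, where the actual procedure would fit the regression/ARIMA model on each window. The function names are made up for this sketch.

```python
import random

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def window_forecast(window):
    """Hypothetical stand-in forecaster: the mean of the window.
    Swap in a model fitted on `window` (e.g. ARIMA) to mirror the real procedure."""
    return sum(window) / len(window)

def rolling_mape(series, window_size, test_start):
    """One-step-ahead forecasts for each point from test_start onward,
    each using only the previous `window_size` observations."""
    actual, predicted = [], []
    for t in range(test_start, len(series)):
        window = series[t - window_size:t]  # t-w, ..., t-1
        predicted.append(window_forecast(window))
        actual.append(series[t])
    return mape(actual, predicted)

# Synthetic data (assumption): a level of 100 plus Gaussian noise.
random.seed(0)
series = [100.0 + random.gauss(0.0, 5.0) for _ in range(300)]

test_start = 200  # first 200 points for training, the rest for testing
scores = {w: rolling_mape(series, w, test_start) for w in (10, 20, 50, 100)}
best_window = min(scores, key=scores.get)
print(scores, best_window)
```

Running the same loop separately on A, on B and on B as a percentage of A would show directly whether the data select the same window for each series.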

So now the question is: can the same window size be used to forecast B (when B is converted to a percentage of A)? We could compute a new window size for B, but should that be done only when the "raw" B data is used (not converted to a percentage of A)? Am I correct in this thought process?
 
#6
No reason at all for the optimal window size to be the same. Example:

A_i(t) = M_i(t) + E_i(t), i = 1,2

where

M_i(t) is reset to a random value every S_i days,
E_i(t) is white noise, independent of everything else.

Now, imagine that

S_1 = 10,
S_2 = 20.

Then the optimal window size for estimating and forecasting A_1(t) is 10 (equal weights for observations), while the optimal window size for estimating and forecasting A_1(t) + A_2(t) is 20 (varying weights for observations).
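
A rough simulation of this example, for anyone who wants to experiment with it. The distribution of the reset level, the noise scale, the candidate window sizes and the function names are all assumptions made for this sketch. It evaluates only equal-weight window means, so it does not cover the varying-weights case mentioned for the sum.

```python
import random

def simulate(reset_every, n, rng, noise_sd=1.0):
    """A(t) = M(t) + E(t): the level M is redrawn every `reset_every` steps,
    and E is white noise."""
    out, level = [], 0.0
    for t in range(n):
        if t % reset_every == 0:
            level = rng.uniform(50.0, 150.0)  # assumed distribution of the reset level
        out.append(level + rng.gauss(0.0, noise_sd))
    return out

def rolling_mae(series, window_size, start):
    """Mean absolute one-step-ahead error of an equal-weight window mean."""
    errors = []
    for t in range(start, len(series)):
        window = series[t - window_size:t]
        errors.append(abs(series[t] - sum(window) / window_size))
    return sum(errors) / len(errors)

rng = random.Random(42)
n = 2000
a1 = simulate(10, n, rng)  # S_1 = 10
a2 = simulate(20, n, rng)  # S_2 = 20
total = [x + y for x, y in zip(a1, a2)]

windows = (5, 10, 20, 40)
start = max(windows)  # same evaluation range for every window size
results = {
    name: {w: round(rolling_mae(s, w, start), 2) for w in windows}
    for name, s in (("A_1", a1), ("A_1 + A_2", total))
}
for name, scores in results.items():
    print(name, scores)
```

Comparing the two rows of output shows how the error profile across window sizes differs between a constituent and the sum.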
 
#7
Thank you for the fast response. Super clear explanation. Now I understand that the behavior of the aggregated series A will be different from its constituents (B and C) even though A=B+C.