
18.1 ADAM ETS components selection

With 30 ETS models to choose from, selecting the most appropriate one becomes challenging. Petropoulos et al. (2018) show that human experts can do this task successfully when they need to choose which components to include in a time series model. However, when you face the problem of fitting ETS to thousands of time series, judgmental selection becomes infeasible, and using some sort of automatic selection becomes critically important.

The basic idea underlying components selection in ETS relies on information criteria: we define a pool of models, fit them, and select the one with the lowest information criterion. Using this approach in the ETS context was first proposed by Hyndman et al. (2002). Based on this, we can construct a pool of models (e.g. based on our understanding of the problem) and then select the one that is most appropriate for our data. The adam() function in the smooth package supports the following options for the pools (illustrated with code after the list):

  1. Pool of all 30 models, model="FFF";
  2. Pool of pure additive models, model="XXX". As an option, “X” can also be used to tell the function to only try the additive component in the selected place, e.g. model="MXM" will tell the function to only test the ETS(M,N,M), ETS(M,A,M), and ETS(M,Ad,M) models;
  3. Pool of pure multiplicative models, model="YYY". Similarly to (2), we can tell adam() to only consider a multiplicative component in a specific place, e.g. model="YNY" will consider only ETS(M,N,N) and ETS(M,N,M);
  4. Pool of pure models only, model="PPP" - this is a shortcut for doing (2) and (3) and then selecting the best model between the two pools;
  5. Manual pool of models, which can be provided as a vector of models, for example: model=c("ANN","MNN","ANA","AAN");
  6. model="ZZZ", which triggers the selection among all possible models based on the branch-and-bound algorithm (see below).
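Here is a minimal sketch of how several of these options are specified (the AirPassengers data is used purely for illustration and is an assumption, not part of the original example):

```r
library(smooth)
y <- AirPassengers # a monthly seasonal series used for illustration

# (1) Pool of all 30 ETS models
adam(y, model="FFF")

# (2) Pure additive models only
adam(y, model="XXX")

# (5) A manual pool of candidate models
adam(y, model=c("ANN","MNN","ANA","AAN"))

# (6) Branch-and-bound selection among all possible models
adam(y, model="ZZZ")
```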

In the cases explained above, adam() will try the different models and select the most appropriate one from the predefined pool. There is a trade-off when deciding which pool to use: if you provide a bigger one, it will take more time to find the appropriate model, and there is a risk of overfitting the data; if you provide a smaller pool, the optimal model might lie outside of it, leaving you with a sub-optimal one.

Furthermore, in some situations you might not need to go through all 30 models because, for example, the seasonal component is not needed for the data. Trying out all the models would then be a waste of time. To address this issue, I have developed a branch-and-bound algorithm for the selection of the most appropriate ETS model, which is triggered via model="ZZZ" (the same mechanism is used in the es() function). The idea of the algorithm is to drop the components that do not improve the model. Here is how it works (a code sketch follows the list):

  1. Apply ETS(A,N,N) to the data, calculate an information criterion (IC);
  2. Apply ETS(A,N,A) to the data, calculate IC. If it is lower than the IC of (1), this means that there is some sort of seasonal component in the data; move to step (3). Otherwise, go to (4);
  3. Apply the ETS(M,N,M) model and calculate its IC. If it is lower than the IC of (2), this means that the data exhibits multiplicative seasonality. Go to (4);
  4. Fit the model with the additive trend component and the seasonal component selected in the previous steps, which can be either “N”, “A”, or “M”, depending on the IC values. Calculate the IC for the new model and compare it with the best IC so far. If it is lower, then there is some trend component in the data; if it is not, then the trend component is not needed.
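To make this logic concrete, here is a simplified sketch of the four steps implemented on top of adam() with AICc as the information criterion. This is only an illustration of the component-dropping idea, not the actual implementation inside adam(); the helper function name and the way the error term is matched to the seasonal component in step 4 are assumptions of the sketch:

```r
library(smooth)

# Hypothetical helper illustrating the branch-and-bound steps,
# assuming y is a seasonal ts object
selectETSComponents <- function(y) {
  # Step 1: ETS(A,N,N) as the benchmark
  icBest <- AICc(adam(y, model="ANN"))
  season <- "N"

  # Step 2: does additive seasonality improve the IC?
  icANA <- AICc(adam(y, model="ANA"))
  if (icANA < icBest) {
    icBest <- icANA
    season <- "A"

    # Step 3: does multiplicative seasonality do even better?
    icMNM <- AICc(adam(y, model="MNM"))
    if (icMNM < icBest) {
      icBest <- icMNM
      season <- "M"
    }
  }

  # Step 4: add an additive trend to the best model so far
  # (matching the error term to the seasonal component is an
  # assumption of this sketch)
  error <- if (season == "M") "M" else "A"
  icTrend <- AICc(adam(y, model=paste0(error, "A", season)))
  trend <- if (icTrend < icBest) "A" else "N"

  # The surviving components define the reduced pool to test
  c(error="Z", trend=trend, season=season)
}

selectETSComponents(AirPassengers)
```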

Based on these four steps, we can rule out the unneeded components and reduce the pool of models to test. For example, if the algorithm shows that seasonality is not needed, but there is a trend, then we only have 10 models to test instead of 30: ETS(A,N,N), ETS(A,A,N), ETS(A,Ad,N), ETS(M,N,N), ETS(M,M,N), ETS(M,Md,N), ETS(A,M,N), ETS(A,Md,N), ETS(M,A,N), ETS(M,Ad,N). Also, in steps (2) and (3), if there is a trend in the data, then the model will have a higher than needed smoothing parameter \(\alpha\), but the seasonality will still play an important role in reducing the IC value. This is why the algorithm is, in general, efficient. It does not guarantee that the optimal model will be selected every time, but it reduces the computational time.
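To illustrate, this reduced pool could equivalently be passed to adam() as a manual pool (a sketch assuming a trended, non-seasonal series y):

```r
adam(y, model=c("ANN","AAN","AAdN","MNN","MMN","MMdN",
                "AMN","AMdN","MAN","MAdN"))
```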

The branch-and-bound algorithm can be combined with different types of models and is in fact also supported in model="XXX" and model="YYY", where the pool of models for steps (1)-(4) is restricted to pure models only.

Finally, while the branch-and-bound algorithm is quite efficient, it might end up suggesting a mixed model, which might not be very suitable for the data. So, it is recommended to think of the possible pool of models prior to applying the algorithm to the data. For example, in some cases you might realise that additive seasonality is not needed and that the data can be either non-seasonal or have multiplicative seasonality. In this case, you can explore the model="YZY" option, aligning the error term with the seasonal component.
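For instance (assuming y is the series in question):

```r
adam(y, model="YZY")
```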

Here is an example with an automatically selected ETS model using the branch-and-bound algorithm described above.
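The exact call is not preserved in this extract, but output of this kind would be produced by something along these lines (the use of AirPassengers with a 12-observation holdout is an assumption, based on the reported sample size of 132 and the seasonal ETS(M,A,M) model):

```r
library(smooth)
testModel <- adam(AirPassengers, model="ZZZ", h=12, holdout=TRUE)
testModel
```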

## Forming the pool of models based on... ANN, ANA, MNM, MAM, Estimation progress: ... Done!

## Time elapsed: 1.7 seconds
## Model estimated using adam() function: ETS(MAM)
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 466.9124
## Persistence vector g:
##  alpha   beta  gamma 
## 0.7694 0.0058 0.0001 
## 
## Sample size: 132
## Number of estimated parameters: 17
## Number of degrees of freedom: 115
## Information criteria:
##       AIC      AICc       BIC      BICc 
##  967.8249  973.1933 1016.8325 1029.9390 
## 
## Forecast errors:
## ME: 7.913; MAE: 19.331; RMSE: 25.012
## sCE: 36.175%; Asymmetry: 61.7%; sMAE: 7.364%; sMSE: 0.908%
## MASE: 0.803; RMSSE: 0.798; rMAE: 0.254; rRMSE: 0.243

In this specific example, the optimal model coincides with the one selected via model="FFF" and model="ZXZ", although this is not necessarily the case universally.

References

• Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting. 18, 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8

• Petropoulos, F., Kourentzes, N., Nikolopoulos, K., Siemsen, E., 2018. Judgmental selection of forecasting models. Journal of Operations Management. 60, 34–46. https://doi.org/10.1016/j.jom.2018.05.005