## 15.1 ADAM ETS components selection

Remark. The model selection mechanism explained in this section is also used in the es() function from the smooth package, so all the options discussed here for ADAM can be used with es() as well.

With 30 ETS models to choose from, selecting the most appropriate one is challenging. Petropoulos et al. (2018b) show that human experts can do this successfully when they need to decide which components to include in a time series. However, when you need to fit ETS to thousands of time series, judgmental selection becomes infeasible, and some form of automatic selection becomes critically important.

Component selection in ETS is based on information criteria (Section 13.3 of Svetunkov, 2022a). The general procedure consists of the following three steps:

1. we define a pool of models,
2. we fit those models,
3. and we select the one that has the lowest information criterion.
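The three steps can be sketched in base R as follows. This is a minimal illustration, not code from smooth: `selectByIC()` and its `fitIC` argument are hypothetical names, and the IC values are made up purely to show the mechanics. In practice, `fitIC` would fit the named model to the data and return its information criterion.

```r
# Hypothetical helper (not part of smooth): select the model with the lowest IC.
# `fitIC` is an assumed function that fits a model given its name and returns its IC.
selectByIC <- function(pool, fitIC) {
  # Step 2: fit every model in the pool and record its information criterion
  ics <- vapply(pool, fitIC, numeric(1))
  # Step 3: the model with the lowest IC is selected
  list(best = names(which.min(ics)), ics = ics)
}

# Step 1: define the pool of models
pool <- c("ANN", "AAN", "ANA")
# Made-up IC values, used only to illustrate the mechanics
mockIC <- c(ANN = 1010, AAN = 1002, ANA = 995)
selectByIC(pool, function(m) mockIC[[m]])$best
# returns "ANA"
```

With smooth, `fitIC` could, for instance, wrap `AICc(adam(y, model = m))`, assuming the `AICc()` generic from greybox is available for `adam` objects; in practice, though, adam() accepts the pool directly through its `model` argument.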

This approach was first proposed in the ETS context by Hyndman et al. (2002). Following it, we can prepare a pool of models (e.g. based on our understanding of the problem) and then select the most appropriate one for our data. The adam() function in the smooth package supports the following options for the pools:

1. Pool of all 30 models (Section 4.1), model="FFF";
2. Pool of pure additive models (Section 5.1), model="XXX". “X” can also be used in a specific position to restrict that component to its additive options, e.g. model="MXM" will make the function test only ETS(M,N,M), ETS(M,A,M) and ETS(M,Ad,M);
3. Pool of pure multiplicative models (Section 6.1), model="YYY". Similarly to (2), “Y” can be used in a specific position to tell adam() to consider only the multiplicative options for that component, e.g. model="YNY" will consider only ETS(M,N,N) and ETS(M,N,M);
4. Pool of pure models only, model="PPP", which is a shortcut for doing (2) and (3) and then selecting the best model from the two pools;
5. Manual pool of models, which can be provided as a vector of models, for example: model=c("ANN","MNN","ANA","AAN");
6. model="ZZZ", which triggers the selection among all possible models based on a branch-and-bound algorithm (see below).
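To make the letter codes concrete, here is a small base R sketch that expands a three-letter specification into the candidate pool it describes. `expandPool()` is a hypothetical helper written for illustration, not a function from smooth, and it only lists the candidates: with "Z", "X" and "Y", adam() additionally prunes the pool with the branch-and-bound algorithm, so not every listed model is necessarily fitted. Damped trends are written as "Ad" and "Md", following the ETS(M,Ad,M) notation.

```r
# Hypothetical helper (not part of smooth): list the ETS(E,T,S) candidates
# described by a three-letter model specification such as "FFF" or "MXM".
expandPool <- function(model) {
  if (model == "PPP") {
    # Pure models only: the additive pool plus the multiplicative pool
    return(c(expandPool("XXX"), expandPool("YYY")))
  }
  stopifnot(nchar(model) == 3)
  # Options implied by each wildcard letter, per component position
  wildcards <- list(
    error    = list(F = c("A", "M"), Z = c("A", "M"), X = "A", Y = "M"),
    trend    = list(F = c("N", "A", "Ad", "M", "Md"),
                    Z = c("N", "A", "Ad", "M", "Md"),
                    X = c("N", "A", "Ad"), Y = c("N", "M", "Md")),
    seasonal = list(F = c("N", "A", "M"), Z = c("N", "A", "M"),
                    X = c("N", "A"), Y = c("N", "M"))
  )
  # A wildcard expands to its options; any other letter is kept as it is
  resolve <- function(letter, options) {
    if (letter %in% names(options)) options[[letter]] else letter
  }
  spec <- strsplit(model, "")[[1]]
  # expand.grid varies its first column fastest, so trends cycle within seasonality
  grid <- expand.grid(trend    = resolve(spec[2], wildcards$trend),
                      seasonal = resolve(spec[3], wildcards$seasonal),
                      error    = resolve(spec[1], wildcards$error),
                      stringsAsFactors = FALSE)
  paste0(grid$error, grid$trend, grid$seasonal)
}

expandPool("MXM")          # returns c("MNM", "MAM", "MAdM")
length(expandPool("FFF"))  # returns 30
```

The full pool has 2 × 5 × 3 = 30 models, while "PPP" gives 12 candidates: six pure additive and six pure multiplicative ones.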

In the cases above, adam() will try different models and select the most appropriate one from the predefined pool. There is a trade-off when deciding which pool to use: a bigger pool takes more time to search and increases the risk of overfitting the data, while a smaller pool might not contain the optimal model, leaving you with a sub-optimal one.

Furthermore, in some situations you might not need to go through all 30 models because, for example, the data does not require a seasonal component, so trying out all the models would be a waste of time. To address this issue, I have developed a branch-and-bound algorithm for selecting the most appropriate ETS model, which is triggered via model="ZZZ" (the same mechanism is used in the es() function). The idea of the algorithm is to drop the components that do not improve the model. Here is how it works:

1. Apply ETS(A,N,N) to the data, calculate an information criterion (IC);
2. Apply ETS(A,N,A) to the data and calculate IC. If it is lower than the IC from (1), then there is some seasonal component in the data, so move to step (3). Otherwise, go to (4);
3. Apply ETS(M,N,M) to the data and calculate IC. If it is lower than the IC from (2), then the data exhibits multiplicative seasonality. Go to (4);
4. Fit the model with the additive trend component and the seasonal component selected in the previous steps, which can be either “N”, “A”, or “M”. Calculate IC for the new model and compare it with the best IC so far. If it is lower, then there is some trend component in the data; if it is not, then the trend component is not needed.
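The four steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the implementation used in smooth: `selectComponents()` and `fitIC` are hypothetical names, and since the description does not say which error type is used in step (4), the sketch assumes that the error term follows the selected seasonality (multiplicative for “M”, additive otherwise). The function also records which models were actually fitted, to show how the unneeded branches are skipped.

```r
# Hypothetical sketch of the four-step branch-and-bound component selection.
# `fitIC` is an assumed function that fits the named model and returns its IC.
selectComponents <- function(fitIC) {
  tried <- character(0)
  # Wrapper that remembers which models were actually fitted
  ic <- function(model) {
    tried <<- c(tried, model)
    fitIC(model)
  }
  # Step 1: level-only model as the benchmark
  best <- ic("ANN")
  seasonal <- "N"
  # Step 2: does additive seasonality reduce the IC?
  icANA <- ic("ANA")
  if (icANA < best) {
    best <- icANA
    seasonal <- "A"
    # Step 3: does multiplicative seasonality reduce it further?
    icMNM <- ic("MNM")
    if (icMNM < best) {
      best <- icMNM
      seasonal <- "M"
    }
  }
  # Step 4: add an additive trend to the selected seasonality
  # (assumption: the error type follows the seasonality, "M" for "M", else "A")
  error <- if (seasonal == "M") "M" else "A"
  trend <- ic(paste0(error, "A", seasonal)) < best
  list(seasonal = seasonal, trend = trend, tried = tried)
}

# Made-up IC values for non-seasonal trending data (illustration only)
mockIC <- c(ANN = 1000, ANA = 1004, AAN = 990)
res <- selectComponents(function(m) mockIC[[m]])
res$seasonal  # "N"
res$trend     # TRUE
res$tried     # "ANN" "ANA" "AAN": ETS(M,N,M) is never fitted
```

Here the seasonal branch stops at step (2), so only three models are fitted before the decision on the components is made.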

Based on these four steps, we can drop the unneeded components and reduce the pool of models to check. For example, if the algorithm shows that seasonality is not needed but there is a trend, then we only have 10 models to check instead of 30: ETS(A,N,N), ETS(A,A,N), ETS(A,Ad,N), ETS(M,N,N), ETS(M,M,N), ETS(M,Md,N), ETS(A,M,N), ETS(A,Md,N), ETS(M,A,N), ETS(M,Ad,N). In steps (2) and (3), if there is a trend in the data, the model will have a higher than needed smoothing parameter $\alpha$, because the level has to compensate for the missing trend, but the seasonal component will still play an important role in reducing the value of IC. This is why the algorithm is, in general, efficient. It does not guarantee that the optimal model will always be selected, but it substantially reduces the computational time.

The branch-and-bound algorithm can be combined with different types of model pools and is also supported in model="XXX" and model="YYY", where the models used in steps (1) to (4) are restricted to the pure ones only. It also works with combinations such as model="XYZ", for which the function forms the pool of the following models: ETS(A,N,N), ETS(A,M,N), ETS(A,Md,N), ETS(A,N,A), ETS(A,M,A), ETS(A,Md,A), ETS(A,N,M), ETS(A,M,M) and ETS(A,Md,M).

Finally, while the branch-and-bound algorithm is efficient, it might end up selecting a mixed model, which might not be suitable for the data. So, it is recommended to think about the possible pool of models before applying the algorithm. For example, in some cases you might realise that additive seasonality is unnecessary and that the data can be either non-seasonal or have multiplicative seasonality. In this case, you can use the model="YZY" option, which aligns the error term with the seasonal component.

Here is an example of an ETS model selected automatically using the branch-and-bound algorithm described above:

```r
library(smooth)
# Select the ETS components automatically, holding out the last 12 observations
adamETSModel <- adam(AirPassengers, model="ZZZ", h=12, holdout=TRUE)
adamETSModel
```

```
## Time elapsed: 1.06 seconds
## Model estimated using adam() function: ETS(MAM)
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 467.2981
## Persistence vector g:
##  alpha   beta  gamma
## 0.7691 0.0053 0.0000
##
## Sample size: 132
## Number of estimated parameters: 17
## Number of degrees of freedom: 115
## Information criteria:
##       AIC      AICc       BIC      BICc
##  968.5961  973.9646 1017.6038 1030.7102
##
## Forecast errors:
## ME: 9.537; MAE: 20.784; RMSE: 26.106
## sCE: 43.598%; Asymmetry: 64.8%; sMAE: 7.918%; sMSE: 0.989%
## MASE: 0.863; RMSSE: 0.833; rMAE: 0.273; rRMSE: 0.254
```

In this specific example, the selected model coincides with the one selected via model="FFF" and model="ZXZ" (the reader is encouraged to try these selection mechanisms on their own), although this does not necessarily hold in general.
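One way to run this comparison is sketched below, assuming the smooth package is installed. The sketch uses `modelType()` to extract the selected model type and `AICc()` (from greybox, which is loaded together with smooth) to extract the information criterion; if either is unavailable in your version, printing each fitted object shows the same information. The results are not reproduced here, and fitting the full "FFF" pool takes noticeably longer than the other two options.

```r
library(smooth)
# Fit the same data with three different selection mechanisms
specs <- c("ZZZ", "FFF", "ZXZ")
fits <- lapply(specs, function(spec)
  adam(AirPassengers, model = spec, h = 12, holdout = TRUE))
names(fits) <- specs
# Selected model type and its AICc for each selection mechanism
sapply(fits, function(fit) c(type = modelType(fit), AICc = round(AICc(fit), 4)))
```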

### References

• Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting. 18, 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8
• Petropoulos, F., Kourentzes, N., Nikolopoulos, K., Siemsen, E., 2018b. Judgmental selection of forecasting models. Journal of Operations Management. 60, 34–46. https://doi.org/10.1016/j.jom.2018.05.005
• Svetunkov, I., 2022a. Statistics for business analytics. https://openforecast.org/sba/ (version: 31.03.2022)