15.1 ETS components selection
Remark. The model selection mechanism explained in this section is also used in the es() function from the smooth package. So, all the options for ADAM discussed here can be used in the case of es() as well.
Having 30 ETS models to choose from, selecting the most appropriate one becomes challenging. Petropoulos et al. (2018b) showed that human experts can do this task successfully if they need to decide which components to include in the time series. However, when you face the problem of fitting ETS to thousands of time series, the judgmental selection becomes infeasible. Using some sort of automation becomes critically important.
The components selection in ETS is based on information criteria (Section 16.4 of Svetunkov, 2022). The general procedure consists of the following three main steps (in the ETS context, this approach was first proposed by Hyndman et al., 2002):
1. Define a pool of models;
2. Fit all models in the pool;
3. Select the one that has the lowest information criterion.
Depending on what is included in step (1), we will get different results. So, the pool needs to be selected carefully based on our understanding of the problem. The adam() function in the smooth package supports the following options:
1. Pool of all 30 models (Section 4.1), model="FFF";
2. model="ZZZ", which triggers the selection among all possible models based on a branch-and-bound algorithm (see below);
3. Pool of pure additive models (Section 5.1), model="XXX". As an option, “X” can also be used to tell the function to try only additive components in the selected place, e.g. model="MXM" will tell the function to test only ETS(M,N,M), ETS(M,A,M), and ETS(M,Ad,M). Branch-and-bound is used in this case as well;
4. Pool of pure multiplicative models (Section 6.1), model="YYY". Similarly to (3), we can tell adam() to consider only multiplicative components in a specific place, e.g. model="YNY" will consider only ETS(M,N,N) and ETS(M,N,M). Similarly to (2) and (3), in this case adam() will use a branch-and-bound algorithm in the components selection;
5. Pool of pure models only, model="PPP": this is a shortcut for doing (3) and (4) and then selecting the best between the two pools;
6. Manual pool of models, which can be provided as a vector of models, for example: model=c("ANN","MNN","ANA","AAN").
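To make the letter codes above concrete, here is a minimal sketch in Python of how a three-letter code could be expanded into a pool of models. It is an illustration only, not the code used in the smooth package: the function name expand_pool and the lookup tables are hypothetical, "F" is treated the same as "Z" because only the pool is built (no search is run), and codes with an explicit damped trend are not handled.

```python
from itertools import product

# Candidate components per wildcard letter, for each position of the code.
# "Z" = any type, "X" = additive only, "Y" = multiplicative only.
ERROR = {"Z": ["A", "M"], "X": ["A"], "Y": ["M"]}
TREND = {"Z": ["N", "A", "Ad", "M", "Md"],
         "X": ["N", "A", "Ad"],
         "Y": ["N", "M", "Md"]}
SEASONAL = {"Z": ["N", "A", "M"], "X": ["N", "A"], "Y": ["N", "M"]}


def expand_pool(code):
    """Return the list of ETS models implied by a three-letter code (hypothetical helper)."""
    if code == "PPP":
        # Pure models only: the additive pool followed by the multiplicative one.
        return expand_pool("XXX") + expand_pool("YYY")
    if len(code) != 3:
        raise ValueError("this sketch handles three-letter codes only")
    code = code.replace("F", "Z")  # only the pool is built here, so "F" and "Z" coincide
    options = [
        table.get(letter, [letter])  # a fixed letter such as "M" or "N" stands for itself
        for letter, table in zip(code, (ERROR, TREND, SEASONAL))
    ]
    return [f"ETS({e},{t},{s})" for e, t, s in product(*options)]


print(expand_pool("MXM"))       # ['ETS(M,N,M)', 'ETS(M,A,M)', 'ETS(M,Ad,M)']
print(len(expand_pool("ZZZ")))  # 30
```

Under these assumptions, "YNY" expands to ETS(M,N,N) and ETS(M,N,M), and "PPP" gives twelve models: six pure additive and six pure multiplicative ones.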
There is a trade-off when deciding which pool to use: if you provide a large one, it will take more time to find the appropriate model, and there is a risk of overfitting the data; if you provide a small one, then the optimal model might lie outside of it, leaving you with a sub-optimal one.
Furthermore, in some situations, you might not need to go through all models in the pool because, for example, the seasonal component is not required for the data. Trying out all the models would be just a waste of time. So, to address this issue, I have developed a branch-and-bound algorithm for the selection of the most appropriate ETS model, which is triggered via model="ZZZ". The idea of the algorithm is to drop the components that do not improve the model. It allows forming a much smaller pool of models after identifying what components improve the fit. Here is how it works:
1. Apply ETS(A,N,N) to the data, calculate an information criterion;
2. Apply ETS(A,N,A) to the data, calculate an information criterion. If it is lower than in (1), then there is some seasonal component in the data, move to step (3). Otherwise, go to (4);
3. Apply the ETS(M,N,M) model and calculate an information criterion. If it is lower than the previous one, then the data exhibits multiplicative seasonality. Go to (4);
4. Fit the model with the additive trend component and the seasonal component selected in the previous steps, which can be “N”, “A”, or “M”. Calculate an information criterion for the new model and compare it with the best information criterion so far. If it is lower, there is some trend component in the data. If it is not, then the trend component is not needed.
Remark. In the case of multiple seasonal ETS (e.g. day of week and day of year seasonality), all the seasonal components should have the same type (see discussion in Section 12.1), so there is no need to test mixed ones (e.g. one seasonal component being additive, while the other one is multiplicative). Also, while it is possible to test whether each of the seasonal components is needed (e.g. day of week is needed, while day of year is unnecessary), this is not yet implemented in adam(). The simple solution here would be fitting models with and without some of the seasonal components and then selecting the one that has the lowest information criterion.
Based on these four steps, we can drop the unnecessary components and reduce the pool of models to check. For example, if the algorithm shows that seasonality is not needed, but there is a trend, then we only have ten models to check overall instead of 30: ETS(A,N,N), ETS(A,A,N), ETS(A,Ad,N), ETS(M,N,N), ETS(M,M,N), ETS(M,Md,N), ETS(A,M,N), ETS(A,Md,N), ETS(M,A,N), and ETS(M,Ad,N). In steps (2) and (3), if there is a trend in the seasonal data, the model will have a higher than needed smoothing parameter \(\alpha\). While it will not approximate the data perfectly, the seasonality will play a more important role than the trend in reducing the value of the information criterion, which will help in correctly selecting the component. This is why the algorithm is, in general, efficient. It does not guarantee that the optimal model is selected every time, but it substantially reduces the computational time.
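The four steps and the resulting reduced pool can be sketched in Python as follows. This is a simplified illustration and not the implementation in adam(): the function name branch_and_bound_pool and the callable ic (which maps a model name to its information criterion, standing in for fitting the model) are hypothetical. Two further assumptions are made where the description above is silent: in step (4), the error type is carried over from the best model so far, and the reduced pool keeps only the selected seasonal type.

```python
from itertools import product


def branch_and_bound_pool(ic):
    """Run the four checks and return (seasonal, trend_needed, reduced_pool).

    `ic` is a hypothetical callable: model name -> information criterion value.
    """
    # Step 1: the baseline model without trend and seasonality.
    best = ic("ETS(A,N,N)")
    seasonal, error = "N", "A"

    # Step 2: does additive seasonality lower the criterion?
    ic_ana = ic("ETS(A,N,A)")
    if ic_ana < best:
        best, seasonal = ic_ana, "A"
        # Step 3: does multiplicative seasonality do even better?
        ic_mnm = ic("ETS(M,N,M)")
        if ic_mnm < best:
            best, seasonal, error = ic_mnm, "M", "M"

    # Step 4: add the additive trend to the best model so far.
    # Assumption: the error type is carried over from that model.
    trend_needed = ic(f"ETS({error},A,{seasonal})") < best

    # Reduced pool. Assumption: only the selected seasonal type is kept.
    trends = ["N", "A", "Ad", "M", "Md"] if trend_needed else ["N"]
    pool = [f"ETS({e},{t},{seasonal})" for e, t in product(["A", "M"], trends)]
    return seasonal, trend_needed, pool


# Made-up criterion values for a trended, non-seasonal series (illustration only).
toy_ic = {"ETS(A,N,N)": 520.0, "ETS(A,N,A)": 524.0, "ETS(A,A,N)": 505.0}
seasonal, trend_needed, pool = branch_and_bound_pool(toy_ic.__getitem__)
print(seasonal, trend_needed, len(pool))  # N True 10
```

With these made-up values, only three models are fitted before the pool is narrowed down to the ten non-seasonal models listed above; the final model would then be the one with the lowest information criterion within that pool.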
The branch-and-bound algorithm can be combined with different types of model pools and is also supported in model="XXX" and model="YYY", where the pool of models for steps (1) – (4) is restricted to the pure ones only. This would also work in combinations of the style model="XYZ", where the function would form the pool of the following models: ETS(A,N,N), ETS(A,M,N), ETS(A,Md,N), ETS(A,N,A), ETS(A,M,A), ETS(A,Md,A), ETS(A,N,M), ETS(A,M,M), and ETS(A,Md,M).
Finally, while the branch-and-bound algorithm is efficient, it might end up providing a mixed model, which might not be suitable for your data. So, it is recommended to think of the possible pool of models before applying it to the data. For example, in some cases, you might realise that additive seasonality is unnecessary and that the data can be either non-seasonal or with multiplicative seasonality. In this case, you can explore the model="YZY" option, aligning the error term with the seasonal component.
Here is an example with an automatically selected ETS model using the branch-and-bound algorithm described above:
## Time elapsed: 1.03 seconds
## Model estimated using adam() function: ETS(MAM)
## With optimal initialisation
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 466.9086
## Persistence vector g:
## alpha beta gamma
## 0.7807 0.0003 0.0002
##
## Sample size: 132
## Number of estimated parameters: 17
## Number of degrees of freedom: 115
## Information criteria:
## AIC AICc BIC BICc
## 967.8172 973.1857 1016.8249 1029.9313
##
## Forecast errors:
## ME: 11.859; MAE: 22.322; RMSE: 26.996
## sCE: 54.214%; Asymmetry: 68.4%; sMAE: 8.504%; sMSE: 1.058%
## MASE: 0.927; RMSSE: 0.862; rMAE: 0.294; rRMSE: 0.262
In this specific example, the optimal model will coincide with the one selected via model="FFF" and model="ZXZ" (the reader is encouraged to try these pools on their own), although this does not necessarily hold universally.
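Since the selection relies on information criteria, it may help to see how the values in the output above relate to the reported loss. The Python sketch below assumes that the "Loss function value" for the likelihood loss is the negative log-likelihood and uses the standard formulae for AIC, AICc, and BIC; the BICc formula shown is the variant that reproduces the reported number and should be read as an assumption rather than as the package's definition.

```python
from math import log

neg_loglik = 466.9086  # "Loss function value" from the output (assumed negative log-likelihood)
k = 17                 # number of estimated parameters
n = 132                # sample size

aic = 2 * k + 2 * neg_loglik
aicc = aic + 2 * k * (k + 1) / (n - k - 1)            # small-sample correction of AIC
bic = k * log(n) + 2 * neg_loglik
bicc = 2 * neg_loglik + k * log(n) * n / (n - k - 1)  # assumed form of the corrected BIC

print(f"AIC={aic:.4f} AICc={aicc:.4f} BIC={bic:.4f} BICc={bicc:.4f}")
```

The computed values agree with the reported 967.8172, 973.1857, 1016.8249, and 1029.9313 up to the rounding of the loss value, which supports the reading of the loss as the negative log-likelihood.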