
Chapter 15 Model selection and combinations in ADAM

So far, we have managed to avoid discussing the topic of model selection and combinations. However, it is important to understand how the most appropriate model can be selected and how to capture the uncertainty around the model form (this relates to one of the fundamental sources of uncertainty discussed by Chatfield, 1996). There are several ways to decide which model to use, and there are several dimensions in which a decision needs to be made:

Which of the base models to use: ETS/ARIMA/ETS+ARIMA/Regression/ETSX/ARIMAX/ETSX+ARIMA?
What components of the ETS model to select?
What order of ARIMA model to select?
Which of the explanatory variables to use?
What distribution to use?
Should we select the best model or combine forecasts from different ones?
Do we need all models in the pool?
What about the demand occurrence part of the model? (Luckily, this question has already been answered in Subsection 13.1.6.)

In this chapter, we discuss these questions. We start with principles based on information criteria (see discussion in Chapter 16 of Svetunkov and Yusupova, 2025) for ETS and ARIMA. We then move to selecting explanatory variables and finish with topics related to the combination of models.

Before we do that, we need to recall the distributional assumptions in ADAM, which play an essential role in estimation and selection when maximum likelihood is used (Section 11.1). In that case, an information criterion can be calculated and used to select the most appropriate model across the eight dimensions mentioned above. Typically, this is done by fitting all the candidate models and then selecting the one that has the lowest information criterion. For example, when the best-fitting distribution needs to be selected, we could fit ADAMs with all the supported distributions and then select the one that gives the lowest AIC. Here is the list of the supported distributions in ADAM:

Normal;
Laplace;
S;
Generalised Normal;
Log-Normal;
Inverse Gaussian;
Gamma.
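The fit-all-and-compare procedure described above can be sketched by hand. The snippet below is a minimal illustration, not a description of what any smooth function does internally. It assumes that the smooth package is installed, uses the built-in BJsales series and an ETS(M,M,N) model purely as illustrative choices, and relies on the distribution argument of adam() with the values "dnorm", "dlaplace", "ds", "dgnorm", "dlnorm", "dinvgauss" and "dgamma", which correspond to the seven distributions listed above. The object names (distributions, fits, ics, best) are hypothetical.

```r
library(smooth)  # provides adam()

# Candidate distributions, in the same order as the list above
distributions <- c("dnorm", "dlaplace", "ds", "dgnorm",
                   "dlnorm", "dinvgauss", "dgamma")

# Fit the same ETS(M,M,N) model under each distributional assumption
fits <- lapply(distributions, function(d) {
    adam(BJsales, model = "MMN", h = 10, holdout = TRUE, distribution = d)
})
names(fits) <- distributions

# Extract the information criterion of every fit (AIC here)
ics <- sapply(fits, AIC)

# The preferred distribution is the one with the lowest criterion value
best <- names(ics)[which.min(ics)]

print(sort(ics))
print(best)
```

Replacing AIC() with BIC() in the sapply() call switches the criterion without changing the rest of the sketch.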

The function auto.adam() implements this automatic selection of the distribution based on an information criterion, using the vector of distributions provided by the user. This selection procedure can be combined with the selection techniques for other elements of ADAM discussed in the following sections of this chapter. Here is an example of selecting the distribution for a specific model, ETS(M,M,N), on the Box-Jenkins sales data using auto.adam():

auto.adam(BJsales, model="MMN", h=10, holdout=TRUE)

## Time elapsed: 1.74 seconds
## Model estimated using auto.adam() function: ETS(MMN)
## With backcasting initialisation
## Distribution assumed in the model: Log-Normal
## Loss function type: likelihood; Loss function value: 245.3729
## Persistence vector g:
##  alpha   beta 
## 1.0000 0.2395 
## 
## Sample size: 140
## Number of estimated parameters: 3
## Number of degrees of freedom: 137
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 496.7458 496.9223 505.5707 506.0067 
## 
## Forecast errors:
## ME: 3.222; MAE: 3.335; RMSE: 3.79
## sCE: 14.148%; Asymmetry: 91.7%; sMAE: 1.464%; sMSE: 0.028%
## MASE: 2.821; RMSSE: 2.486; rMAE: 0.926; rRMSE: 0.923

In this case, the function has applied the same model with different distributions, estimated each of them via maximum likelihood, and selected the one with the lowest AICc value. According to the output, Log-Normal is the most appropriate distribution for ETS(M,M,N) on this data.
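The information criteria reported in the output above can be checked by hand from the loss value, the number of estimated parameters and the sample size. The sketch below assumes that the reported loss value is the negative log-likelihood and that the criteria follow the standard formulas: AIC = 2k + 2L, AICc = AIC + 2k(k+1)/(n-k-1), BIC = k log(n) + 2L, where L is the negative log-likelihood. The BICc form used here, 2L + k log(n) n/(n-k-1), is an assumption that is consistent with the reported number. All object names are hypothetical.

```r
# Values copied from the auto.adam() output above
negLogLik <- 245.3729  # loss function value (assumed negative log-likelihood)
k <- 3                 # number of estimated parameters
n <- 140               # in-sample size

# Standard information criteria expressed via the negative log-likelihood
ics <- c(
    AIC  = 2 * k + 2 * negLogLik,
    AICc = 2 * k + 2 * negLogLik + 2 * k * (k + 1) / (n - k - 1),
    BIC  = k * log(n) + 2 * negLogLik,
    BICc = k * log(n) * n / (n - k - 1) + 2 * negLogLik
)

# Values reported in the output, for comparison
reported <- c(AIC = 496.7458, AICc = 496.9223, BIC = 505.5707, BICc = 506.0067)

# Differences should be negligible (the loss value is itself rounded)
print(round(ics - reported, 3))
```

Because the corrections in AICc and BICc shrink as n grows relative to k, the corrected and uncorrected criteria are close to each other for this sample.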

References

• Chatfield, C., 1996. Model Uncertainty and Forecast Accuracy. Journal of Forecasting. 15, 495–508. https://doi.org/10.1002/(SICI)1099-131X(199612)15:7<495::AID-FOR640>3.3.CO;2-F

• Svetunkov, I., Yusupova, A., 2025. Statistics for business analytics. https://openforecast.org/sba/ version: 01.06.2025