Chapter 15 Model selection and combinations in ADAM
So far, we have managed to avoid discussing the topic of model selection and combinations. However, it is important to understand how the most appropriate model can be selected and how to capture the uncertainty around the model form (one of the fundamental sources of uncertainty discussed by Chatfield, 1996). There are several ways to decide which model to use, and there are several dimensions along which a decision needs to be made:
- Which of the base models to use: ETS / ARIMA / ETS+ARIMA / Regression / ETSX / ARIMAX / ETSX+ARIMA?
- What components of the ETS model to select?
- What order of ARIMA model to select?
- Which of the explanatory variables to use?
- What distribution to use?
- Should we select the best model or combine forecasts from different ones?
- Do we need all models in the pool?
- What about the demand occurrence part of the model? (Luckily, this question has already been answered in Subsection 13.1.6.)
In this chapter, we discuss these questions. We start with principles based on information criteria (see discussion in Chapter 16 of Svetunkov, 2022) for ETS and ARIMA. We then move to selecting explanatory variables and finish with topics related to the combination of models.
Before we do that, we need to recall the distributional assumptions in ADAM, which play an essential role in estimation and selection if maximum likelihood is used (Section 11.1). In that case, an information criterion (IC) can be calculated and used to select the most appropriate model across the eight dimensions mentioned above. Typically, this is done by fitting all the candidate models and then selecting the one with the lowest IC. For example, when the best-fitting distribution needs to be selected, we could fit ADAMs with all the supported distributions and then select the one that gives the lowest AIC. Here is the list of the supported distributions in ADAM:
- Generalised Normal;
- Inverse Gaussian;
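The fit-all-candidates-and-compare procedure described above can be sketched by hand. The snippet below is only an illustration, not the internal logic of any function: it assumes that the smooth package is installed, uses the built-in BJsales series, and tries four density names (dnorm, dlnorm, dgnorm, dinvgauss) picked arbitrarily for the example. The object names candidates, fits, ics and best are hypothetical, and AICc is used here as the criterion, although any other IC could be substituted.

```r
library(smooth)  # provides adam(); attaches greybox, which provides AICc()

# Candidate distributions to compare (an arbitrary subset, chosen for illustration)
candidates <- c("dnorm", "dlnorm", "dgnorm", "dinvgauss")

# Fit the same ETS(M,M,N) model once per candidate distribution
fits <- lapply(candidates, function(d) {
  adam(BJsales, model = "MMN", h = 10, holdout = TRUE, distribution = d)
})
names(fits) <- candidates

# Extract the information criterion of each fitted model
ics <- sapply(fits, AICc)
print(round(ics, 4))

# Keep the model with the lowest criterion value
best <- fits[[which.min(ics)]]
print(names(which.min(ics)))
```

Fitting every candidate separately makes the comparison explicit, at the cost of one estimation per distribution.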
The auto.adam() function implements this automatic selection of distribution based on IC for the vector of distributions provided by the user. This selection procedure can be combined with other selection techniques for different elements of ADAM discussed in the following sections of this chapter. Here is an example of selecting the distribution for a specific model, ETS(M,M,N), on the Box-Jenkins sales data using auto.adam():
auto.adam(BJsales, model="MMN", h=10, holdout=TRUE)
## Time elapsed: 0.24 seconds
## Model estimated using auto.adam() function: ETS(MMN)
## Distribution assumed in the model: Log-Normal
## Loss function type: likelihood; Loss function value: 245.3716
## Persistence vector g:
##  alpha   beta
## 1.0000 0.2406
##
## Sample size: 140
## Number of estimated parameters: 5
## Number of degrees of freedom: 135
## Information criteria:
##      AIC     AICc      BIC     BICc
## 500.7432 501.1909 515.4514 516.5577
##
## Forecast errors:
## ME: 3.219; MAE: 3.332; RMSE: 3.786
## sCE: 14.133%; Asymmetry: 91.6%; sMAE: 1.463%; sMSE: 0.028%
## MASE: 2.819; RMSSE: 2.484; rMAE: 0.925; rRMSE: 0.922
In this case, the function has applied the same model with different distributions, estimated each of them via maximum likelihood, and selected the one with the lowest AICc value. It looks like Log-Normal is the most appropriate distribution for ETS(M,M,N) on these data.
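To see how the information criteria in the output relate to the likelihood, the reported values can be reproduced with a few lines of base R. This is a sketch under two assumptions: that the reported "Loss function value" is the negative log-likelihood, and that BICc takes the small-sample form written below. Both assumptions are consistent with the printed numbers, but they are not taken from the function's documentation. The variable names are hypothetical.

```r
# Values copied from the auto.adam() output above
negLogLik <- 245.3716  # "Loss function value", assumed to be the negative log-likelihood
k <- 5                 # number of estimated parameters
n <- 140               # in-sample size

# Standard definitions of AIC, AICc and BIC
aic  <- 2 * negLogLik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)
bic  <- 2 * negLogLik + k * log(n)

# Small-sample corrected BIC (assumed form, see the note above)
bicc <- 2 * negLogLik + k * log(n) * n / (n - k - 1)

ics <- c(AIC = aic, AICc = aicc, BIC = bic, BICc = bicc)
reported <- c(AIC = 500.7432, AICc = 501.1909, BIC = 515.4514, BICc = 516.5577)

print(round(ics, 4))
# Compare with the printed output, allowing for rounding of the loss value
print(all(abs(ics - reported) < 1e-3))
```

Because all four criteria are built from the same likelihood value and differ only in the penalty for the five estimated parameters, the ranking of distributions depends on which criterion is used only when the candidates have different numbers of parameters.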