
15.4 Forecast combinations in ADAM

When it comes to achieving forecasts that are as accurate as possible in practice, the best and the most robust (in terms of not failing) approach is to produce combined forecasts. The main motivation for combining comes from the idea that there is no single best forecasting method for everything: methods might perform very well in some conditions and fail in others, and in practice it is typically not possible to say which case you face. Furthermore, the model selected on one sample might differ from the model selected for the same sample with one more observation. Thus there is model uncertainty (as defined by Chatfield, 1996), and the safer option is to produce forecasts from several models and then combine them to get the final forecast. This way, the potential damage from an inaccurate forecast will hopefully be reduced.

There are many different techniques for combining forecasts; a non-exhaustive list includes:

  1. Simple average, which works fine as long as you do not have extremely poor methods;
  2. Median, which produces good combinations when the pool of models is limited and might contain models whose forecasts differ substantially from the others (e.g. explosive forecasts). However, when a big pool of models is considered, the median might ignore important information and lead to a decrease in accuracy, as noted by Jose and Winkler (2008). In the experiment of Stock and Watson (2004) on macroeconomic data, medians performed worse than the other approaches (probably because of the high number of forecasting methods), while the median-based combination worked well for Petropoulos and Svetunkov (2020), who considered only four forecasting methods;
  3. Trimmed and/or Winsorized mean, which drop the extreme forecasts when calculating the mean and, as shown by Jose and Winkler (2008), work well in the case of big pools of models, outperforming medians and the simple average (a short R illustration of these simple operators is given after this list);
  4. Weighted mean, which assigns a weight to each forecast and produces a combined forecast based on them. While this approach sounds more reasonable than the others, there is no guarantee that it will work better, because the weights need to be estimated and might change with a change in the sample size or in the pool of models. Claeskens et al. (2016) explain why in many cases the simple average outperforms weighted averages: it does not require the estimation of weights and thus does not introduce additional uncertainty. However, when done smartly, weighted combinations can be beneficial in terms of accuracy, as shown, for example, in Kolassa (2011) and Kourentzes et al. (2019a).
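
To make the first three options more tangible, here is a minimal sketch in R on a made-up matrix of forecasts (the object forecastsMatrix and all its values are purely hypothetical and only serve the illustration):

# A toy matrix with h=3 forecasts from four hypothetical methods in columns,
# the last of which produces explosive forecasts
forecastsMatrix <- cbind(m1=c(100,102,104), m2=c(101,103,105),
                         m3=c(98,100,102), m4=c(150,170,190))
# Simple average over the methods for each horizon
apply(forecastsMatrix, 1, mean)
# Median, which is robust to the explosive fourth method
apply(forecastsMatrix, 1, median)
# Trimmed mean, dropping 25% of the most extreme forecasts from each side
apply(forecastsMatrix, 1, mean, trim=0.25)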

The forecast combination approach implemented in ADAM is the weighted mean, based on Kolassa (2011), who used AIC weights as proposed by Burnham and Anderson (2004). The idea of this approach is to estimate all models in the pool, calculate information criteria (IC) for each of them (see the discussion in Section 13.4 of Svetunkov, 2021c) and then calculate weights for each model. Models with lower ICs get higher weights, while the poorly performing ones get weights close to zero. The only requirement of the approach is that the parameters of the models are estimated via likelihood maximisation (see Section 11.1). It does not matter which model is used or which distribution is assumed, as long as the models are initialised and constructed in the same way and the likelihood is used in the estimation.
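
As a rough sketch of the idea (the AICc values below are made up purely for illustration and do not come from any real model), the weights could be calculated as follows; the exact formula is given later in this section:

# Hypothetical AICc values of three estimated models
adamPoolICs <- c(ANN=305.2, AAN=303.8, AAdN=310.5)
# Distances from the best (lowest) information criterion
adamPoolICDeltas <- adamPoolICs - min(adamPoolICs)
# IC weights: the lower the IC, the higher the weight of the model
round(exp(-0.5*adamPoolICDeltas) / sum(exp(-0.5*adamPoolICDeltas)), 3)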

When it comes to prediction intervals, the correct way of calculating them for a combination is to consider the joint distribution of all forecasting models in the pool and take quantiles based on it. However, Lichtendahl et al. (2013) showed that the simpler approach of averaging the quantiles works well in practice. It is fast and efficient in terms of obtaining well-calibrated intervals.
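
Here is a minimal sketch of this quantile averaging for a single horizon step, with made-up weights and bounds (all the numbers below are hypothetical):

# Hypothetical IC weights and 95% interval bounds from three models
poolWeights <- c(0.3, 0.5, 0.2)
poolLower <- c(95.1, 94.3, 96.0)
poolUpper <- c(105.4, 106.8, 104.9)
# Combined bounds as weighted averages of the individual quantiles
c(lower=sum(poolWeights*poolLower), upper=sum(poolWeights*poolUpper))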

In R, the adam() function supports the combination of ETS models via model="CCC" or any other combination of letters, as long as the model name contains “C”. For example, the function will combine all non-seasonal models if model="CCN" is provided. Consider the following example on the Box-Jenkins sales series:

adamETSCCN <- adam(BJsales, "CCN", h=10, holdout=TRUE, ic="AICc")
plot(adamETSCCN,7)

Figure 15.1: An example of combination of ETS non-seasonal models on Box-Jenkins time series.

In the example above, the function will estimate all non-seasonal models, extract AICc for each of them and then calculate weights, which we can extract for further analysis:

round(adamETSCCN$ICw,3)
##   ANN   MAN  AAdN   MMN  AMdN   MNN   AAN  MAdN   AMN  MMdN 
## 0.000 0.014 0.252 0.010 0.511 0.000 0.073 0.031 0.050 0.059

As can be seen from the weights above, the level models ETS(A,N,N) and ETS(M,N,N) were further away from the best model and as a result got weights very close to zero. The fitted values in Figure 15.1 are combined from all the models, and the residuals are equal to \(e_t = y_t - \hat{y}_t\), where \(\hat{y}_t\) is the combined fitted value. The final forecast together with the prediction interval can be generated via the forecast() function:

plot(forecast(adamETSCCN,h=10,interval="prediction"))

What the function does in this case is produce forecasts and prediction intervals from each model and then use the original weights to combine them. In fact, each individual model can be extracted and used separately if needed. Here is an example with the ETS(A,Ad,N) model from the estimated pool:

plot(forecast(adamETSCCN$models$AAdN,h=10,interval="prediction"))

Alternatively, if we do not need to consider all ETS models, we can provide our own pool of models, including a model with “C” in its name to trigger the combination. Here is an example of how the pure additive models can be combined:

adamETSCCNPureAdditive <- adam(BJsales, c("CCN","ANN","AAN","AAdN"), h=10, holdout=TRUE, ic="AICc")
plot(adamETSCCNPureAdditive,7)

The main issue with the combined ETS approach is that it is computationally expensive, because all models in the pool need to be estimated, and it can also result in high memory usage. It is therefore recommended to be smart in deciding which models to include in the pool.

While adam() supports the IC-weights combination of ETS models only, it is also possible to combine ARIMA, regression models and models with different distributions in the framework. Given that all models are initialised in the same way and that the likelihoods are calculated using similar principles, the weights can be calculated manually using the formula from Burnham and Anderson (2004): \[\begin{equation} w_i = \frac{\exp\left(-\frac{1}{2}\Delta_i\right)}{\sum_{j=1}^n \exp\left(-\frac{1}{2}\Delta_j\right)}, \tag{11.1} \end{equation}\] where \(\Delta_i=\mathrm{IC}_i - \min_{j=1,\dots,n} \left(\mathrm{IC}_j\right)\) is the information criterion distance from the best performing model, \(\mathrm{IC}_i\) is the value of the information criterion of model \(i\) and \(n\) is the number of models in the pool. For example, here is how we can combine the best ETS model with the best ARIMA and the ETSX(M,M,N) model in the ADAM framework, based on BICc:

# Prepare data with explanatory variables
BJsalesData <- cbind(as.data.frame(BJsales),
                     xregExpander(BJsales.lead,c(-5:5)))

# Apply models
adamModelsPool <- vector("list",3)
adamModelsPool[[1]] <- adam(BJsales, "ZZN", h=10, holdout=TRUE, ic="BICc")
adamModelsPool[[2]] <- adam(BJsales, "NNN", orders=list(ar=3,i=2,ma=3,select=TRUE),
                            h=10, holdout=TRUE, ic="BICc")
adamModelsPool[[3]] <- adam(BJsalesData, "MMN", h=10, holdout=TRUE, ic="BICc",
                            regressors="select")

# Extract BICc values
adamModelsICs <- sapply(adamModelsPool,BICc)

# Calculate weights
adamModelsICWeights <- adamModelsICs - min(adamModelsICs)
adamModelsICWeights[] <- exp(-0.5*adamModelsICWeights)/sum(exp(-0.5*adamModelsICWeights))
names(adamModelsICWeights) <- c("ETS","ARIMA","ETSX")
round(adamModelsICWeights,3)
##   ETS ARIMA  ETSX 
## 0.524 0.424 0.052

These weights can then be used for the combination of the fitted values, forecasts and prediction intervals:

# Produce forecasts with prediction intervals from each model in the pool
adamModelsPoolForecasts <- vector("list",3)
for(i in 1:3){
    adamModelsPoolForecasts[[i]] <- forecast(adamModelsPool[[i]], h=10, interval="pred")
}
# Combine point forecasts and interval bounds using the IC weights
finalForecast <- cbind(sapply(adamModelsPoolForecasts,"[[","mean") %*% adamModelsICWeights,
                       sapply(adamModelsPoolForecasts,"[[","lower") %*% adamModelsICWeights,
                       sapply(adamModelsPoolForecasts,"[[","upper") %*% adamModelsICWeights)
colnames(finalForecast) <- c("Mean","Lower bound (2.5%)", "Upper bound (97.5%)")
finalForecast <- ts(finalForecast, start=start(adamModelsPoolForecasts[[i]]$mean))
finalForecast
## Time Series:
## Start = 141 
## End = 150 
## Frequency = 1 
##         Mean Lower bound (2.5%) Upper bound (97.5%)
## 141 257.6723           254.9248            260.3754
## 142 257.7322           253.3936            262.0412
## 143 257.7879           251.9851            263.6665
## 144 257.8457           250.4622            265.2451
## 145 257.9115           248.9796            266.8085
## 146 257.9836           247.4946            268.4933
## 147 258.0421           245.9177            270.1426
## 148 258.0866           244.3304            271.8584
## 149 258.1511           242.7951            273.5428
## 150 258.2060           241.1998            275.3381

In order to see what the final forecast looks like, we can plot it via the graphmaker() function from the greybox package:

graphmaker(BJsales, finalForecast[,1],
           lower=finalForecast[,2], upper=finalForecast[,3],
           level=0.95)

Figure 15.2: Final forecast from the combination of ETS, ARIMA and ETSX models.

References

• Burnham, K.P., Anderson, D.R., 2004. Model Selection and Multimodel Inference. Springer New York. https://doi.org/10.1007/b97636
• Chatfield, C., 1996. Model uncertainty and forecast accuracy. Journal of Forecasting. 15, 495–508. https://doi.org/10.1002/(SICI)1099-131X(199612)15:7<495::AID-FOR640>3.3.CO;2-F
• Claeskens, G., Magnus, J.R., Vasnev, A.L., Wang, W., 2016. The forecast combination puzzle: A simple theoretical explanation. International Journal of Forecasting. 32, 754–762. https://doi.org/10.1016/j.ijforecast.2015.12.005
• Jose, V.R.R., Winkler, R.L., 2008. Simple robust averages of forecasts: Some empirical results. International Journal of Forecasting. 24, 163–169. https://doi.org/10.1016/j.ijforecast.2007.06.001
• Kolassa, S., 2011. Combining exponential smoothing forecasts using Akaike weights. International Journal of Forecasting. 27, 238–251. https://doi.org/10.1016/j.ijforecast.2010.04.006
• Kourentzes, N., Barrow, D., Petropoulos, F., 2019a. Another look at forecast selection and combination: Evidence from forecast pooling. International Journal of Production Economics. 209, 226–235. https://doi.org/10.1016/j.ijpe.2018.05.019
• Lichtendahl, K.C., Grushka-Cockayne, Y., Winkler, R.L., 2013. Is It Better to Average Probabilities or Quantiles? Management Science. 59, 1594–1611. https://doi.org/10.1287/mnsc.1120.1667
• Petropoulos, F., Svetunkov, I., 2020. A simple combination of univariate models. International Journal of Forecasting. 36, 110–115. https://doi.org/10.1016/j.ijforecast.2019.01.006
• Stock, J.H., Watson, M.W., 2004. Combination forecasts of output growth in a seven-country data set. Journal of Forecasting. 23, 405–430. https://doi.org/10.1002/for.928
• Svetunkov, I., 2021c. Statistics for business analytics. https://openforecast.org/sba/ (version: [01.09.2021])