
## 6.6 Examples of application

### 6.6.1 Non-seasonal data

We continue our examples with the same Box-Jenkins sales data by fitting the ETS(M,M,N) model, but this time with a holdout of 10 observations:

```r
adamModel <- adam(BJsales, "MMN", h=10, holdout=TRUE)
adamModel
## Time elapsed: 0.03 seconds
## Model estimated using adam() function: ETS(MMN)
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 245.3759
## Persistence vector g:
##  alpha   beta
## 1.0000 0.2412
##
## Sample size: 140
## Number of estimated parameters: 5
## Number of degrees of freedom: 135
## Information criteria:
##      AIC     AICc      BIC     BICc
## 500.7518 501.1996 515.4600 516.5664
##
## Forecast errors:
## ME: 3.217; MAE: 3.33; RMSE: 3.784
## sCE: 14.124%; Asymmetry: 91.6%; sMAE: 1.462%; sMSE: 0.028%
## MASE: 2.817; RMSSE: 2.482; rMAE: 0.925; rRMSE: 0.921
```

The output above is similar to the one we discussed in Section 5.6, so we can compare the two models using a variety of criteria and select the most appropriate. Even though the default distribution for the multiplicative error models in ADAM is $$\Gamma$$, we can compare this model with the ETS(A,A,N) via information criteria. For example, here are the AICc for the two models:

```r
# ETS(M,M,N)
AICc(adamModel)
##  501.1996
# ETS(A,A,N)
AICc(adam(BJsales, "AAN", h=10, holdout=TRUE))
##  497.2624
```

The comparison is fair, because both models were estimated via likelihood, and both likelihoods are formulated correctly, without omitting any terms (e.g. ets() from the forecast package omits the $$-\frac{T}{2} \log\left(2\pi e \frac{1}{T}\right)$$ term for convenience, which makes its likelihood incomparable with those of other models). In this example, it seems that the pure additive model is more suitable for the data than the pure multiplicative one.

Figure 6.1 shows how the model fits the data and what forecast it produces. Note that in this case the function produces the point forecast, which is not equivalent to the conditional expectation! Evidently, the point forecast undershoots the actual values in the holdout.

Figure 6.1: Model fit for Box-Jenkins Sales data from ETS(M,M,N).
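As a sketch, a plot like the one in Figure 6.1 can be produced with the plot() method for adam objects from the smooth package (the `which=7` value, selecting the actuals/fitted/forecast panel, is an assumption about the method's options):

```r
# Plot actuals, fitted values and the point forecast from the estimated model
plot(adamModel, which=7)
```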

If we want to produce the forecasts (conditional expectation and prediction interval) from the model, we can do it, using the same command as in Section 5.6: Figure 6.2: Forecast for Box-Jenkins Sales data from ETS(M,M,N).
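A minimal sketch of that command, assuming the forecast() method for adam objects from the smooth package:

```r
# Conditional expectation and prediction interval over the holdout horizon
adamForecast <- forecast(adamModel, h=10, interval="prediction")
plot(adamForecast)
```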

Note that, when we ask for “prediction” interval, the forecast() function will automatically decide what to use based on the estimated model: in case of pure additive one it will use analytical solutions, while in the other cases, it will use simulations (see Section 17.2). The point forecast obtained from forecast function corresponds to the conditional expectation and is calculated based on the simulations. This also means that it will differ slightly from one run of the function to another (reflecting the uncertainty in the error term), but the difference in general should be negligible for a large number of simulation paths.

The forecast with the prediction interval is shown in Figure 6.2. Evidently, the conditional expectation is not very different from the point forecast in this example. This is because the variance of the error term is close to zero, bringing the two close to each other:

```r
sigma(adamModel)^2
##  3.928668e-05
```

We can also compare the performance of ETS(M,M,N) with $$\Gamma$$ distribution and the conventional ETS(M,M,N), assuming normality:

```r
adamModelNormal <- adam(BJsales, "MMN", h=10, holdout=TRUE,
                        distribution="dnorm")
adamModelNormal
## Time elapsed: 0.03 seconds
## Model estimated using adam() function: ETS(MMN)
## Distribution assumed in the model: Normal
## Loss function type: likelihood; Loss function value: 245.3872
## Persistence vector g:
## alpha  beta
## 1.000 0.241
##
## Sample size: 140
## Number of estimated parameters: 5
## Number of degrees of freedom: 135
## Information criteria:
##      AIC     AICc      BIC     BICc
## 500.7745 501.2222 515.4827 516.5890
##
## Forecast errors:
## ME: 3.217; MAE: 3.33; RMSE: 3.785
## sCE: 14.126%; Asymmetry: 91.6%; sMAE: 1.462%; sMSE: 0.028%
## MASE: 2.817; RMSSE: 2.483; rMAE: 0.925; rRMSE: 0.921
```

In this specific example, the two distributions produce very similar results with almost indistinguishable estimates of parameters.

### 6.6.2 Seasonal data

The AirPassengers data used in Section 5.6 has (as we discussed) multiplicative seasonality. So, the ETS(M,M,M) model might be more suitable than the pure additive one that we used previously:

```r
adamModel <- adam(AirPassengers, "MMM", h=12, holdout=TRUE)
adamModel
## Time elapsed: 0.14 seconds
## Model estimated using adam() function: ETS(MMM)
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 468.5176
## Persistence vector g:
##  alpha   beta  gamma
## 0.7684 0.0206 0.0000
##
## Sample size: 132
## Number of estimated parameters: 17
## Number of degrees of freedom: 115
## Information criteria:
##       AIC      AICc       BIC      BICc
##  971.0351  976.4036 1020.0428 1033.1492
##
## Forecast errors:
## ME: -5.617; MAE: 15.496; RMSE: 21.938
## sCE: -25.677%; Asymmetry: -23.1%; sMAE: 5.903%; sMSE: 0.698%
## MASE: 0.643; RMSSE: 0.7; rMAE: 0.204; rRMSE: 0.213
```

Notice that the smoothing parameter $$\gamma=0$$ in this case, which implies that the data exhibits deterministic multiplicative seasonality. Comparing the information criteria (e.g. AICc) with those of the ETS(A,A,A) model (discussed in Section 5.6.2), the pure multiplicative model does a better job of fitting the data than the additive one:

```r
adamModelAdditive <- adam(AirPassengers, "AAA", lags=12, h=12, holdout=TRUE)
AICc(adamModelAdditive)
##  1130.756
```

The conditional expectation and prediction interval from this model are more adequate as well (Figure 6.3).

Figure 6.3: Forecast for air passengers data using ETS(M,M,M) model.
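A sketch of how this forecast could be produced, assuming the same forecast() method for adam objects as before (the resulting object holds the conditional expectation in its `mean` element):

```r
# h=12 to match the holdout; the interval is simulated for the multiplicative model
adamForecast <- forecast(adamModel, h=12, interval="prediction")
plot(adamForecast)
```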

If we want to calculate the error measures based on the conditional expectation, we can use the measures() function from the greybox package in the following way:

```r
measures(adamModel$holdout, adamForecast$mean, actuals(adamModel))
##            ME           MAE           MSE           MPE          MAPE
##  -6.217385458  15.564107535 486.731689853  -0.017604638   0.034035661
##           sCE          sMAE          sMSE          MASE         RMSSE
##  -0.284231538   0.059293549   0.007064088   0.646243451   0.704133342
##          rMAE         rRMSE          rAME     asymmetry          sPIS
##   0.204790889   0.214242950   0.087363730  -0.265975219   2.275046589
```

These can be compared with the measures from the ETS(A,A,A) model:

```r
measures(adamModel$holdout, adamModelAdditive$forecast, actuals(adamModel))
##            ME           MAE           MSE           MPE          MAPE
##   28.36910729   37.11462699 2442.64739763    0.04881281    0.07053896
##           sCE          sMAE          sMSE          MASE         RMSSE
##    1.29691091    0.14139314    0.03545090    1.54105107    1.57739510
##          rMAE         rRMSE          rAME     asymmetry          sPIS
##    0.48835036    0.47994571    0.39862914    0.69383499   -7.31044007
```

Comparing, for example, MSE from the two models, we can conclude that the pure multiplicative model is more accurate than the pure additive one.

We can also produce the plot of the time series decomposition according to ETS(M,M,M) (see Figure 6.4).

Figure 6.4: Decomposition of air passengers data using ETS(M,M,M) model.
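As a sketch, assuming that the plot() method for adam objects supports a decomposition option (the `which=12` value is an assumption about the method's numbering):

```r
# Plot the extracted level, trend and seasonal components of the fitted model
plot(adamModel, which=12)
```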

The plot in Figure 6.4 shows that the residuals are more random for the pure multiplicative model than for the ETS(A,A,A), but there still might be some structure left. The autocorrelation and partial autocorrelation functions (discussed in Section 8.3) might help in understanding this better:

```r
par(mfcol=c(2,1), mar=c(2,4,2,1))
plot(adamModel, 10:11)
```

Figure 6.5: ACF and PACF of residuals of ETS(M,M,M) model.

The plot in Figure 6.5 shows that there is still some correlation left in the residuals, which could be due either to pure randomness or to imperfect estimation of the model. Tuning the parameters of the optimiser or selecting a different model might solve the problem.