6.7 Examples of application
6.7.1 Non-seasonal data
We continue our examples with the same Box-Jenkins sales data by fitting the ETS(M,M,N) model, but this time with a holdout of ten observations:
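A minimal sketch of the call that produces a summary like the one below. It assumes the smooth package's adam() function with its model, h and holdout arguments. The object name adamETSMMNBJ is our own choice.

```r
# Assumes the smooth package is installed; BJsales comes with base R (datasets)
library(smooth)

# ETS(M,M,N): multiplicative error and trend, no seasonality.
# The last 10 observations are withheld as a holdout sample.
adamETSMMNBJ <- adam(BJsales, model="MMN", h=10, holdout=TRUE)

# Printing the object shows a summary of the estimated model
adamETSMMNBJ
```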
## Time elapsed: 0.04 seconds
## Model estimated using adam() function: ETS(MMN)
## With optimal initialisation
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 245.3894
## Persistence vector g:
## alpha beta
## 0.9953 0.2430
##
## Sample size: 140
## Number of estimated parameters: 5
## Number of degrees of freedom: 135
## Information criteria:
## AIC AICc BIC BICc
## 500.7789 501.2266 515.4871 516.5934
##
## Forecast errors:
## ME: 3.215; MAE: 3.327; RMSE: 3.781
## sCE: 14.115%; Asymmetry: 91.7%; sMAE: 1.461%; sMSE: 0.028%
## MASE: 2.815; RMSSE: 2.48; rMAE: 0.924; rRMSE: 0.92
The output above is similar to the one we discussed in Section 5.6, so we can compare the two models using various criteria and select the most appropriate one. The default distribution for multiplicative error models in ADAM is Gamma, while for additive ones it is Normal. Even so, we can still compare this model with ETS(A,A,N) via information criteria. For example, here are the AICc values for the two models:
## [1] 501.2266
## [1] 497.03
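The two values above can be obtained along the following lines. This sketch assumes that the AICc() generic from greybox works on adam objects. The object names are our own.

```r
library(smooth)
library(greybox)  # provides the AICc() generic

adamETSMMNBJ <- adam(BJsales, model="MMN", h=10, holdout=TRUE)
adamETSAANBJ <- adam(BJsales, model="AAN", h=10, holdout=TRUE)

# Lower AICc indicates the preferred model
aiccValues <- c(MMN=AICc(adamETSMMNBJ), AAN=AICc(adamETSAANBJ))
aiccValues
names(which.min(aiccValues))
```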
The comparison is fair because both models were estimated via likelihood, and both likelihoods are formulated correctly, without omitting any terms. For example, the ets() function from the forecast package omits the term \(-\frac{T}{2} \log\left(2\pi e \frac{1}{T}\right)\) for convenience, which makes its likelihood incomparable with those of other models. In this example, the pure additive model is more suitable for the data than the pure multiplicative one, because it has the lower AICc.
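A short derivation shows where the omitted term comes from. It assumes the Normal distribution, with the variance replaced by its maximum likelihood estimate \(\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^T e_t^2\). Here \(e_t\) denotes the in-sample one-step-ahead errors and \(\boldsymbol{\theta}\) the remaining parameters; this notation is ours for this sketch.

\[
\begin{aligned}
\ell(\boldsymbol{\theta}, \hat{\sigma}^2 | \mathbf{y}) & = -\frac{T}{2} \log\left(2 \pi \hat{\sigma}^2\right) - \frac{T}{2} \\
& = -\frac{T}{2} \log\left(2 \pi e \frac{1}{T} \sum_{t=1}^T e_t^2\right) \\
& = -\frac{T}{2} \log\left(2 \pi e \frac{1}{T}\right) - \frac{T}{2} \log \sum_{t=1}^T e_t^2 .
\end{aligned}
\]

The first term depends only on the sample size, so dropping it leaves the parameter estimates unchanged. However, it shifts \(-2\ell\), and hence AIC and AICc, by \(T \log\left(2 \pi e \frac{1}{T}\right)\). A model with a fully specified likelihood does not share that shift.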
Figure 6.2 shows how the model fits the data and what forecast it produces. Note that in this case the function produces the point forecast, which is not equivalent to the conditional expectation! The point forecast undershoots the actual values in the holdout, which is also reflected in the positive ME.
If we want to produce the forecasts (conditional expectation and prediction interval) from the model, we can do that using the same command as in Section 5.6.

Note that when we ask for a “prediction” interval, the forecast() function automatically decides what to use based on the estimated model. For a pure additive model it uses analytical solutions, while in other cases it uses simulations (see Section 18.3). In the latter case, the point forecast obtained from forecast() corresponds to the conditional expectation and is calculated from the simulations. This also means that it will differ slightly from one run of the function to another, reflecting the uncertainty in the error term. In general, however, the difference should be negligible for a large number of simulation paths.
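A sketch of such a forecast call follows. It assumes smooth's forecast() method for adam objects, with its h and interval arguments, and the mean, lower and upper elements of the result. The object names are our own.

```r
library(smooth)

adamETSMMNBJ <- adam(BJsales, model="MMN", h=10, holdout=TRUE)

# For this multiplicative model, the mean and the interval are simulated,
# so fixing the seed makes the run reproducible
set.seed(41)
adamForecastBJ <- forecast(adamETSMMNBJ, h=10, interval="prediction")
plot(adamForecastBJ)
```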
The forecast with prediction interval is shown in Figure 6.3. The conditional expectation is not very different from the point forecast in this example. This is because the variance of the error term is close to zero, thus bringing the two close to each other:
## [1] 3.929405e-05
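Why a small variance brings the two together can be illustrated with a Monte Carlo sketch in base R. It assumes the following ETS(M,M,N) recursions, with \(1+\epsilon_t\) following a Gamma distribution with shape \(\sigma^{-2}\) and scale \(\sigma^2\) (mean one, variance \(\sigma^2\)):

- \(y_t = l_{t-1} b_{t-1}(1+\epsilon_t)\)
- \(l_t = l_{t-1} b_{t-1}(1+\alpha\epsilon_t)\)
- \(b_t = b_{t-1}(1+\beta\epsilon_t)\)

The smoothing parameters and the small variance are taken from the outputs above. The final level and trend (260 and 1.002) are hypothetical values chosen only for illustration. The simulator itself is our own simplified one, not the code used inside forecast().

```r
# Simplified simulator for ETS(M,M,N) with Gamma-distributed (1 + eps);
# returns nsim simulated values of y at horizon h
simulateMMN <- function(level, trend, alpha, beta, sigma2, h, nsim=10000) {
  l <- rep(level, nsim)
  b <- rep(trend, nsim)
  for (j in seq_len(h)) {
    # 1 + eps ~ Gamma(shape = 1/sigma2, scale = sigma2): mean 1, variance sigma2
    eps <- rgamma(nsim, shape=1/sigma2, scale=sigma2) - 1
    y <- l * b * (1 + eps)             # measurement equation
    lNew <- l * b * (1 + alpha * eps)  # level transition (uses previous b)
    b <- b * (1 + beta * eps)          # trend transition
    l <- lNew
  }
  y
}

level <- 260; trend <- 1.002  # hypothetical final in-sample states
h <- 10
pointForecast <- level * trend^h  # point forecast l_T * b_T^h

set.seed(42)
smallVar <- simulateMMN(level, trend, alpha=0.9953, beta=0.2430,
                        sigma2=3.929405e-05, h=h)
largeVar <- simulateMMN(level, trend, alpha=0.9953, beta=0.2430,
                        sigma2=0.01, h=h)

# Relative gap between the simulated conditional mean and the point forecast
gapSmall <- mean(smallVar) / pointForecast - 1
gapLarge <- mean(largeVar) / pointForecast - 1
c(small=gapSmall, large=gapLarge)
```

With a variance as small as the estimated one, the simulated mean stays within a small fraction of a percent of the point forecast. With the larger variance, the gap becomes noticeably bigger, because the expectation of a product of correlated multiplicative terms exceeds the product of their expectations.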
We can also compare ETS(M,M,N) with the Gamma distribution against the conventional ETS(M,M,N) that assumes normality:
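A sketch of the corresponding call follows. It assumes that adam()'s distribution argument accepts "dnorm" for the Normal distribution. The object name is our own.

```r
library(smooth)

# Same model and holdout, but with Normal instead of the default Gamma
adamETSMMNBJNormal <- adam(BJsales, model="MMN", h=10, holdout=TRUE,
                           distribution="dnorm")
adamETSMMNBJNormal
```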
## Time elapsed: 0.03 seconds
## Model estimated using adam() function: ETS(MMN)
## With optimal initialisation
## Distribution assumed in the model: Normal
## Loss function type: likelihood; Loss function value: 245.4075
## Persistence vector g:
## alpha beta
## 0.9932 0.2473
##
## Sample size: 140
## Number of estimated parameters: 5
## Number of degrees of freedom: 135
## Information criteria:
## AIC AICc BIC BICc
## 500.8149 501.2627 515.5231 516.6295
##
## Forecast errors:
## ME: 3.202; MAE: 3.315; RMSE: 3.768
## sCE: 14.06%; Asymmetry: 91.6%; sMAE: 1.456%; sMSE: 0.027%
## MASE: 2.805; RMSSE: 2.471; rMAE: 0.921; rRMSE: 0.917
In this specific example, the two distributions produce very similar results with almost indistinguishable estimates of parameters.
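Both models estimate five parameters on the same 140 observations, so their information criteria can differ only through the likelihood. A base-R sketch recomputes the criteria from the reported loss function values, which are the negative log-likelihoods. The AIC, AICc and BIC formulas are the standard ones. The BICc expression is the one that reproduces the values printed above; we assume it matches what greybox uses.

```r
# Information criteria from a negative log-likelihood, k parameters, n observations
infoCriteria <- function(negLogLik, k, n) {
  aic <- 2 * negLogLik + 2 * k
  c(AIC  = aic,
    AICc = aic + 2 * k * (k + 1) / (n - k - 1),
    BIC  = 2 * negLogLik + k * log(n),
    BICc = 2 * negLogLik + k * log(n) * n / (n - k - 1))
}

icGamma  <- infoCriteria(245.3894, k=5, n=140)  # ETS(M,M,N), Gamma
icNormal <- infoCriteria(245.4075, k=5, n=140)  # ETS(M,M,N), Normal
round(rbind(Gamma=icGamma, Normal=icNormal), 4)

# With equal k and n, every criterion differs by twice the loss difference
icNormal - icGamma
```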
6.7.2 Seasonal data
The AirPassengers data used in Section 5.6 has (as we discussed) multiplicative seasonality. So the ETS(M,M,M) model might be more suitable than the pure additive one that we used previously:
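A sketch of the command in question follows. It mirrors the ETS(A,A,A) call shown later in this section, with the model string changed to "MMM". The object name is our own.

```r
library(smooth)

# Multiplicative error, trend and seasonality with a 12-month seasonal lag;
# the last 12 observations form the holdout
adamETSMMMAir <- adam(AirPassengers, model="MMM", lags=12,
                      h=12, holdout=TRUE)
adamETSMMMAir
```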
After running the command above, we might get a warning saying that the model has a potentially explosive multiplicative trend. This happens when the final in-sample value of the trend component is greater than one, in which case the forecast trajectory might exhibit exponential growth. Here is the output of this model:
## Time elapsed: 0.22 seconds
## Model estimated using adam() function: ETS(MMM)
## With optimal initialisation
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 481.4324
## Persistence vector g:
## alpha beta gamma
## 0.3406 0.0110 0.6444
##
## Sample size: 132
## Number of estimated parameters: 17
## Number of degrees of freedom: 115
## Information criteria:
## AIC AICc BIC BICc
## 996.8648 1002.2332 1045.8724 1058.9789
##
## Forecast errors:
## ME: -17.62; MAE: 18.722; RMSE: 23.582
## sCE: -80.551%; Asymmetry: -92.8%; sMAE: 7.132%; sMSE: 0.807%
## MASE: 0.777; RMSSE: 0.753; rMAE: 0.246; rRMSE: 0.229
Notice that the smoothing parameter \(\beta\) is close to zero (0.0110), which implies that the trend changes very slowly. By contrast, \(\gamma=0.6444\) implies that the multiplicative seasonal component is not deterministic and evolves over time. Comparing the information criteria (e.g. AICc) with those of ETS(A,A,A) (discussed in Subsection 5.6.2), the pure multiplicative model does a better job at fitting the data than the additive one (AICc of 1002.2332 vs 1067.493):
adamETSAirAdditive <- adam(AirPassengers, "AAA", lags=12,
h=12, holdout=TRUE)
AICc(adamETSAirAdditive)
## [1] 1067.493
The conditional expectation and prediction interval from this model are more adequate as well (Figure 6.4):
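A sketch of the forecast call follows, under the same assumptions about smooth's forecast() method as for the non-seasonal example. Running it may emit the warning shown below.

```r
library(smooth)

adamETSMMMAir <- adam(AirPassengers, model="MMM", lags=12,
                      h=12, holdout=TRUE)

# Simulation-based conditional mean and prediction interval
set.seed(41)
adamForecastAir <- forecast(adamETSMMMAir, h=12, interval="prediction")
plot(adamForecastAir)
```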
## Warning: Your model has a potentially explosive multiplicative trend. I cannot
## do anything about it, so please just be careful.
If we want to calculate the error measures based on the conditional expectation, we can use the measures() function from the greybox package in the following way:
## ME MAE MSE MPE MAPE
## -17.674319871 18.741980958 555.846542851 -0.036934293 0.039250261
## sCE sMAE sMSE MASE RMSSE
## -0.807992227 0.071400083 0.008067173 0.778193187 0.752467163
## rMAE rRMSE rAME asymmetry sPIS
## 0.246605013 0.228949227 0.248351099 -0.929237541 4.548241031
These can be compared with the measures from the ETS(A,A,A) model:
## ME MAE MSE MPE MAPE
## 1.031227e+00 1.223279e+01 2.605104e+02 -9.534624e-04 2.620736e-02
## sCE sMAE sMSE MASE RMSSE
## 4.714319e-02 4.660243e-02 3.780868e-03 5.079223e-01 5.151369e-01
## rMAE rRMSE rAME asymmetry sPIS
## 1.609577e-01 1.567380e-01 1.449032e-02 2.541709e-01 3.615515e-02
Comparing, for example, the MSE of the two models (about 260.5 for ETS(A,A,A) vs 555.8 for ETS(M,M,M)), we can conclude that the pure additive model is more accurate than the pure multiplicative one on this holdout. This could have happened purely by chance, so a rolling origin evaluation would be needed to confirm it.
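A sketch of how the two sets of measures can be produced side by side follows. It assumes greybox's measures(holdout, forecast, actual) and the actuals() and $holdout accessors for adam objects. The object names are our own. Because the mean of the multiplicative model is simulated, its MSE will vary slightly from run to run.

```r
library(smooth)
library(greybox)  # provides measures() and actuals()

adamETSMMMAir <- adam(AirPassengers, model="MMM", lags=12, h=12, holdout=TRUE)
adamETSAirAdditive <- adam(AirPassengers, model="AAA", lags=12, h=12, holdout=TRUE)

# Conditional expectations (simulated for the multiplicative model)
set.seed(41)
meanMMM <- forecast(adamETSMMMAir, h=12, interval="prediction")$mean
meanAAA <- forecast(adamETSAirAdditive, h=12, interval="prediction")$mean

# Holdout values, forecasts, and in-sample actuals (used for scaled measures)
errorsMMM <- measures(adamETSMMMAir$holdout, meanMMM, actuals(adamETSMMMAir))
errorsAAA <- measures(adamETSAirAdditive$holdout, meanAAA,
                      actuals(adamETSAirAdditive))
c(MMM=unname(errorsMMM["MSE"]), AAA=unname(errorsAAA["MSE"]))
```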
We can also produce the plot of the time series decomposition according to ETS(M,M,M) (see Figure 6.5):
The plot in Figure 6.5 shows that the residuals are more random for the pure multiplicative model than for the ETS(A,A,A), but there still might be some structure left. The autocorrelation and partial autocorrelation functions (discussed in Section 8.3) might help in understanding this better:
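Both figures can be reproduced along the following lines. This assumes the which values of smooth's plot() method for adam objects: 12 for the states, and 10 and 11 for the ACF and PACF of the residuals.

The Ljung-Box test from base R's Box.test() is our own addition. It is a rough check of whether the remaining autocorrelation is compatible with pure randomness. It is rough because fitdf is left at zero, so it ignores the estimated parameters.

```r
library(smooth)

adamETSMMMAir <- adam(AirPassengers, model="MMM", lags=12, h=12, holdout=TRUE)

# States of the model and residuals (as in Figure 6.5)
plot(adamETSMMMAir, which=12)

# ACF and PACF of the residuals (as in Figure 6.6)
par(mfcol=c(2, 1))
plot(adamETSMMMAir, which=10:11)

# Rough portmanteau check on the residuals up to two seasonal cycles
residualTest <- Box.test(residuals(adamETSMMMAir), lag=24, type="Ljung-Box")
residualTest
```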
The plot in Figure 6.6 shows that there is still some correlation left in the residuals, which could be either due to pure randomness or imperfect estimation of the model. Tuning the parameters of the optimiser or selecting a different model might solve the problem.