17.3 Example in R

17.3.1 Example 1

To demonstrate how the scale model works, we will use the model that we constructed in Section 10.6 (we use Normal distribution for simplicity):

To see if the scale model is needed in this case, we will produce several diagnostics plots (Figure 17.1).

Diagnostic plots for the ETSX(M,N,M) model.

Figure 17.1: Diagnostic plots for the ETSX(M,N,M) model.

Figure 17.1 shows how the residuals change with: (1) change of fitted values, (2) change of time. The first plot shows that there might be a slight change in the variability of residuals, but it is not apparent. The second plot does not demonstrate any significant changes in variance over time. This means that there might be some factors impacting the variance of residuals, but it does not exhibit GARCH-style dynamics. To investigate this further, we can plot explanatory variables vs the residuals (Figure 17.2).

Spread plot for the ETSX(M,N,M) model.

Figure 17.2: Spread plot for the ETSX(M,N,M) model.

The only slightly alarming element in the plot in Figure 17.2 is the slight change in variance for the variable law. Based on this, we will investigate the potential effect of the law on the scale of the model:

## Warning: This type of model can only be applied to the data in logarithms.
## Amending the data

In the code above, we fit a scale model to the already estimated location model ETSX(M,N,M). We switch off the ETS element in the scale model and introduce the explanatory variable law. The function provides a warning that it will create a model in logarithms, which is what we want anyway. After estimating the model, we get the following fit (Figure 17.3).

Scale model fit

Figure 17.3: Scale model fit

Figure 17.3 demonstrates how the scale of the model changes over time. The part for the test set (after the vertical line) is not always informative, as the values there will depend on the performance of the location part of the model in the holdout. We can see that, when the law was introduced, the variance of the error term has increased slightly. To see if the change was substantial, we can analyse the summary of the model:

## Model estimated using sm.adam() function: Regression in logs
## Response variable: drivers
## Distribution used in the estimation: Normal
## Loss function type: likelihood; Loss function value: 1113.928
## Coefficients:
##             Estimate Std. Error Lower 2.5% Upper 97.5%  
## (Intercept)  -5.3047     0.0772    -5.4570     -5.1525 *
## law           0.1624     0.3122    -0.4537      0.7783  
## Error standard deviation: 1.0032
## Sample size: 180
## Number of estimated parameters: 2
## Number of degrees of freedom: 178
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 2231.857 2231.925 2238.243 2238.419

The output above shows that the uncertainty about the parameter law is high, covering negative, positive values and zero. This means that we cannot tell the difference between the value of the parameter for law and zero. So, the variable does not have a significant impact on the scale. We can also merge the location and scale models (implant the latter in the former) to see how the overall model would perform:

Having implanted the scale model in the initial one, we can now compare the information criteria to conclude, whether the introduction of additional complexity brings improvements:

##  Location Implanted 
##  2268.257  2270.607

As we see, the model with the implanted scale model has a higher AICc, which means that introducing the explanatory variable law in the scale model does not bring value and that the variable should not be used.

17.3.2 Example 2

The example in the previous Subsection shows how the analysis can be done for the scale model. To see how the scale model can be used for forecasting, we will consider another example with dynamic elements. For this example, we will use AirPassengers and an automatically selected ETS model with Inverse Gaussian distribution (for demonstration purposes):

The diagnostics of the model indicate that a time-varying scale might be required in this situation (see Figure 17.4) because the variance of the residuals seems to decrease in the period from 1954 – 1958.

Diagnostics of the location model

Figure 17.4: Diagnostics of the location model

In this situation, we will ask the function to select the best ETS model for the scale and will not use any other elements:

The sm function accepts roughly the same set of parameters as adam() when applied to the adam class objects. So, you can experiment with other types of ETS/ARIMA models if you want. Here is what the function has selected automatically for the scale model:

## **Scale Model**
## Time elapsed: 0.13 seconds
## Model estimated using sm.adam() function: ETS(MNN)
## Distribution assumed in the model: Inverse Gaussian
## Loss function type: likelihood; Loss function value: 474.332
## Persistence vector g:
##  alpha 
## 0.0704 
## Sample size: 134
## Number of estimated parameters: 2
## Number of degrees of freedom: 132
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 952.6640 952.7556 958.4596 958.6840 
## Forecast errors:
## ME: 0.004; MAE: 0.004; RMSE: 0.005
## sCE: 3549.663%; Asymmetry: 100%; sMAE: 354.966%; sMSE: 1991.897%
## MASE: 2.809; RMSSE: 2.261; rMAE: 1.267; rRMSE: 1.169

We see that the most suitable model for the location is ETS(M,N,N) with \(\alpha=0.0704\), which can be considered as GARCH(1,1) with restricted parameters (as discussed in Subsection 17.2): \[\begin{equation*} \sigma_t^2 = 0.9296 \sigma_{t-1}^2 + 0.0704 \epsilon_{t-1}^2 . \end{equation*}\] The model fits the data in the following way (see Figure 17.5):

The fit of the scale model.

Figure 17.5: The fit of the scale model.

Note that the information criteria in the summary of the model do not include the number of estimated parameters in the location part of the model, so they are not directly comparable with the ones from the initial model. So, to understand whether there is an improvement in comparison with the model with a fixed scale, we can, again, implant the scale model in the location and compare the new model with the initial one:

##  Location Implanted 
##  991.4883  990.6118

The output above shows that introducing the ETS(M,N,N) model for the scale has improved the model, reducing the AICc. After implanting the scale in the model, we can access it via adamMerged$scale and extract the predicted scale via extractScale(adamMerged) command in R. We can also do model diagnostics using the same principles as discussed in Chapter 14 (see Figure 17.6).

## Note that residuals diagnostics plots are produced for scale model
Diagnostics of the merged ADAM

Figure 17.6: Diagnostics of the merged ADAM

Note that in the case of a merged model, the plots that produce residuals will be done for the scale model rather than for the location (the plot function will produce a message on that), so plotting \(\eta_t\) instead of \(\epsilon_t\) (see discussion in Subsection 17.1.3 for the assumptions about the residual in this case). Figure 17.6 demonstrates the diagnostics for the constructed model. Given that we used the Inverse Gaussian distribution, the \(\eta_t^2 \sim \chi^2(1)\), which is why the QQ-plot mentions the Chi-Squared distribution. We can see that the distribution has a shorter tail than expected, which means that the model might be missing some elements. However, the log(standardised Residuals) vs Fitted and Time do not demonstrate any obvious problems except for several potential outliers, which do not lie far away from the bounds. Those specific observations could be investigated to determine whether the model can be improved further. We leave this exercise to the readers.

If we want to produce forecasts from the scale model, we can use the same methods previously used for the location model (Figure 17.7).

Forecast for the Scale model

Figure 17.7: Forecast for the Scale model

The point forecast of scale for the subsequent h observations can also be used to produce more appropriate prediction intervals for the merged model. This will be discussed in Chapter 18.