## 17.3 Example in R

### 17.3.1 Example 1

To demonstrate how the scale model works, we will use the model that we constructed in Section 10.6 (we use Normal distribution for simplicity):

```
adamLocationSeat <- adam(SeatbeltsData, "MNM", h=12, holdout=TRUE,
formula=drivers~log(kms)+log(PetrolPrice)+law,
distribution="dnorm")
```

To see if the scale model is needed in this case, we will produce several diagnostics plots (Figure 17.1).

Figure 17.1 shows how the residuals change with: (1) change of fitted values, (2) change of time. The first plot shows that there might be a slight change in the variability of residuals, but it is not apparent. The second plot does not demonstrate any significant changes in variance over time. This means that there might be some factors impacting the variance of residuals, but it does not exhibit GARCH-style dynamics. To investigate this further, we can plot explanatory variables vs the residuals (Figure 17.2).

The only slightly alarming element in the plot in Figure 17.2 is the slight change in variance for the variable `law`

. Based on this, we will investigate the potential effect of the law on the scale of the model:

```
## Warning: This type of model can only be applied to the data in logarithms.
## Amending the data
```

In the code above, we fit a scale model to the already estimated location model ETSX(M,N,M). We switch off the ETS element in the scale model and introduce the explanatory variable `law`

. The function provides a warning that it will create a model in logarithms, which is what we want anyway. After estimating the model, we get the following fit (Figure 17.3).

Figure 17.3 demonstrates how the scale of the model changes over time. The part for the test set (after the vertical line) is not always informative, as the values there will depend on the performance of the location part of the model in the holdout. We can see that, when the law was introduced, the variance of the error term has increased slightly. To see if the change was substantial, we can analyse the summary of the model:

```
##
## Model estimated using sm.adam() function: Regression in logs
## Response variable: drivers
## Distribution used in the estimation: Normal
## Loss function type: likelihood; Loss function value: 1113.928
## Coefficients:
## Estimate Std. Error Lower 2.5% Upper 97.5%
## (Intercept) -5.3047 0.0772 -5.4570 -5.1525 *
## law 0.1624 0.3122 -0.4537 0.7783
##
## Error standard deviation: 1.0032
## Sample size: 180
## Number of estimated parameters: 2
## Number of degrees of freedom: 178
## Information criteria:
## AIC AICc BIC BICc
## 2231.857 2231.925 2238.243 2238.419
```

The output above shows that the uncertainty about the parameter `law`

is high, covering negative, positive values and zero. This means that we cannot tell the difference between the value of the parameter for `law`

and zero. So, the variable does not have a significant impact on the scale. We can also merge the location and scale models (implant the latter in the former) to see how the overall model would perform:

Having implanted the scale model in the initial one, we can now compare the information criteria to conclude, whether the introduction of additional complexity brings improvements:

```
## Location Implanted
## 2268.257 2270.607
```

As we see, the model with the implanted scale model has a higher AICc, which means that introducing the explanatory variable `law`

in the scale model does not bring value and that the variable should not be used.

### 17.3.2 Example 2

The example in the previous Subsection shows how the analysis can be done for the scale model. To see how the scale model can be used for forecasting, we will consider another example with dynamic elements. For this example, we will use `AirPassengers`

and an automatically selected ETS model with Inverse Gaussian distribution (for demonstration purposes):

The diagnostics of the model indicate that a time-varying scale might be required in this situation (see Figure 17.4) because the variance of the residuals seems to decrease in the period from 1954 – 1958.

In this situation, we will ask the function to select the best ETS model for the scale and will not use any other elements:

The `sm`

function accepts roughly the same set of parameters as `adam()`

when applied to the `adam`

class objects. So, you can experiment with other types of ETS/ARIMA models if you want. Here is what the function has selected automatically for the scale model:

```
## **Scale Model**
## Time elapsed: 0.13 seconds
## Model estimated using sm.adam() function: ETS(MNN)
## Distribution assumed in the model: Inverse Gaussian
## Loss function type: likelihood; Loss function value: 474.332
## Persistence vector g:
## alpha
## 0.0704
##
## Sample size: 134
## Number of estimated parameters: 2
## Number of degrees of freedom: 132
## Information criteria:
## AIC AICc BIC BICc
## 952.6640 952.7556 958.4596 958.6840
##
## Forecast errors:
## ME: 0.004; MAE: 0.004; RMSE: 0.005
## sCE: 3549.663%; Asymmetry: 100%; sMAE: 354.966%; sMSE: 1991.897%
## MASE: 2.809; RMSSE: 2.261; rMAE: 1.267; rRMSE: 1.169
```

We see that the most suitable model for the location is ETS(M,N,N) with \(\alpha=0.0704\), which can be considered as GARCH(1,1) with restricted parameters (as discussed in Subsection 17.2): \[\begin{equation*} \sigma_t^2 = 0.9296 \sigma_{t-1}^2 + 0.0704 \epsilon_{t-1}^2 . \end{equation*}\] The model fits the data in the following way (see Figure 17.5):

Note that the information criteria in the summary of the model do not include the number of estimated parameters in the location part of the model, so they are not directly comparable with the ones from the initial model. So, to understand whether there is an improvement in comparison with the model with a fixed scale, we can, again, implant the scale model in the location and compare the new model with the initial one:

```
adamMerged <- implant(adamLocationAir, adamScaleAir)
setNames(c(AICc(adamLocationAir),AICc(adamMerged)),
c("Location","Implanted"))
```

```
## Location Implanted
## 991.4883 990.6118
```

The output above shows that introducing the ETS(M,N,N) model for the scale has improved the model, reducing the AICc. After implanting the scale in the model, we can access it via `adamMerged$scale`

and extract the predicted scale via `extractScale(adamMerged)`

command in R. We can also do model diagnostics using the same principles as discussed in Chapter 14 (see Figure 17.6).

`## Note that residuals diagnostics plots are produced for scale model`

Note that in the case of a merged model, the plots that produce residuals will be done for the scale model rather than for the location (the plot function will produce a message on that), so plotting \(\eta_t\) instead of \(\epsilon_t\) (see discussion in Subsection 17.1.3 for the assumptions about the residual in this case). Figure 17.6 demonstrates the diagnostics for the constructed model. Given that we used the Inverse Gaussian distribution, the \(\eta_t^2 \sim \chi^2(1)\), which is why the QQ-plot mentions the Chi-Squared distribution. We can see that the distribution has a shorter tail than expected, which means that the model might be missing some elements. However, the log(standardised Residuals) vs Fitted and Time do not demonstrate any obvious problems except for several potential outliers, which do not lie far away from the bounds. Those specific observations could be investigated to determine whether the model can be improved further. We leave this exercise to the readers.

If we want to produce forecasts from the scale model, we can use the same methods previously used for the location model (Figure 17.7).

The point forecast of scale for the subsequent `h`

observations can also be used to produce more appropriate prediction intervals for the merged model. This will be discussed in Chapter 18.