## 10.6 Examples of application

For this example, we will use the data of Road Casualties in Great Britain 1969–84, Seatbelts dataset in datasets package for R, which contains several variables (the description is provided in the documentation for the data and can be accessed via ?Seatbelts command). The variable of interest, in this case, is drivers, and the dataset contains more variables than needed, so we will restrict the data with drivers, kms (distance driven), PetrolPrice and law – the latter three seem to influence the number of injured/killed drivers in principle:

SeatbeltsData <- Seatbelts[,c("drivers","kms","PetrolPrice","law")]

The dynamics of these variables over time is shown in Figure 10.1.

Apparently, the drivers variable exhibits seasonality but does not seem to have a trend. The type of seasonality is challenging to determine, but we will assume that it is multiplicative for now. A simple ETS(M,N,M) model applied to the data will produce the following (we will withhold the last 12 observations for the forecast evaluation, Figure 10.2):

adamETSMNMSeat <- adam(SeatbeltsData[,"drivers"], "MNM",
h=12, holdout=TRUE)
plot(forecast(adamETSMNMSeat, h=12, interval="prediction"))

This simple model already does a fine job fitting the data and producing forecasts. However, the forecast is biased and is lower than needed because of the sudden drop in the level of series, which can only be explained by the introduction of the new law in the UK in 1983, making the seatbelts compulsory for drivers. Due to the sudden drop, the smoothing parameter for the level of series is higher than needed, leading to wider intervals and less accurate forecasts. Here is the output of the model:

adamETSMNMSeat
## Time elapsed: 0.1 seconds
## Model estimated using adam() function: ETS(MNM)
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 1125.509
## Persistence vector g:
##  alpha  gamma
## 0.4133 0.0000
##
## Sample size: 180
## Number of estimated parameters: 15
## Number of degrees of freedom: 165
## Information criteria:
##      AIC     AICc      BIC     BICc
## 2281.019 2283.945 2328.913 2336.512
##
## Forecast errors:
## ME: 117.9; MAE: 117.9; RMSE: 137.596
## sCE: 83.695%; Asymmetry: 100%; sMAE: 6.975%; sMSE: 0.663%
## MASE: 0.684; RMSSE: 0.611; rMAE: 0.504; rRMSE: 0.542

In order to further explore the data we will produce the scatterplots and boxplots between the variables using spread() function from greybox package (Figure 10.3):

spread(SeatbeltsData)

Figure 10.3 shows a negative relation between kms and drivers: the higher the distance driven, the lower the total of car drivers killed or seriously injured. A similar relation is observed between the PetrolPrice and drivers (when the prices are high, people tend to drive less, thus causing fewer incidents). Interestingly, the increase of both variables causes the variance of the response variable to decrease (heteroscedasticity effect). Using a multiplicative error model and including the variables in logarithms, in this case, might address this potential issue. Note that we do not need to take the logarithm of drivers, as we already use the model with multiplicative error. Finally, the legislation of a new law seems to have caused a decrease in the number of causalities. To have a better model in terms of explanatory and predictive power, we should include all three variables. This is how we can do that using adam():

adamETSXMNMSeat <- adam(SeatbeltsData, "MNM", h=12, holdout=TRUE,
formula=drivers~log(kms)+log(PetrolPrice)+law)

The parameter formula in general is not compulsory. It can either be substituted by formula=drivers~. or dropped completely – the function would fit the model of the first variable in the matrix from everything else. We need it in our case because we introduce log-transformations of some explanatory variables.

plot(forecast(adamETSXMNMSeat, h=12, interval="prediction"))

Figure 10.4 shows the forecast from the second model, which is slightly more accurate. More importantly, the prediction interval is narrower than in the simple ETS(M,N,M) because now the model takes the external information into account. Here is the summary of the second model:

adamETSXMNMSeat
## Time elapsed: 0.39 seconds
## Model estimated using adam() function: ETSX(MNM)
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 1114.042
## Persistence vector g (excluding xreg):
##  alpha  gamma
## 0.2042 0.0000
##
## Sample size: 180
## Number of estimated parameters: 18
## Number of degrees of freedom: 162
## Information criteria:
##      AIC     AICc      BIC     BICc
## 2264.085 2268.333 2321.558 2332.589
##
## Forecast errors:
## ME: 96.28; MAE: 97.238; RMSE: 123.401
## sCE: 68.347%; Asymmetry: 97.1%; sMAE: 5.752%; sMSE: 0.533%
## MASE: 0.564; RMSSE: 0.548; rMAE: 0.416; rRMSE: 0.486

Note that the smoothing parameter $$\alpha$$ has reduced from 0.41 to 0.2. This led to the reduction in error measures. For example, based on MASE, we can conclude that the model with explanatory variables is more precise than the simple univariate ETS(M,N,M). Still, we could try introducing the update of the parameters for the explanatory variables to see how it works (it might be unnecessary for this data):

adamETSXMNMDSeat <- adam(SeatbeltsData, "MNM", h=12, holdout=TRUE,
formula=drivers~log(kms)+log(PetrolPrice)+law,
regressors="adapt")

In this specific case, the difference between the ETSX and ETSX{D} models is infinitesimal in terms of the accuracy of final forecasts and prediction intervals. Here is the output of the model:

adamETSXMNMDSeat
## Time elapsed: 0.42 seconds
## Model estimated using adam() function: ETSX(MNM){D}
## Distribution assumed in the model: Gamma
## Loss function type: likelihood; Loss function value: 1114.215
## Persistence vector g (excluding xreg):
##  alpha  gamma
## 0.1758 0.0000
##
## Sample size: 180
## Number of estimated parameters: 21
## Number of degrees of freedom: 159
## Information criteria:
##      AIC     AICc      BIC     BICc
## 2270.430 2276.278 2337.482 2352.666
##
## Forecast errors:
## ME: 98.52; MAE: 99.317; RMSE: 125.665
## sCE: 69.937%; Asymmetry: 97.4%; sMAE: 5.875%; sMSE: 0.553%
## MASE: 0.576; RMSSE: 0.558; rMAE: 0.425; rRMSE: 0.495

We can spot that the error measures of the dynamic model are a bit higher than the ones from the static one (e.g., compare MASE and RMSSE of models). In addition, the information criteria are slightly lower for the static model. So based on all of this, we should probably use the static one for forecasting and analytical purposes. To see the effect of the explanatory variables on the number of incidents with drivers, we can look at the parameters for those variables:

adamETSXMNMSeat$initial$xreg
##         log.kms. log.PetrolPrice.              law
##      -0.09554412      -0.30249416      -0.23809816

Based on that, we can conclude that the introduction of the law reduced on average the number of incidents by approximately 24%, while the increase of the petrol price by 1% leads on average to decrease in the number of incidents by 0.3%. Finally, the distance negatively impacts the incidents as well, reducing it on average by 0.1% for each 1% increase in the distance. This is the standard interpretation of parameters, which we can use based on the estimated model (see, for example, discussion in Section 8.3 of Svetunkov, 2022a). We will discuss how to do the analysis using ADAM in Chapter 16, introducing the standard errors and confidence intervals for the parameters.

Finally, adam() has some shortcuts when a matrix of variables is provided with no formula, assuming that the necessary expansion has already been done. This leads to the decrease in computational time of the function and becomes especially useful when working on large samples of data. Here is an example with ETSX(M,N,N):

# Create matrix for the model
SeatbeltsDataExpanded <-
ts(model.frame(drivers~log(kms)+log(PetrolPrice)+law,
SeatbeltsData),
start=start(SeatbeltsData), frequency=frequency(SeatbeltsData))
# Fix the names of variables
colnames(SeatbeltsDataExpanded) <-
make.names(colnames(SeatbeltsDataExpanded))
# Apply the model
lags=12, h=12, holdout=TRUE)