
## 15.3 Explanatory variables selection

There are different approaches to automatic variable selection, but not all of them are efficient in the context of dynamic models. For example, backward stepwise might either be infeasible in the case of small samples or take too much time to converge to an optimal solution (it has polynomial computational time). This is because the ADAMX model needs to be refitted and reestimated over and over again using recursive relations based, for example, on the state space model (11.3). The classical stepwise forward might also be too slow, because it has polynomial computational time as well. So, some simplifications are needed to make variable selection in ADAMX doable in a reasonable time.

In order to make the mechanism feasible in a limited time, we rely on the approach of Sagaert and Svetunkov (2021): stepwise trace forward selection of variables. This approach uses the partial correlations between variables to identify which of them to include on each iteration, and because of that it has linear computational time. Still, doing that in the proper ADAMX would take more time than needed, so one of the possible solutions is to do variable selection in ADAMX in the following steps:

1. Estimate and fit the ETS model;
2. Extract the residuals of the ETS model;
3. Select the most suitable variables explaining the residuals, based on an information criterion;
4. Estimate the ADAMX model with the selected explanatory variables.
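A rough sketch of these steps in R is shown below (this assumes the `smooth` and `greybox` packages and the `SeatbeltsData` matrix used later in this section; the internal implementation in `adam()` may differ in details, and the extracted residuals should follow the distributional rules listed next):

```r
library(smooth)
library(greybox)

# (1) Estimate and fit the pure ETS model on the response variable only
etsModel <- adam(SeatbeltsData[, "drivers"], "MNM")
# (2) Extract the residuals of the ETS model
etsResiduals <- residuals(etsModel)
# (3) Select the variables explaining the residuals,
#     using stepwise() based on an information criterion
stepwiseModel <- stepwise(data.frame(residuals = as.vector(etsResiduals),
                                     SeatbeltsData[, -1]))
# (4) Reestimate the ADAMX model with the selected variables only
selectedNames <- names(coef(stepwiseModel))[-1]
adamXModel <- adam(SeatbeltsData[, c("drivers", selectedNames)], "MNM")
```

In practice, `adam()` performs these steps automatically when called with `regressors="select"`, as done later in this section.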

The residuals in step (2) might vary from model to model, depending on the type of the error term and the selected distribution:

- Normal, Laplace, S, Generalised Normal or Asymmetric Laplace: \(e_t\);
- Additive error and Log Normal, Inverse Gaussian or Gamma: \(\left(1+\frac{e_t}{\hat{y}_t} \right)\);
- Multiplicative error and Log Normal, Inverse Gaussian or Gamma: \(1+e_t\).

So, the extracted residuals should be formulated based on the distributional assumptions of each model.

In R, step (3) is done using the `stepwise()` function from the `greybox` package, which supports all the distributions discussed in the previous chapters.

While the suggested approach has obvious limitations (e.g. the smoothing parameters can end up higher than needed, explaining variability that would otherwise be explained by the variables), it is efficient in terms of computational time.

In order to see how it works, we use the Seatbelts data:
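The construction of the `SeatbeltsData` object is not shown in this excerpt; a plausible way to prepare it from the built-in `Seatbelts` dataset (from the `datasets` package in R) would be:

```r
# Hypothetical data preparation: keep the response variable (drivers)
# and the potential explanatory variables from the Seatbelts data
SeatbeltsData <- Seatbelts[, c("drivers", "kms", "PetrolPrice", "law")]
```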

We have already had a look at this data earlier, so we can move directly to the selection part:

```
adamModelETSXMNMSelect <- adam(SeatbeltsData, "MNM", h=12, holdout=TRUE,
                               regressors="select")
plot(forecast(adamModelETSXMNMSelect, h=12, interval="prediction"))
```

```
## Warning: Observed Fisher Information is not positive semi-definite, which means
## that the likelihood was not maximised properly. Consider reestimating the model,
## tuning the optimiser or using bootstrap via bootstrap=TRUE.
```

```
##
## Model estimated using adam() function: ETSX(MNM)
## Response variable: drivers
## Distribution used in the estimation: Inverse Gaussian
## Loss function type: likelihood; Loss function value: 1118.492
## Coefficients:
## Estimate Std. Error Lower 2.5% Upper 97.5%
## alpha 0.2900 0.0804 0.1311 0.4486 *
## gamma 0.0077 0.0380 0.0000 0.0827
## level 1660.1176 105.0583 1452.6763 1867.3075 *
## seasonal_1 1.0002 0.0130 0.9744 1.0395 *
## seasonal_2 0.9084 0.0152 0.8827 0.9478 *
## seasonal_3 0.9458 0.0139 0.9201 0.9851 *
## seasonal_4 0.8674 0.0160 0.8417 0.9067 *
## seasonal_5 0.9465 0.0173 0.9208 0.9859 *
## seasonal_6 0.9157 0.0166 0.8899 0.9550 *
## seasonal_7 0.9597 0.0175 0.9339 0.9990 *
## seasonal_8 0.9669 0.0176 0.9411 1.0062 *
## seasonal_9 1.0039 0.0183 0.9781 1.0432 *
## seasonal_10 1.0882 0.0199 1.0625 1.1275 *
## seasonal_11 1.1985 0.0193 1.1728 1.2378 *
## law 0.0200 0.0904 -0.1585 0.1983
##
## Sample size: 180
## Number of estimated parameters: 16
## Number of degrees of freedom: 164
## Information criteria:
## AIC AICc BIC BICc
## 2268.984 2272.321 2320.071 2328.736
```

Note that the function might complain about the observed Fisher Information. This only means that the estimated variances of parameters might be lower than they should be in reality.

Based on the summary of the model, we can see that neither `kms` nor `PetrolPrice` improves the model in terms of AICc. We could check them manually in order to see whether the selection worked well in our case, constructing the sink regression (the model with all the variables included) as a benchmark:
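The code that produced the output below is not shown in this excerpt; a plausible way to construct the sink regression benchmark with `adam()` is to force all the explanatory variables into the model via `regressors="use"` (the variable name here is hypothetical):

```r
# Benchmark: include all explanatory variables without any selection
adamModelETSXMNMSink <- adam(SeatbeltsData, "MNM", h=12, holdout=TRUE,
                             regressors="use")
summary(adamModelETSXMNMSink)
```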

```
## Warning: Observed Fisher Information is not positive semi-definite, which means
## that the likelihood was not maximised properly. Consider reestimating the model,
## tuning the optimiser or using bootstrap via bootstrap=TRUE.
```

```
##
## Model estimated using adam() function: ETSX(MNM)
## Response variable: drivers
## Distribution used in the estimation: Inverse Gaussian
## Loss function type: likelihood; Loss function value: 1123.832
## Coefficients:
## Estimate Std. Error Lower 2.5% Upper 97.5%
## alpha 0.1858 0.0626 0.0621 0.3093 *
## gamma 0.0905 0.0232 0.0448 0.1361 *
## level 1351.3200 270.9388 816.2930 1885.6160 *
## seasonal_1 1.0923 0.0348 1.0525 1.1793 *
## seasonal_2 0.9892 0.0201 0.9494 1.0762 *
## seasonal_3 0.9649 0.0354 0.9252 1.0519 *
## seasonal_4 0.8687 0.0328 0.8289 0.9557 *
## seasonal_5 0.9481 0.0358 0.9084 1.0351 *
## seasonal_6 0.8910 0.0329 0.8513 0.9780 *
## seasonal_7 0.8814 0.0262 0.8416 0.9684 *
## seasonal_8 0.8978 0.0267 0.8580 0.9848 *
## seasonal_9 0.9422 0.0283 0.9024 1.0292 *
## seasonal_10 1.0520 0.0365 1.0123 1.1390 *
## seasonal_11 1.2679 0.0441 1.2282 1.3549 *
## kms 0.0000 0.0000 0.0000 0.0000
## PetrolPrice 0.0915 1.1721 -2.2230 2.4029
## law 0.0206 0.1191 -0.2145 0.2553
##
## Sample size: 180
## Number of estimated parameters: 18
## Number of degrees of freedom: 162
## Information criteria:
## AIC AICc BIC BICc
## 2283.664 2287.912 2341.137 2352.168
```

We can see that the sink regression model has a higher AICc value than the model with the selected variables, which means that the latter is closer to the “true model”. While `adamModelETSXMNMSelect` might not be the best possible model in terms of information criteria, it is still a reasonable one and allows making different decisions. For example, we can see from the summary of the model that the introduction of the law has reduced the number of accidents with drivers by approximately 23.79%. However, this is an average effect, and the true one lies somewhere between -34.06% and -13.53% (with 95% confidence).

### References

Sagaert, Yves R., and Ivan Svetunkov. 2021. “Variables Selection Using Partial Correlations and Information Criteria.” Department of Management Science, Lancaster University.