## 16.1 Covariance matrix of parameters

One of the basic conventional statistical ways of capturing uncertainty about estimates of parameters is via the calculation of the covariance matrix of parameters. The covariance matrix that is typically calculated in regression context is based on the assumption of normality and can be derived based on the maximum likelihood estimate of parameters. It relies on the “Fisher Information”, which in turn is calculated as a negative expectation of Hessian of parameters (the matrix of second derivatives of the likelihood function with respect to all the estimates of parameters). The idea of Hessian is to capture the curvatures of the likelihood function in its optimal point to understand what impact each of the parameters has on it. If we calculate the Hessian matrix and have Fisher Information, then using Cramer-Rao bound (Wikipedia, 2021n), the true variance of parameters will be greater or equal to the inverse of Fisher Information: \[\begin{equation} \mathrm{V}(\hat{\theta_j}) \geq \frac{1}{\mathrm{FI}(\hat{\theta_j})} , \tag{16.1} \end{equation}\] where \(\theta_j\) is the parameter under consideration. The property (16.1) can then be used for the calculation of the covariance matrix of parameters. While in case of regression this calculation has an analytical solution, in case of ETS and ARIMA, this can only be done via numeric methods, because the models rely on recursive relations.

In R, an efficient calculation of Hessian can be done via the `hessian()`

function from the `pracma`

package. In `smooth`

there is a method `vcov()`

that does all the calculations, estimating the negative Hessian inside the `adam()`

and then inverts the result. Here is an example of how this works:

```
adamETSBJ <- adam(BJsales, h=10, holdout=TRUE)
adamETSBJVcov <- vcov(adamETSBJ)
round(adamETSBJVcov, 3)
```

```
## alpha beta phi level trend
## alpha 0.013 -0.010 0.007 -0.206 0.167
## beta -0.010 0.019 -0.014 0.470 -0.376
## phi 0.007 -0.014 0.016 -0.482 0.389
## level -0.206 0.470 -0.482 22.287 -16.158
## trend 0.167 -0.376 0.389 -16.158 12.704
```

The precision of the estimate will depend on the closeness of the likelihood function to its maximum in the estimated parameters. If the likelihood was not properly maximised and the function stopped prematurely, then the covariance matrix might be incorrect and contain errors. Out of curiosity, we could calculate the correlation matrix of the estimated parameters:

```
## alpha beta phi level trend
## alpha 1.000 -0.657 0.467 -0.381 0.410
## beta -0.657 1.000 -0.803 0.714 -0.757
## phi 0.467 -0.803 1.000 -0.813 0.870
## level -0.381 0.714 -0.813 1.000 -0.960
## trend 0.410 -0.757 0.870 -0.960 1.000
```

This matrix does not provide helpful analytical information but demonstrates that the estimates of the initial level and trend of the ETS(AAdN) model applied to this data are negatively correlated. The values from the covariance matrix can then be used for various purposes, including calculation of confidence intervals of parameters, construction of confidence interval for the fitted value and point forecasts and finally, the construction of more adequate prediction intervals.

In some cases, the `vcov()`

method would complain that the Fisher Information cannot be inverted. This typically means that the `adam()`

failed to reach the maximum of the likelihood function. Re-estimating the model might resolve the problem.

*Remark*. This method only works when `loss="likelihood"`

or when the loss is aligned with the assumed distribution (e.g. `loss="MSE"`

and `distribution="dnorm"`

). In all the other cases, other approaches (such as bootstrap) would need to be used to estimate the covariance matrix of parameters.

### 16.1.1 Bootstrapped covariance matrix

An alternative way of constructing the matrix is via the bootstrap. The one implemented in `smooth`

is based on the `coefboostrap()`

method from the `greybox`

package, which implements the modified case resampling. It is less efficient than the Fisher Information method in terms of computational time and works only for larger samples. The algorithm implemented in the function uses a continuous sub-sample of the original data, starting from the initial point \(t=1\) (if backcasting is used, as discussed in Section 11.4.1, then the starting point will be allowed to vary). This sub-sample is then used to re-estimate `adam()`

to get the empirical estimates of parameters. The procedure is repeated `nsim`

times, which for `adam()`

is by default equal to 100. This approach is far from ideal and will typically lead to the underestimated variance of initials because of the sample size restrictions. Still, it does not break the data structure and allows obtaining results relatively fast without imposing any additional assumptions on the model and the data. I recommend using it in case of the initialisation via backcasting.

Here is an example of how the function works on the data above – it is possible to speed up the process by doing parallel calculations:

```
## Bootstrap for the adam model with nsim=100 and size=70
## Time elapsed: 0.43 seconds
```

The covariance matrix can then be extracted from the result via `adamETSBJBoot$vcov`

. The same procedure is used in `vcov()`

method if `bootstrap=TRUE`

:

```
## alpha beta phi level trend
## alpha 0.006 -0.001 -0.001 -0.001 -0.001
## beta -0.001 0.004 0.000 -0.010 0.019
## phi -0.001 0.000 0.010 0.046 -0.050
## level -0.001 -0.010 0.046 0.500 -0.691
## trend -0.001 0.019 -0.050 -0.691 1.030
```

### References

• Wikipedia, 2021n. Cramér–Rao bound. https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound (version: 2021-07-18)