16.1 Covariance matrix of parameters

One of the basic conventional statistical ways of capturing uncertainty about estimates of parameters is via the calculation of the covariance matrix of parameters. The covariance matrix that is typically calculated in regression context is based on the assumption of normality and can be derived based on the maximum likelihood estimate of parameters. It relies on the “Fisher Information”, which in turn is calculated as a negative expectation of Hessian of parameters (the matrix of second derivatives of the likelihood function with respect to all the estimates of parameters). The idea of Hessian is to capture the curvatures of the likelihood function in its optimal point to understand what impact each of the parameters has on it. If we calculate the Hessian matrix and have Fisher Information, then using Cramer-Rao bound (Wikipedia, 2021n), the true variance of parameters will be greater or equal to the inverse of Fisher Information: \[\begin{equation} \mathrm{V}(\hat{\theta_j}) \geq \frac{1}{\mathrm{FI}(\hat{\theta_j})} , \tag{16.1} \end{equation}\] where \(\theta_j\) is the parameter under consideration. The property (16.1) can then be used for the calculation of the covariance matrix of parameters. While in case of regression this calculation has an analytical solution, in case of ETS and ARIMA, this can only be done via numeric methods, because the models rely on recursive relations.

In R, an efficient calculation of Hessian can be done via the hessian() function from the pracma package. In smooth there is a method vcov() that does all the calculations, estimating the negative Hessian inside the adam() and then inverts the result. Here is an example of how this works:

adamETSBJ <- adam(BJsales, h=10, holdout=TRUE)
adamETSBJVcov <- vcov(adamETSBJ)
round(adamETSBJVcov, 3)
##        alpha   beta    phi   level   trend
## alpha  0.013 -0.010  0.007  -0.206   0.167
## beta  -0.010  0.019 -0.014   0.470  -0.376
## phi    0.007 -0.014  0.016  -0.482   0.389
## level -0.206  0.470 -0.482  22.287 -16.158
## trend  0.167 -0.376  0.389 -16.158  12.704

The precision of the estimate will depend on the closeness of the likelihood function to its maximum in the estimated parameters. If the likelihood was not properly maximised and the function stopped prematurely, then the covariance matrix might be incorrect and contain errors. Out of curiosity, we could calculate the correlation matrix of the estimated parameters:

round(cov2cor(adamETSBJVcov), 3)
##        alpha   beta    phi  level  trend
## alpha  1.000 -0.657  0.467 -0.381  0.410
## beta  -0.657  1.000 -0.803  0.714 -0.757
## phi    0.467 -0.803  1.000 -0.813  0.870
## level -0.381  0.714 -0.813  1.000 -0.960
## trend  0.410 -0.757  0.870 -0.960  1.000

This matrix does not provide helpful analytical information but demonstrates that the estimates of the initial level and trend of the ETS(AAdN) model applied to this data are negatively correlated. The values from the covariance matrix can then be used for various purposes, including calculation of confidence intervals of parameters, construction of confidence interval for the fitted value and point forecasts and finally, the construction of more adequate prediction intervals.

In some cases, the vcov() method would complain that the Fisher Information cannot be inverted. This typically means that the adam() failed to reach the maximum of the likelihood function. Re-estimating the model might resolve the problem.

Remark. This method only works when loss="likelihood" or when the loss is aligned with the assumed distribution (e.g. loss="MSE" and distribution="dnorm"). In all the other cases, other approaches (such as bootstrap) would need to be used to estimate the covariance matrix of parameters.

16.1.1 Bootstrapped covariance matrix

An alternative way of constructing the matrix is via the bootstrap. The one implemented in smooth is based on the coefbootstrap() method from the greybox package, which implements the modified case resampling. It is less efficient than the Fisher Information method in terms of computational time and works only for larger samples. The algorithm implemented in the function uses a continuous sub-sample of the original data, starting from the initial point \(t=1\) (if backcasting is used, as discussed in Section 11.4.1, then the starting point will be allowed to vary). This sub-sample is then used to re-estimate adam() to get the empirical estimates of parameters. The procedure is repeated nsim times, which for adam() is by default equal to 100. This approach is far from ideal and will typically lead to the underestimated variance of initials because of the sample size restrictions. Still, it does not break the data structure and allows obtaining results relatively fast without imposing any additional assumptions on the model and the data. I recommend using it in case of the initialisation via backcasting.

Here is an example of how the function works on the data above – it is possible to speed up the process by doing parallel calculations:

adamETSBJBoot <- coefbootstrap(adamETSBJ, parallel=TRUE)
adamETSBJBoot
## Bootstrap for the adam model with nsim=100 and size=70
## Time elapsed: 0.41 seconds

The covariance matrix can then be extracted from the result via adamETSBJBoot$vcov. The same procedure is used in vcov() method if bootstrap=TRUE:

round(vcov(adamETSBJ, bootstrap=TRUE, parallel=TRUE), 3)
##        alpha  beta    phi  level  trend
## alpha  0.053 0.010  0.009  0.051 -0.060
## beta   0.010 0.011  0.003  0.003  0.014
## phi    0.009 0.003  0.012  0.054 -0.056
## level  0.051 0.003  0.054  0.523 -0.711
## trend -0.060 0.014 -0.056 -0.711  1.059

References

• Wikipedia, 2021n. Cramér–Rao bound. https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound (version: 2021-07-18)