\( \newcommand{\mathbbm}[1]{\boldsymbol{\mathbf{#1}}} \)

16.2 Covariance matrix of parameters

One of the basic conventional statistical ways of capturing uncertainty about estimates of parameters is via the calculation of the covariance matrix of parameters. The covariance matrix that is typically calculated in regression context can be based either on OLS estimates of parameters, or on MLE assuming that the residuals of the model follow Normal distribution. In the latter case, it relies on the “Fisher Information”, which in turn is calculated as a negative expectation of Hessian of parameters (the matrix of second derivatives of the likelihood function with respect to all the estimates of parameters). The idea of Hessian is to capture the curvatures of the likelihood function in its optimal point to understand what impact each of the parameters has on the likelihood. If we calculate the Hessian matrix and have the Fisher Information, then using the Cramer-Rao bound (Rao, 1945), the true variance of the parameters will be greater than or equal to the inverse of the Fisher Information: \[\begin{equation} \mathrm{V}(\hat{\theta}_j) \geq \frac{1}{\mathrm{FI}(\hat{\theta}_j)} , \tag{16.1} \end{equation}\] where \(\theta_j\) is the parameter under consideration. The property (16.1) can then be used for the calculation of the covariance matrix of parameters. While in the case of the linear regression this calculation has an analytical solution, in cases of ETS and ARIMA, this can only be done via numeric methods, because the models rely on recursive relations and there is no closed analytical solution for the parameters of these models.

In R, an efficient calculation of Hessian can be done via the hessian() function from the pracma package (Borchers, 2023). In smooth there is a method vcov() that does all the calculations, estimating the negative Hessian inside the adam() and then inverting the result. Here is an example of how this works:

adamETSBJ <- adam(BJsales, h=10, holdout=TRUE)
adamETSBJVcov <- vcov(adamETSBJ)
round(adamETSBJVcov, 3)
##        alpha   beta    phi  level  trend
## alpha  0.039 -0.070  0.040  0.229 -0.204
## beta  -0.070  0.158 -0.091 -0.557  0.497
## phi    0.040 -0.091  0.057  0.347 -0.309
## level  0.229 -0.557  0.347 13.508 -9.782
## trend -0.204  0.497 -0.309 -9.782  8.075

The precision of the estimate will depend on the closeness of the likelihood function to its maximum in the estimated parameters. If the likelihood was not properly maximised and the function stopped prematurely, then the covariance matrix might be incorrect and contain errors. In that case, vcov() will produce a warning, saying that the resulting matrix might not be estimated correctly. The covariance matrix itself shows the variability of each of the parameters and whether they have any relations between them or not. In order to get a clearer picture about the latter, we could calculate the correlation matrix of the estimated parameters:

cov2cor(adamETSBJVcov) |>
    round(3)
##        alpha   beta    phi  level  trend
## alpha  1.000 -0.889  0.838  0.316 -0.363
## beta  -0.889  1.000 -0.958 -0.381  0.440
## phi    0.838 -0.958  1.000  0.396 -0.455
## level  0.316 -0.381  0.396  1.000 -0.937
## trend -0.363  0.440 -0.455 -0.937  1.000

This matrix demonstrates that the estimates of the initial level and trend of the ETS(A,Ad,N) model applied to this data are strongly negatively correlated. This means that on average with the increase of the initial level, the initial trend tends to be lower, which makes sense as a starting point of a model. The values from the covariance matrix can also be used, for example, for calculation of confidence intervals of parameters (Section 16.3), construction of confidence intervals for the fitted values and point forecasts (Section 16.5), and for the construction of more adequate prediction intervals (i.e. taking the uncertainty of estimates of parameters into account, see Subsection 18.3.6).

In some cases, the vcov() method would complain that the Fisher Information cannot be inverted. This typically means that the adam() failed to reach the maximum of the likelihood function. Re-estimating the model with different initial values and/or optimiser settings might resolve the problem (see Section 11.4).

Remark. This method only works when loss="likelihood" or when the loss is aligned with the assumed distribution (e.g. loss="MSE" and distribution="dnorm"). In all the other cases, other approaches (such as bootstrap) would need to be used to estimate the covariance matrix of parameters.

16.2.1 Bootstrapped covariance matrix

An alternative way of constructing the matrix is via the bootstrap. The one implemented in smooth is based on the coefbootstrap() method from the greybox package, which uses the modification of the Maximum Entropy Bootstrap of Vinod and López-de-Lacalle (2009), implemented in the dsrboot() function of the greybox package (this is called “Data Shape Replication bootstrap”). It is less efficient than the Fisher Information method in terms of computational time. The algorithm recreates the original time series many times and then re-estimates adam() on each one of them to get the empirical estimates of parameters. The procedure is repeated nsim times, which for adam() is by default equal to 100. The algorithm does not break the data structure and it allows obtaining results relatively fast without imposing any additional assumptions on the model and the data.

Here is an example of how the function works on the data above – it is possible to speed up the process by doing parallel calculations:

adamETSBJBoot <- coefbootstrap(adamETSBJ, parallel=TRUE)
adamETSBJBoot
## Bootstrap for the adam model with nsim=1000 using dsr method
## Time elapsed: 2.57 seconds

The covariance matrix can then be extracted from the result via adamETSBJBoot$vcov. The same procedure is used in the vcov() method if bootstrap=TRUE:

adamETSBJ |>
    vcov(bootstrap=TRUE, parallel=TRUE) |>
    round(3)
##        alpha   beta    phi  level  trend
## alpha  0.029  0.013  0.002 -0.020 -0.008
## beta   0.013  0.089 -0.025  0.167 -0.031
## phi    0.002 -0.025  0.017 -0.073  0.005
## level -0.020  0.167 -0.073  8.774 -0.212
## trend -0.008 -0.031  0.005 -0.212  0.248

References

• Borchers, H.W., 2023. Pracma: Practical numerical math functions. https://CRAN.R-project.org/package=pracma R package version 2.4.4
• Rao, R.C., 1945. Information and Accuracy Attainable in the Estimation of Statistical Parameters. Bulletin of the Calcutta Mathematical Society. 37, 81–91.
• Vinod, H.D., López-de-Lacalle, J., 2009. Maximum entropy bootstrap for time series: The meboot R package. Journal of Statistical Software. 29, 1–19. https://doi.org/10.18637/jss.v029.i05