\( \newcommand{\mathbbm}[1]{\boldsymbol{\mathbf{#1}}} \)

16.2 Covariance matrix of parameters

One of the basic conventional statistical ways of capturing uncertainty about estimates of parameters is via the calculation of the covariance matrix of parameters. The covariance matrix that is typically calculated in regression context can be based either on OLS estimates of parameters, or on MLE assuming that the residuals of the model follow Normal distribution. In the latter case, it relies on the “Fisher Information”, which in turn is calculated as a negative expectation of Hessian of parameters (the matrix of second derivatives of the likelihood function with respect to all the estimates of parameters). The idea of Hessian is to capture the curvatures of the likelihood function in its optimal point to understand what impact each of the parameters has on the likelihood. If we calculate the Hessian matrix and have the Fisher Information, then using the Cramer-Rao bound (Rao, 1945), the true variance of the parameters will be greater than or equal to the inverse of the Fisher Information: \[\begin{equation} \mathrm{V}(\hat{\theta}_j) \geq \frac{1}{\mathrm{FI}(\hat{\theta}_j)} , \tag{16.1} \end{equation}\] where \(\theta_j\) is the parameter under consideration. The property (16.1) can then be used for the calculation of the covariance matrix of parameters. While in the case of the linear regression this calculation has an analytical solution, in cases of ETS and ARIMA, this can only be done via numeric methods, because the models rely on recursive relations and there is no closed analytical solution for the parameters of these models.

In R, an efficient calculation of Hessian can be done via the hessian() function from the pracma package (Borchers, 2023). In smooth there is a method vcov() that does all the calculations, estimating the negative Hessian inside the adam() and then inverting the result. Here is an example of how this works:

adamETSBJ <- adam(BJsales, h=10, holdout=TRUE)
adamETSBJVcov <- vcov(adamETSBJ)
round(adamETSBJVcov, 3)
##        alpha   beta    phi  level  trend
## alpha  0.012 -0.008  0.004 -0.019  0.034
## beta  -0.008  0.014 -0.007  0.043 -0.074
## phi    0.004 -0.007  0.009 -0.045  0.072
## level -0.019  0.043 -0.045  2.787 -1.844
## trend  0.034 -0.074  0.072 -1.844  3.061

The precision of the estimate will depend on the closeness of the likelihood function to its maximum in the estimated parameters. If the likelihood was not properly maximised and the function stopped prematurely, then the covariance matrix might be incorrect and contain errors. In that case, vcov() will produce a warning, saying that the resulting matrix might not be estimated correctly. The covariance matrix itself shows the variability of each of the parameters and whether they have any relations between them or not. In order to get a clearer picture about the latter, we could calculate the correlation matrix of the estimated parameters:

cov2cor(adamETSBJVcov) |>
    round(3)
##        alpha   beta    phi  level  trend
## alpha  1.000 -0.637  0.357 -0.105  0.181
## beta  -0.637  1.000 -0.669  0.218 -0.354
## phi    0.357 -0.669  1.000 -0.289  0.435
## level -0.105  0.218 -0.289  1.000 -0.631
## trend  0.181 -0.354  0.435 -0.631  1.000

This matrix demonstrates that the estimates of the initial level and trend of the ETS(A,Ad,N) model applied to this data are strongly negatively correlated. This means that on average with the increase of the initial level, the initial trend tends to be lower, which makes sense as a starting point of a model. The values from the covariance matrix can also be used, for example, for calculation of confidence intervals of parameters (Section 16.3), construction of confidence intervals for the fitted values and point forecasts (Section 16.5), and for the construction of more adequate prediction intervals (i.e. taking the uncertainty of estimates of parameters into account, see Subsection 18.3.6).

In some cases, the vcov() method would complain that the Fisher Information cannot be inverted. This typically means that the adam() failed to reach the maximum of the likelihood function. Re-estimating the model with different initial values and/or optimiser settings might resolve the problem (see Section 11.4).

Remark. This method only works when loss="likelihood" or when the loss is aligned with the assumed distribution (e.g. loss="MSE" and distribution="dnorm"). In all the other cases, other approaches (such as bootstrap) would need to be used to estimate the covariance matrix of parameters.

16.2.1 Bootstrapped covariance matrix

An alternative way of constructing the matrix is via the bootstrap. The one implemented in smooth is based on the coefbootstrap() method from the greybox package, which implements the modified case resampling. It is less efficient than the Fisher Information method in terms of computational time and works only for larger samples. The algorithm implemented in the function creates continuous sub-samples of the original data, starting from the initial point \(t=1\) (if backcasting is used, as discussed in Section 11.4.1, then the starting point will be allowed to vary). These sub-samples are then used for re-estimation of adam() to get the empirical estimates of parameters. The procedure is repeated nsim times, which for adam() is by default equal to 100. This approach is far from ideal and will typically lead to the underestimated variance of initials because of the sample size restrictions. Still, it does not break the data structure and allows obtaining results relatively fast without imposing any additional assumptions on the model and the data. I personally recommend using it in the case of the initialisation via backcasting.

Here is an example of how the function works on the data above – it is possible to speed up the process by doing parallel calculations:

adamETSBJBoot <- coefbootstrap(adamETSBJ, parallel=TRUE)
adamETSBJBoot
## Bootstrap for the adam model with nsim=100 and size=70
## Time elapsed: 0.48 seconds

The size in the output above refers to the sub-sample size, which by default is 75% of the original data length. The covariance matrix can then be extracted from the result via adamETSBJBoot$vcov. The same procedure is used in the vcov() method if bootstrap=TRUE:

adamETSBJ |>
    vcov(bootstrap=TRUE, parallel=TRUE) |>
    round(3)
##        alpha   beta    phi  level  trend
## alpha  0.042 -0.008  0.007  0.082 -0.008
## beta  -0.008  0.005 -0.003 -0.022 -0.002
## phi    0.007 -0.003  0.011  0.035  0.018
## level  0.082 -0.022  0.035  0.606 -0.144
## trend -0.008 -0.002  0.018 -0.144  0.142

References

• Borchers, H.W., 2023. Pracma: Practical numerical math functions. https://CRAN.R-project.org/package=pracma R package version 2.4.4
• Rao, R.C., 1945. Information and Accuracy Attainable in the Estimation of Statistical Parameters. Bulletin of the Calcutta Mathematical Society. 37, 81–91.