This book is in Open Review. I want your feedback to make the book better for you and other readers. To add your annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the button in the upper right hand corner of the page

16.1 Covariance matrix of parameters

One of the basic conventional statistical ways of capturing uncertainty about estimates of parameters is via the calculation of covariance matrix of parameters. The covariance matrix that is typically calculated in regression context is based on the assumption of normality and can be derived base don maximum likelihood estimates of parameters. It relies on the “Fisher Information,” which in turn is calculated as a negative expectation of Hessian of parameters (the matrix of second derivatives of likelihood function with respect to all the estimates of parameters). The idea of Hessian is to capture the curvatures of the likelihood function in its optimal point in order to understand what impact each of parameters has on it. If we calculate the Hessian matrix and have Fisher Information, then using Cramer-Rao bound (Wikipedia, 2021n), the true variance of parameters will be greater or equal to the inverse of Fisher Information: \[\begin{equation} \mathrm{V}(\hat{\theta_j}) \geq \frac{1}{\mathrm{FI}(\hat{\theta_j})} , \tag{16.1} \end{equation}\] where \(\theta_j\) is the parameter under consideration. The property (16.1) can then be used for the calculation of the covariance matrix of parameters. While in case with regression this calculation has an analytical solution, in case of ETS and ARIMA, this can only be done via numeric methods, because the models rely on recursive relations.

In R, an efficient calculation of Hessian can be done via hessian() function from pracma package. There is a method vcov() that does all the calculations, estimating the negative Hessian inside adam() and then inverts the result. Here is an example of how this works:

adamModel <- adam(BJsales, h=10, holdout=TRUE)
adamModelVcov <- vcov(adamModel)
##              alpha        beta          phi       level       trend
## alpha  0.013096202 -0.01048869  0.006719673  -0.2059952   0.1670392
## beta  -0.010488693  0.01945175 -0.014069755   0.4700175  -0.3762878
## phi    0.006719673 -0.01406976  0.015788013  -0.4823944   0.3894840
## level -0.205995198  0.47001753 -0.482394423  22.2865424 -16.1581910
## trend  0.167039229 -0.37628782  0.389483965 -16.1581910  12.7038582

The precision of estimate will depend on the closeness of the likelihood function to its maximum in the estimated parameters. If the likelihood was not properly maximised, and the function stopped prematurely, then the covariance matrix might be incorrect and contain errors. Our of curiosity, we could calculate the correlation matrix of the estimated parameters:

adamModelVcov / sqrt(diag(adamModelVcov) %*% t(diag(adamModelVcov)))
##            alpha       beta        phi      level      trend
## alpha  1.0000000 -0.6571573  0.4673172 -0.3812967  0.4095227
## beta  -0.6571573  1.0000000 -0.8028667  0.7138604 -0.7569600
## phi    0.4673172 -0.8028667  1.0000000 -0.8132371  0.8696771
## level -0.3812967  0.7138604 -0.8132371  1.0000000 -0.9602926
## trend  0.4095227 -0.7569600  0.8696771 -0.9602926  1.0000000

This matrix does not provide much useful analytical information, but demonstrates that the estimates of initial level and trend of the ETS(AAdN) model applied to this data are negatively correlated. The values from this matrix can then be used for a variety of purposes, including calculation of confidence intervals of parameters, construction of confidence interval for the fitted value and point forecasts and finally the construction of more adequate prediction intervals.

In some cases the vcov() method would complain that the Fisher Information cannot be inverted. This typically means that the adam() failed to reach the maximum of the likelihood function

Note that this method only works, when loss="likelihood" or when the loss is aligned with the assumed distribution (e.g. loss="MSE" and distribution="dnorm"). In all the other cases, other approaches (such as bootstrap) would need to be used for the estimation of the covariance matrix of parameters.

16.1.1 Bootstrapped covariance matrix

An alternative way of constructing the matrix is via the bootstrap. The one that is implemented in smooth is based on coefboostrap() method from greybox package, which implements the modified case resampling. It is less efficient than the Fisher Information method in terms of computational time and works only for larger samples. The algorithm implemented in the function uses a continuous sub-sample of the original data, starting from the original point \(t=1\) (if backcasting is used (see Section 11.4.1), then the starting point will be allowed to differ). This sub-sample is then used for the re-estimation of adam() in order to get the empirical estimates of parameters. The procedure is repeated nsim times, which for adam() is by default equal to 100. This approach is far from ideal and will typically lead to underestimated variance of initials, but it does not break the structure of the data and allows obtaining relatively fast results without imposing any additional assumptions on the model and the data. I personally recommend using it in case of the initialisation via backcasting.

Here is an example of how the function work on the data above - it is possible to speed up the process by doing parallel calculations:

adamModelBoot <- coefbootstrap(adamModel,parallel=TRUE)
## Bootstrap for the adam model with nsim=100 and size=70
## Time elapsed: 0.52 seconds

The covariance matrix can then be extracted from the result via adamModelBoot$vcov. The same procedure is used in vcov() method if bootstrap=TRUE:

vcov(adamModel, bootstrap=TRUE, parallel=TRUE)
##              alpha        beta          phi       level        trend
## alpha  0.048892321 0.009421037  0.008364942  0.04899050 -0.058432024
## beta   0.009421037 0.009231592  0.002021305  0.00318393  0.005407812
## phi    0.008364942 0.002021305  0.012833455  0.06033849 -0.063433597
## level  0.048990496 0.003183930  0.060338495  0.54061801 -0.721953121
## trend -0.058432024 0.005407812 -0.063433597 -0.72195312  1.053705430


• Wikipedia, 2021n. Cramér–Rao bound. (version: 2021-07-18)