16.4 Conditional variance with uncertain parameters

\( \newcommand{\mathbbm}[1]{\boldsymbol{\mathbf{#1}}} \)

Now that we have discussed how the covariance matrix of parameters and confidence intervals for parameters can be generated in ADAM, we can move to the discussion of propagating the effect of uncertainty of parameters to the states, fitted values, and forecasts. I consider two special cases with pure additive state space models:

When the values of the initial state vector are estimated;
When the model parameters (e.g. smoothing or AR/MA parameters) are estimated.

I discuss analytical formulae for the conditional variance for these cases. This variance can then be used to construct the confidence interval for the fitted line and the confidence or prediction interval for the holdout period. I do not cover the more realistic case when both initials and parameters are estimated because there is no closed analytical form for it due to potential correlations between the estimates of parameters. Furthermore, there are no closed forms for the conditional variance for the multiplicative and mixed models, which is why I focus my explanation on the pure additive ones only.

16.4.1 Estimated initial state

First, we need to recall the recursive relations discussed in Section 5.4, specifically formula (5.10). Just to simplify all the derivations in this section, we consider the non-seasonal case, in which all elements of \(\boldsymbol{l}\) are equal to one. This can be ETS(A,N,N), ETS(A,A,N), ETS(A,Ad,N), or some ARIMA models.

Remark. The more general case is more complicated but is derivable using the same principles as discussed below.

The recursive relation from the first observation till some observation \(t\) can be written as: \[\begin{equation} \hat{\mathbf{v}}_{t} = \mathbf{D}^{t} \hat{\mathbf{v}}_{0} + \sum_{j=0}^{t-1} \mathbf{D}^{j} {\mathbf{g}} y_{t-j} , \tag{16.3} \end{equation}\] where \(\mathbf{D}=\mathbf{F} -\mathbf{g}\mathbf{w}^\prime\). The formula (16.3) shows that the most recent value of the state vector depends on the initial value \(\hat{\mathbf{v}}_{0}\) and on the linear combination of actual values.
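The equivalence between the step-by-step update \(\hat{\mathbf{v}}_{t} = \mathbf{D} \hat{\mathbf{v}}_{t-1} + \mathbf{g} y_t\) and the closed form (16.3) can be checked numerically. The following is only a minimal sketch: the ETS(A,A,N) matrices, the smoothing parameters, the initial state, and the simulated data are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def discount_matrix(F, g, w):
    """Return D = F - g w' for a pure additive model."""
    return F - np.outer(g, w)

def state_recursive(D, g, v0, y):
    """Apply v_t = D v_{t-1} + g y_t over the sample and return v_t."""
    v = v0.copy()
    for y_t in y:
        v = D @ v + g * y_t
    return v

def state_closed_form(D, g, v0, y):
    """Evaluate (16.3): v_t = D^t v_0 + sum_{j=0}^{t-1} D^j g y_{t-j}."""
    t = len(y)
    v = np.linalg.matrix_power(D, t) @ v0
    for j in range(t):
        # y[t - 1 - j] holds y_{t-j}, because y[0] stores y_1
        v = v + np.linalg.matrix_power(D, j) @ g * y[t - 1 - j]
    return v

# Assumed ETS(A,A,N) setup (values are illustrative only)
F = np.array([[1.0, 1.0], [0.0, 1.0]])
w = np.array([1.0, 1.0])
g = np.array([0.3, 0.1])        # assumed alpha and beta
v0 = np.array([100.0, 1.0])     # assumed initial level and trend
y = np.random.default_rng(1).normal(100.0, 5.0, size=20)

D = discount_matrix(F, g, w)
v_rec = state_recursive(D, g, v0, y)
v_closed = state_closed_form(D, g, v0, y)
print(np.allclose(v_rec, v_closed))
```

Both functions return the same state vector, which illustrates that the final state is a linear function of \(\hat{\mathbf{v}}_{0}\) and of the actual values.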

Remark. We assume in this part that the matrix \(\mathbf{D}\) is known, i.e. the smoothing parameters are not estimated. Although this is an unrealistic assumption, it helps in showing how the variance of the initial state would influence the conditional variance of actual values at the end of the sample.

If we now take the variance of the state vector conditional on the actual values \(y_{t-j}\) for all \(j=\{0, \dots, t-1 \}\), then we will have (due to the independence of the two terms in (16.3)): \[\begin{equation} \mathrm{V}(\hat{\mathbf{v}}_{t} | y_1, y_2, \dots y_t) = \mathrm{V}\left( \mathbf{D}^{t} \hat{\mathbf{v}}_{0} \right) + \mathrm{V}\left(\sum_{j=0}^{t-1} \mathbf{D}^{j} \mathbf{g} y_{t-j} | y_1, y_2, \dots y_t \right) . \tag{16.4} \end{equation}\] We condition the variance on actual values in the formula above because they are given to us, and we want to see how different initial states would lead to the changes in the model fit given these values and thus how the uncertainty will propagate from the first observation to observation \(t\). In the formula (16.4), the second term on the right-hand side is equal to zero because all actual values are known, and neither \(\mathbf{D}\) nor \(\mathbf{g}\) has any uncertainty due to the assumption above. This leads to the following covariance matrix of states on observation \(t\): \[\begin{equation} \mathrm{V}(\hat{\mathbf{v}}_{t} | y_1, y_2, \dots y_t) = \mathbf{D}^{t} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t}\right)^\prime . \tag{16.5} \end{equation}\] Inserting the values of matrix \(\mathbf{D}\) in (16.5), we can then get the variance of the state vector on the observation \(t\) given the uncertainty of the initial state. For example, for ETS(A,N,N), the conditional variance of the level on observation \(t\) is: \[\begin{equation} \mathrm{V}(\hat{l}_{t} | y_1, y_2, \dots y_t) = (1-\alpha)^{t} \mathrm{V}\left( \hat{l}_{0} \right) (1-\alpha)^{t} . \tag{16.6} \end{equation}\] As the formula above shows, if the smoothing parameter lies between zero and one, then the impact of the uncertainty of the initial level on the current one will diminish with the increase of \(t\). The closer \(\alpha\) is to zero, the more impact the variance of the initial level will have on the variance of the current level.
If we use admissible bounds (see Section 4.7), then the smoothing parameter might lie in the region (1, 2), and the impact of the variance of the initial state on the current one will be higher the closer \(\alpha\) is to two.
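To illustrate how quickly the effect of the initial state fades in (16.5) and (16.6), here is a minimal sketch. The variance of the initial level and the values of \(\alpha\) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def state_variance(D, V0, t):
    """Evaluate (16.5): D^t V(v_0) (D^t)'."""
    Dt = np.linalg.matrix_power(D, t)
    return Dt @ V0 @ Dt.T

def ann_level_variance(alpha, var_l0, t):
    """Evaluate (16.6) for ETS(A,N,N): (1 - alpha)^(2t) V(l_0)."""
    return (1 - alpha) ** (2 * t) * var_l0

var_l0 = 4.0  # assumed variance of the estimated initial level
for alpha in (0.1, 0.5, 0.9, 1.8):
    # For ETS(A,N,N), D is the 1x1 matrix (1 - alpha)
    D = np.array([[1 - alpha]])
    via_matrix = state_variance(D, np.array([[var_l0]]), t=10)[0, 0]
    via_scalar = ann_level_variance(alpha, var_l0, t=10)
    print(alpha, via_matrix, via_scalar)
```

Because the variance depends on \((1-\alpha)^{2t}\), values of \(\alpha\) that are equally far from one (for example 0.2 and 1.8) give the same conditional variance of the level.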

Now that we have the variance of the state, we can also calculate the variance of the fitted values (or one step ahead in-sample forecast). In the pure additive model, the fitted values are calculated as: \[\begin{equation} \hat{y}_t = \mu_{y,t|t-1} = \mathbf{w}^\prime \hat{\mathbf{v}}_{t-1}. \tag{16.7} \end{equation}\] The variance of the fitted value conditional on all actual observations will then be: \[\begin{equation} \mathrm{V}(\hat{y}_t | y_1, y_2, \dots y_t) = \mathrm{V}\left( \mathbf{w}^\prime \hat{\mathbf{v}}_{t-1} \right) , \tag{16.8} \end{equation}\] which after inserting (16.5) in (16.8) leads to: \[\begin{equation} \mathrm{V}(\hat{y}_t | y_1, y_2, \dots y_t) = \mathbf{w}^\prime \mathbf{D}^{t-1} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t-1}\right)^\prime \mathbf{w} . \tag{16.9} \end{equation}\] This variance can then be used to calculate the confidence interval for the fitted values, assuming that the estimates of the initial state follow a Normal distribution (due to the CLT). In the case of ETS(A,N,N), this equals: \[\begin{equation} \mathrm{V}(\hat{y}_t | y_1, y_2, \dots y_t) = (1-\alpha)^{t-1} \mathrm{V}\left( \hat{l}_{0} \right) (1-\alpha)^{t-1} . \tag{16.10} \end{equation}\]
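The variance (16.9) and the resulting Normal confidence interval for a fitted value can be sketched as follows. This is only an illustration under assumed inputs: the covariance matrix of the initial state, the fitted value, and the confidence level are made up, and the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def fitted_variance(w, D, V0, t):
    """Evaluate (16.9): w' D^(t-1) V(v_0) (D^(t-1))' w."""
    Dt = np.linalg.matrix_power(D, t - 1)
    return float(w @ Dt @ V0 @ Dt.T @ w)

def fitted_confidence_interval(y_hat, variance, level=0.95):
    """Normal interval around a fitted value, relying on the CLT assumption."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * np.sqrt(variance)
    return y_hat - half_width, y_hat + half_width

# Assumed ETS(A,N,N) example: alpha = 0.3, V(l_0) = 9, fitted value of 100
w = np.array([1.0])
D = np.array([[1 - 0.3]])
V0 = np.array([[9.0]])

for t in (1, 5, 20):
    variance = fitted_variance(w, D, V0, t)
    lower, upper = fitted_confidence_interval(100.0, variance)
    print(t, variance, lower, upper)
```

The interval is widest for the first observation and narrows as \(t\) grows, reflecting the decay of the initial state uncertainty.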

Finally, the variance of the initial states will also impact the conditional \(h\) steps ahead variance of the model. This can be seen from the recursion (5.20), which in the case of non-seasonal models simplifies to: \[\begin{equation} y_{t+h} = \mathbf{w}^\prime \mathbf{F}^{h-1} \hat{\mathbf{v}}_{t} + \mathbf{w}^\prime \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \mathbf{g} e_{t+h-j} + e_{t+h} . \tag{16.11} \end{equation}\] Taking the variance of \(y_{t+h}\) conditional on all the information until the observation \(t\) (all actual values) with \(h>1\) leads to: \[\begin{equation} \begin{aligned} \mathrm{V}( y_{t+h} | y_1, y_2, \dots y_t) = & \mathbf{w}^\prime \mathbf{F}^{h-1} \mathbf{D}^{t} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t}\right)^\prime (\mathbf{F}^\prime)^{h-1} \mathbf{w} + \\ & \left( \left(\mathbf{w}^\prime \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \mathbf{g} \mathbf{g}^\prime (\mathbf{F}^\prime)^{j-1} \mathbf{w} \right) + 1 \right) \sigma^2 . \end{aligned} \tag{16.12} \end{equation}\] Here the first term uses the covariance matrix of \(\hat{\mathbf{v}}_{t}\) from (16.5), because the forecast starts from the state at the end of the sample. This formula can then be used for the construction of prediction intervals of the model, for example using formula (5.13). The topic of construction of prediction intervals will be discussed later in Section 18.3. In the case of the ETS(A,N,N) model this simplifies to: \[\begin{equation} \begin{aligned} \mathrm{V}( y_{t+h} | y_1, y_2, \dots y_t) = & (1-\alpha)^{t} \mathrm{V}\left( \hat{l}_{0} \right) (1-\alpha)^{t} + \\ & \left(1 + (h-1) \alpha^2 \right) \sigma^2 . \end{aligned} \tag{16.13} \end{equation}\]
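The two components of the multistep variance (the part inherited from the state at the end of the sample and the part coming from future errors) can be computed separately. The sketch below takes the conditional covariance matrix of \(\hat{\mathbf{v}}_{t}\) as an input called `V_t` (for example, obtained from (16.5)); this argument, the parameter values, and the function name are assumptions made for illustration.

```python
import numpy as np

def multistep_variance(w, F, g, V_t, sigma2, h):
    """Variance of y_{t+h} implied by the recursion (16.11).

    V_t is the conditional covariance matrix of the state at the end
    of the sample, sigma2 is the variance of the error term.
    """
    F_h = np.linalg.matrix_power(F, h - 1)
    # Uncertainty inherited from the state at the end of the sample
    from_state = w @ F_h @ V_t @ F_h.T @ w
    # Accumulated future errors: sum of (w' F^(j-1) g)^2, plus 1 for e_{t+h}
    from_errors = 1.0
    for j in range(1, h):
        c_j = w @ np.linalg.matrix_power(F, j - 1) @ g
        from_errors += c_j ** 2
    return float(from_state + from_errors * sigma2)

# Assumed ETS(A,N,N) example: alpha = 0.3, V(l_t) = 0.5, sigma^2 = 2
w = np.array([1.0])
F = np.array([[1.0]])
g = np.array([0.3])
V_t = np.array([[0.5]])

for h in (1, 4, 12):
    print(h, multistep_variance(w, F, g, V_t, sigma2=2.0, h=h))
```

For ETS(A,N,N) the result is the state variance plus \(\left(1 + (h-1) \alpha^2 \right) \sigma^2\), so the contribution of the initial state does not grow with the horizon, while the error part does.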

Remark. It is also possible to derive the variances for the seasonal models. The only thing that would change in comparison with the formulae above is that the matrices \(\mathbf{F}\), \(\mathbf{w}\), and \(\mathbf{g}\) will need to be split into sub-matrices, similar to how it was done in Section 5.2.

16.4.2 Estimated parameters of ADAM

Now we discuss the case when the initial states are either known or not estimated directly. This, for example, corresponds to the situation with backcasted initials. Continuing our non-seasonal model example, we can use the following recursion (similar to (16.11)), keeping in mind that now the value of the initial state vector \(\mathbf{v}_0\) is known: \[\begin{equation} \mathbf{v}_{t+h-1} = \hat{\mathbf{F}}^{h-1} \mathbf{v}_{t} + \sum_{j=1}^{h-1} \hat{\mathbf{F}}^{j-1} \hat{\mathbf{g}} e_{t+h-j} . \tag{16.14} \end{equation}\] The conditional variance of the state given the values on observation \(t\) in (16.14) does not, in general, have a closed form because of the exponentiation of the transition matrix \(\hat{\mathbf{F}}\). However, in a special case, when the matrix does not contain the parameters (e.g. non-damped trend ETS models or ARIMA without AR terms), there is an analytical solution to the variance. In this case, \(\mathbf{F}\) is provided rather than being estimated, which simplifies the inference: \[\begin{equation} \mathrm{V}(\mathbf{v}_{t+h-1} | t) = \mathrm{V}\left(\sum_{j=1}^{h-1} \mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) . \tag{16.15} \end{equation}\]

The variance of the sum in (16.15) can be expanded as: \[\begin{equation} \begin{aligned} \mathrm{V} \left(\sum_{j=1}^{h-1} \mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j} \right) = & \sum_{j=1}^{h-1} \mathrm{V} \left(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) + \\ & 2 \sum_{j=2}^{h-1} \sum_{i=1}^{j-1} \mathrm{cov}(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j},\mathbf{F}^{i-1} \hat{\mathbf{g}} e_{t+h-i}). \end{aligned} \tag{16.16} \end{equation}\] Assuming that \(\hat{\mathbf{g}}\) is independent of the future error terms, each variance on the right-hand side of (16.16) can be expressed via: \[\begin{equation} \begin{aligned} \mathrm{V} \left(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) = \mathbf{F}^{j-1} \left( \right. & \mathrm{V} (\hat{\mathbf{g}}) \mathrm{V}(e_{t+h-j}) + \mathrm{V} (\hat{\mathbf{g}}) \mathrm{E}(e_{t+h-j})^2 + \\ & \left. \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \mathrm{V}(e_{t+h-j})\right) (\mathbf{F}^{j-1})^\prime. \end{aligned} \tag{16.17} \end{equation}\] Given that the expectation of the error term is assumed to be zero, and substituting \(\mathrm{V}(e_{t+h-j})=\sigma^2\) (assuming that the error term is homoscedastic), this simplifies to: \[\begin{equation} \mathrm{V} \left(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) = \mathbf{F}^{j-1} \left( \mathrm{V} (\hat{\mathbf{g}}) + \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \right) (\mathbf{F}^{j-1})^\prime \sigma^2. \tag{16.18} \end{equation}\] As for the covariances in (16.16), after the expansion it can be shown that each of them is equal to: \[\begin{equation} \begin{aligned} \mathrm{cov}(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j},\mathbf{F}^{i-1} \hat{\mathbf{g}} e_{t+h-i}) = & \mathrm{cov}(\mathbf{F}^{j-1} \hat{\mathbf{g}}, \mathbf{F}^{i-1} \hat{\mathbf{g}}) \mathrm{cov}(e_{t+h-i},e_{t+h-j}) \\ & + \mathrm{E}(\mathbf{F}^{j-1} \hat{\mathbf{g}}) \mathrm{E}(\mathbf{F}^{i-1} \hat{\mathbf{g}})^\prime \mathrm{cov}(e_{t+h-i},e_{t+h-j}) \\ & + \mathrm{E}(e_{t+h-i}) \mathrm{E}(e_{t+h-j}) \mathrm{cov}(\mathbf{F}^{j-1} \hat{\mathbf{g}}, \mathbf{F}^{i-1} \hat{\mathbf{g}}). \end{aligned} \tag{16.19} \end{equation}\] Given the assumptions of the model, the autocovariances of error terms should all be equal to zero, and the expectation of the error term should be equal to zero as well, which means that all the terms in (16.19) will be equal to zero. Based on this, the conditional variance of states equals: \[\begin{equation} \mathrm{V}(\mathbf{v}_{t+h-1}|t) = \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \left( \mathrm{V} (\hat{\mathbf{g}}) + \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \right) (\mathbf{F}^{j-1})^\prime \sigma^2 . \tag{16.20} \end{equation}\] As discussed in Section 5.3, the conditional variance of the actual value \(h\) steps ahead is: \[\begin{equation} \mathrm{V}(y_{t+h}|t) = \mathbf{w}^\prime \mathrm{V}(\mathbf{v}_{t+h-1}|t) \mathbf{w} + \sigma^2 . \tag{16.21} \end{equation}\] Inserting (16.20) in (16.21), we get the final conditional \(h\) steps ahead variance of the model: \[\begin{equation} \sigma^2_h = \mathrm{V}(y_{t+h}|t) = \left(\mathbf{w}^\prime \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \left( \mathrm{V} (\hat{\mathbf{g}}) + \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \right) (\mathbf{F}^{j-1})^\prime \mathbf{w} + 1 \right)\sigma^2, \tag{16.22} \end{equation}\] which looks similar to the formula (5.15) from Section 5.3, but now has the covariance matrix of the persistence vector in it. For a special case of ETS(A,N,N) this simplifies to: \[\begin{equation} \sigma^2_h = \mathrm{V}(y_{t+h}|t) = \left((h-1) \left(\mathrm{V}(\hat{\alpha}) + \hat{\alpha}^2 \right) + 1 \right) \sigma^2, \tag{16.23} \end{equation}\] which, as can be seen, differs from the conventional variance by the value of the variance of the smoothing parameter \(\mathrm{V}(\hat{\alpha})\). Similarly, the conditional variances for ETS(A,A,N), ETS(A,N,A), and ETS(A,A,A) can be produced using the formula (16.22).
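The effect of the uncertainty of the persistence vector on the multistep variance can be sketched numerically. The code below is an illustration under stated assumptions, not a definitive implementation: the transposed power of the transition matrix is taken as \((\mathbf{F}^{j-1})^\prime\), matching (16.17); the estimate \(\hat{\alpha}\) is assumed to be Normal and independent of the future errors in the simulation; all parameter values and function names are hypothetical.

```python
import numpy as np

def variance_with_estimated_persistence(w, F, g_mean, g_cov, sigma2, h):
    """h steps ahead variance with a known F and an uncertain persistence vector."""
    # Second moment of the persistence vector: V(g) + E(g) E(g)'
    second_moment = g_cov + np.outer(g_mean, g_mean)
    total = 0.0
    for j in range(1, h):
        F_j = np.linalg.matrix_power(F, j - 1)
        total += w @ F_j @ second_moment @ F_j.T @ w
    return float((total + 1.0) * sigma2)

def simulate_ann(alpha, var_alpha, sigma2, h, n=200_000, seed=42):
    """Monte Carlo variance of y_{t+h} - l_t for ETS(A,N,N) with uncertain alpha."""
    rng = np.random.default_rng(seed)
    # One draw of the estimated smoothing parameter per replication
    alpha_hat = rng.normal(alpha, np.sqrt(var_alpha), size=n)
    errors = rng.normal(0.0, np.sqrt(sigma2), size=(n, h))
    # y_{t+h} - l_t = alpha_hat * (e_{t+1} + ... + e_{t+h-1}) + e_{t+h}
    deviation = alpha_hat * errors[:, :-1].sum(axis=1) + errors[:, -1]
    return float(deviation.var())

# Assumed values: alpha = 0.3, V(alpha) = 0.01, sigma^2 = 2, h = 5
w = np.array([1.0])
F = np.array([[1.0]])
analytical = variance_with_estimated_persistence(
    w, F, np.array([0.3]), np.array([[0.01]]), sigma2=2.0, h=5)
simulated = simulate_ann(0.3, 0.01, 2.0, 5)
print(analytical, simulated)
```

Setting the covariance matrix of the persistence vector to zero in this sketch returns the conventional variance, which makes it easy to see how much of the forecast variance is due to the uncertainty of the smoothing parameter.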

Unfortunately, the conditional variances for the other models are more complicated due to the introduction of convolutions of parameters. Furthermore, the formula (16.22) only focuses on the conditional variance given the known \(\mathbf{v}_t\) but does not take into account the uncertainty of it for the fitted values in-sample. Given the complexity of the problem, in the next section, we introduce a technique that allows correctly propagating the uncertainty of parameters and initial values to the forecasts of any ADAM.