16.3 Conditional variance with uncertain parameters

We now consider two special cases with pure additive state space models:

  1. When the values of the initial state vector are unknown;
  2. When the model parameters (e.g. smoothing or AR/MA parameters) are estimated on a sample of data.

We discuss analytical formulae for the conditional variances in these cases. The resulting variance can then be used to construct the confidence interval for the fitted line and/or the confidence/prediction interval for the holdout period. We do not cover the more realistic case when both initials and parameters are estimated, because there is no closed analytical form for that situation due to potential correlations between the estimates of the initial states and the parameters.

16.3.1 Estimated initial state

First, we need to recall the recursive relation discussed in Section 5.4, specifically formula (5.10). To simplify the derivations in this section, we consider the non-seasonal case, in which all elements of the lags vector \(\boldsymbol{l}\) are equal to one. This can be ETS(A,N,N), ETS(A,A,N), ETS(A,Ad,N), or several ARIMA models. The more general case is more complicated but can be derived using the same principles as discussed below. The recursive relation from the first observation until the end of the sample can be written as: \[\begin{equation} \hat{\mathbf{v}}_{t} = \mathbf{D}^{t} \hat{\mathbf{v}}_{0} + \sum_{j=0}^{t-1} \mathbf{D}^{j} \hat{\mathbf{g}} y_{t-j} , \tag{16.3} \end{equation}\] where \(\mathbf{D}=\mathbf{F} -\mathbf{g}\mathbf{w}^\prime\). Formula (16.3) shows that the most recent value of the state vector depends on the initial value \(\hat{\mathbf{v}}_{0}\) and on a linear combination of the actual values. Note that we assume in this part that the matrix \(\mathbf{D}\) is known, i.e. the smoothing parameters are not estimated. Although this is an unrealistic assumption, it helps in showing how the variance of the initial state would influence the conditional variance at the end of the sample. If we now take the variance of the state vector conditional on the previous actual values \(y_{t-j}\) for all \(j=\{0, \dots, t-1 \}\), then we will have (due to the independence of the two terms in (16.3)): \[\begin{equation} \mathrm{V}(\hat{\mathbf{v}}_{t} | y_1, y_2, \dots y_t) = \mathrm{V}\left( \mathbf{D}^{t} \hat{\mathbf{v}}_{0} \right) + \mathrm{V}\left(\sum_{j=0}^{t-1} \mathbf{D}^{j} \hat{\mathbf{g}} y_{t-j} | y_1, y_2, \dots y_t \right) . \tag{16.4} \end{equation}\] We condition the variance on the actual values because they are given to us, and we want to see how different initial states would change the model fit given these values, and thus how the uncertainty propagates from the first observation to observation \(t\).
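The unrolled recursion (16.3) can be verified numerically. The following sketch does this for ETS(A,N,N), where the state is a scalar level, \(\mathbf{D} = 1-\alpha\) and \(\hat{\mathbf{g}} = \alpha\); the data, the smoothing parameter, and the initial level are synthetic, chosen purely for illustration:

```python
import numpy as np

# Numerical check of the recursion (16.3) for ETS(A,N,N): unrolling
# l_t = D l_{t-1} + alpha y_t from t=1 to the end of the sample gives
# l_t = D^t l_0 + sum_{j=0}^{t-1} D^j alpha y_{t-j}.
# All numbers below are synthetic, for illustration only.
rng = np.random.default_rng(42)
y = rng.normal(loc=100, scale=5, size=30)

alpha = 0.3
D = 1 - alpha       # scalar discount "matrix" D = F - g w'
l0 = 95.0           # initial level

# Recursive computation of the final level
level = l0
for obs in y:
    level = D * level + alpha * obs

# Closed form (16.3)
t = len(y)
closed = D**t * l0 + sum(D**j * alpha * y[t - 1 - j] for j in range(t))

print(np.isclose(level, closed))  # True: the two forms coincide
```

The same check works for any pure additive model if the scalars are replaced by the respective matrices.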
In formula (16.4), the second term on the right-hand side is equal to zero, because all the actual values are known and \(\mathbf{D}\) does not have any uncertainty due to the assumption above. This leads to the following covariance matrix of states on observation \(t\): \[\begin{equation} \mathrm{V}(\hat{\mathbf{v}}_{t} | y_1, y_2, \dots y_t) = \mathbf{D}^{t} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t}\right)^\prime . \tag{16.5} \end{equation}\] Inserting the values of the matrix \(\mathbf{D}\) in (16.5), we can then get the variance of the state vector. For example, for ETS(A,N,N), the conditional variance of the level on observation \(t\) is: \[\begin{equation} \mathrm{V}(\hat{l}_{t} | y_1, y_2, \dots y_t) = (1-\alpha)^{t} \mathrm{V}\left( \hat{l}_{0} \right) (1-\alpha)^{t} . \tag{16.6} \end{equation}\] As the formula above shows, if the smoothing parameter lies between zero and one, then the uncertainty of the initial level has only a limited impact on the uncertainty on observation \(t\): the closer \(\alpha\) is to zero, the more impact the variance of the initial level has on the variance of the final level. If we use admissible bounds (see Section 4.7), then the smoothing parameter might lie in the region (1, 2), making \(1-\alpha\) negative; however, as long as \(|1-\alpha|<1\), the impact of the variance of the initial state still diminishes with the increase of the sample size \(t\).
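To illustrate how (16.6) behaves, here is a small sketch (with hypothetical values of \(\alpha\) and \(\mathrm{V}(\hat{l}_0)\)) showing that the contribution of the initial variance dies out for any \(\alpha\) with \(|1-\alpha|<1\), only at different speeds:

```python
# Equation (16.6) for ETS(A,N,N), written as (1 - alpha)^(2t) * V(l_0).
# The values of alpha and the initial variance are hypothetical.

def level_variance(alpha, var_l0, t):
    """V(l_t | y_1, ..., y_t) implied by the uncertain initial level."""
    return (1 - alpha) ** (2 * t) * var_l0

var_l0 = 1.0
# Usual bounds: the closer alpha is to zero, the slower the decay.
print(level_variance(0.1, var_l0, 10))  # 0.9^20: still sizeable
print(level_variance(0.9, var_l0, 10))  # 0.1^20: practically zero
# Admissible bounds, alpha in (1, 2): 1 - alpha is negative,
# but |1 - alpha| < 1, so the impact still diminishes with t.
print(level_variance(1.5, var_l0, 10))  # 0.5^20: practically zero
```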

Now that we have the variance of the state, we can also calculate the variance of the fitted values (or one step ahead in-sample forecasts). In the pure additive model, the fitted values are calculated as: \[\begin{equation} \hat{y}_t = \mu_{y,t|t-1} = \mathbf{w}^\prime \hat{\mathbf{v}}_{t-1}. \tag{16.7} \end{equation}\] The variance conditional on all the actual observations is then: \[\begin{equation} \mathrm{V}(\hat{y}_t | y_1, y_2, \dots y_t) = \mathrm{V}\left( \mathbf{w}^\prime \hat{\mathbf{v}}_{t-1} \right) , \tag{16.8} \end{equation}\] which after inserting (16.5) in (16.8) leads to: \[\begin{equation} \mathrm{V}(\hat{y}_t | y_1, y_2, \dots y_t) = \mathbf{w}^\prime \mathbf{D}^{t-1} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t-1}\right)^\prime \mathbf{w} . \tag{16.9} \end{equation}\] This variance can then be used to calculate the confidence interval for the fitted values, assuming that the estimates of the initial state follow a Normal distribution (due to CLT).
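A sketch of (16.9) for ETS(A,A,N): the persistence vector and the covariance matrix of the initial state below are hypothetical, and the point of the example is that the implied variance of the fitted value shrinks as \(t\) grows:

```python
import numpy as np

# Equation (16.9) for ETS(A,A,N). The persistence values and the
# covariance matrix of the initial state are assumed for illustration.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # transition matrix
w = np.array([1.0, 1.0])     # measurement vector
g = np.array([0.3, 0.1])     # persistence vector (alpha, beta)
D = F - np.outer(g, w)       # discount matrix D = F - g w'

V_v0 = np.diag([2.0, 0.5])   # assumed covariance of the initial state

t = 20
Dt = np.linalg.matrix_power(D, t - 1)
# V(hat{y}_t | y_1, ..., y_t) = w' D^{t-1} V(v_0) (D^{t-1})' w
var_fitted = w @ Dt @ V_v0 @ Dt.T @ w
print(var_fitted)  # close to zero: the initial uncertainty has washed out
```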

Finally, the variance of the initial states will also impact the conditional h steps ahead variance of the model. This can be seen from the recursion (5.20), which in the case of non-seasonal models simplifies to: \[\begin{equation} y_{t+h} = \mathbf{w}^\prime \mathbf{F}^{h-1} \hat{\mathbf{v}}_{t} + \mathbf{w}^\prime \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \mathbf{g} e_{t+h-j} + e_{t+h} . \tag{16.10} \end{equation}\] Taking the variance of \(y_{t+h}\) conditional on all the information until observation \(t\) (all the actual values) leads to: \[\begin{equation} \begin{aligned} \mathrm{V}( y_{t+h} | y_1, y_2, \dots y_t) = & \mathbf{w}^\prime \mathbf{F}^{h-1} \mathbf{D}^{t} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t}\right)^\prime (\mathbf{F}^\prime)^{h-1} \mathbf{w} + \\ & \left( \left(\mathbf{w}^\prime \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \mathbf{g} \mathbf{g}^\prime (\mathbf{F}^\prime)^{j-1} \mathbf{w} \right) + 1 \right) \sigma^2 . \end{aligned} \tag{16.11} \end{equation}\] This formula can then be used for the construction of prediction intervals of the model, for example using formula (5.13). The topic of construction of prediction intervals will be discussed later in Section 18.3.
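For ETS(A,N,N), where \(\mathbf{F}=1\) and \(\mathbf{g}=\alpha\), the first term of (16.11) reduces to the level variance from (16.6) propagated to observation \(t\), and the second term to the standard multi-step variance \(\left((h-1)\alpha^2+1\right)\sigma^2\). A small sketch under hypothetical parameter values:

```python
# Equation (16.11) specialised to ETS(A,N,N): F = 1, g = alpha,
# D = 1 - alpha. The parameter values used below are hypothetical.

def h_step_variance(alpha, var_l0, sigma2, t, h):
    """V(y_{t+h} | y_1, ..., y_t) for ETS(A,N,N) with an uncertain l_0."""
    initial_term = (1 - alpha) ** (2 * t) * var_l0
    error_term = ((h - 1) * alpha ** 2 + 1) * sigma2
    return initial_term + error_term

# With a long sample the initial-state term becomes negligible and the
# variance grows linearly in the forecast horizon h:
for h in (1, 2, 3):
    print(h, h_step_variance(alpha=0.3, var_l0=1.0, sigma2=1.0, t=100, h=h))
```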

As a final note, it is also possible to derive the variances for the seasonal models. The only thing that changes in that situation is that the transition matrix \(\mathbf{F}\), the measurement vector \(\mathbf{w}\), and the persistence vector \(\mathbf{g}\) need to be split into submatrices and subvectors, similar to how it was done in Section 5.2.

16.3.2 Estimated parameters of ADAM

Now we discuss the case when the initial states are either known or not estimated directly. This, for example, corresponds to the situation with backcasted initials. Continuing our non-seasonal model example, we can use the following recursion (similar to (16.10)), keeping in mind that now the value of the initial state vector \(\mathbf{v}_0\) is known: \[\begin{equation} \mathbf{v}_{t+h-1} = \hat{\mathbf{F}}^{h-1} \mathbf{v}_{t} + \sum_{j=1}^{h-1} \hat{\mathbf{F}}^{j-1} \hat{\mathbf{g}} e_{t+h-j} . \tag{16.12} \end{equation}\] The conditional variance of the state given the values on observation \(t\) in (16.12) does not, in general, have a closed form because of the exponentiation of the estimated transition matrix \(\hat{\mathbf{F}}\). However, in the special case when this matrix does not contain estimated parameters (e.g. non-damped trend ETS models or ARIMA without AR terms), there is an analytical solution for the variance. In this case, \(\mathbf{F}\) is provided rather than estimated, which simplifies the inference: \[\begin{equation} \mathrm{V}(\mathbf{v}_{t+h-1} | t) = \mathrm{V}\left(\sum_{j=1}^{h-1} \mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) . \tag{16.13} \end{equation}\]

The variance of the sum in (16.13) can be expanded as: \[\begin{equation} \begin{aligned} \mathrm{V} \left(\sum_{j=1}^{h-1} \mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j} \right) = & \sum_{j=1}^{h-1} \mathrm{V} \left(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) + \\ & 2 \sum_{j=2}^{h-1} \sum_{i=1}^{j-1} \mathrm{cov}(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j},\mathbf{F}^{i-1} \hat{\mathbf{g}} e_{t+h-i}). \end{aligned} \tag{16.14} \end{equation}\] Each variance on the right-hand side of (16.14) can be expressed via: \[\begin{equation} \begin{aligned} \mathrm{V} \left(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) = \mathbf{F}^{j-1} \left( \right. & \mathrm{V} (\hat{\mathbf{g}}) \mathrm{V}(e_{t+h-j}) + \mathrm{V} (\hat{\mathbf{g}}) \mathrm{E}(e_{t+h-j})^2 + \\ & \left. \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \mathrm{V}(e_{t+h-j})\right) (\mathbf{F}^{j-1})^\prime. \end{aligned} \tag{16.15} \end{equation}\] Given that the expectation of the error term is assumed to be zero, and substituting \(\mathrm{V}(e_{t+h-j})=\sigma^2\), this simplifies to: \[\begin{equation} \mathrm{V} \left(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j}\right) = \mathbf{F}^{j-1} \left( \mathrm{V} (\hat{\mathbf{g}}) + \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \right) (\mathbf{F}^{j-1})^\prime \sigma^2. \tag{16.16} \end{equation}\] As for the covariances in (16.14), after the expansion it can be shown that each of them is equal to: \[\begin{equation} \begin{aligned} \mathrm{cov}(\mathbf{F}^{j-1} \hat{\mathbf{g}} e_{t+h-j},\mathbf{F}^{i-1} \hat{\mathbf{g}} e_{t+h-i}) = & \mathrm{cov}\left(\mathbf{F}^{j-1} \hat{\mathbf{g}}, \mathbf{F}^{i-1} \hat{\mathbf{g}}\right) \mathrm{cov}(e_{t+h-i},e_{t+h-j}) \\ & + \mathrm{E}\left(\mathbf{F}^{j-1} \hat{\mathbf{g}}\right) \mathrm{E}\left(\mathbf{F}^{i-1} \hat{\mathbf{g}}\right)^\prime \mathrm{cov}(e_{t+h-i},e_{t+h-j}) \\ & + \mathrm{E}(e_{t+h-i}) \mathrm{E}(e_{t+h-j}) \mathrm{cov}\left(\mathbf{F}^{j-1} \hat{\mathbf{g}}, \mathbf{F}^{i-1} \hat{\mathbf{g}}\right). \end{aligned} \tag{16.17} \end{equation}\] Given the assumptions of the model, the autocovariances of the error terms should all be equal to zero, and the expectation of the error term should be equal to zero as well, which means that all the terms in (16.17) are equal to zero. Based on this, the conditional variance of the states is: \[\begin{equation} \mathrm{V}(\mathbf{v}_{t+h-1}|t) = \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \left( \mathrm{V} (\hat{\mathbf{g}}) + \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \right) (\mathbf{F}^{j-1})^\prime \sigma^2 . \tag{16.18} \end{equation}\] As discussed in Section 5.3, the conditional variance of the actual value \(h\) steps ahead is: \[\begin{equation} \mathrm{V}(y_{t+h}|t) = \mathbf{w}^\prime \mathrm{V}(\mathbf{v}_{t+h-1}|t) \mathbf{w} + \sigma^2 . \tag{16.19} \end{equation}\] Inserting (16.18) in (16.19), we get the final conditional h steps ahead variance of the model: \[\begin{equation} \sigma^2_h = \mathrm{V}(y_{t+h}|t) = \left(\mathbf{w}^\prime \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \left( \mathrm{V} (\hat{\mathbf{g}}) + \mathrm{E} (\hat{\mathbf{g}}) \mathrm{E} (\hat{\mathbf{g}})^\prime \right) (\mathbf{F}^{j-1})^\prime \mathbf{w} + 1 \right)\sigma^2, \tag{16.20} \end{equation}\] which looks similar to formula (5.15) from Section 5.3, but now contains the covariance matrix of the persistence vector.
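Formula (16.20) is straightforward to compute once estimates of \(\mathrm{E}(\hat{\mathbf{g}})\) and \(\mathrm{V}(\hat{\mathbf{g}})\) are available. The sketch below does this for a non-damped ETS(A,A,N), so that \(\mathbf{F}\) is known; the estimates of the persistence vector and its covariance matrix are made up for illustration:

```python
import numpy as np

# Equation (16.20): h-steps-ahead variance with an estimated persistence
# vector g and a known initial state. All numbers are hypothetical.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # ETS(A,A,N), non-damped: F is known
w = np.array([1.0, 1.0])            # measurement vector
g_mean = np.array([0.3, 0.1])       # E(g): estimates of (alpha, beta)
g_cov = np.array([[0.010, 0.002],
                  [0.002, 0.005]])  # V(g): assumed covariance of estimates
sigma2 = 1.0

def forecast_variance(h):
    """sigma2_h from (16.20): (w' sum_j F^{j-1} M (F^{j-1})' w + 1) sigma2,
    where M = V(g) + E(g) E(g)'."""
    M = g_cov + np.outer(g_mean, g_mean)
    total = 0.0
    for j in range(1, h):
        Fj = np.linalg.matrix_power(F, j - 1)
        total += w @ Fj @ M @ Fj.T @ w
    return (total + 1.0) * sigma2

print([forecast_variance(h) for h in (1, 2, 3)])  # grows with h
```

For \(h=1\) the sum is empty and the variance collapses to \(\sigma^2\), as in the known-parameters case.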

Unfortunately, the conditional variances for the other models are more complicated due to the introduction of convolutions of parameters. Furthermore, formula (16.20) only focuses on the conditional variance given the known \(\mathbf{v}_t\), but does not take into account the uncertainty of the states for the fitted values in-sample. Given the complexity of the problem, in the next section we introduce a technique that allows correctly propagating the uncertainty of the parameters and the initial values to the forecasts of any ADAM.