16.3 Conditional variance with uncertain parameters
We now consider two special cases for pure additive state space models: (1) when the values of the initial state vector are unknown; (2) when the parameters of the model (e.g. smoothing or AR/MA parameters) are estimated on a sample of data. For each case we discuss analytical formulae for the conditional variance, which can then be used for the construction of confidence intervals for the fitted line and/or confidence/prediction intervals for the holdout period. We do not cover the more realistic case where both the initial states and the parameters are estimated, because it has no closed analytical form due to potential correlations between the estimates of the parameters and the initial states.
16.3.1 Estimated initial state
First, we need to recall the recursive relations discussed in Section 5.4, specifically formula (5.9). To simplify the derivations in this section, we consider the non-seasonal case, for which all elements of \(\mathbf{l}\) are equal to one. This can be ETS(A,N,N), ETS(A,A,N), ETS(A,Ad,N), or a non-seasonal ARIMA model. The recursive relation from the first observation to the end of the sample can be written as: \[\begin{equation} \hat{\mathbf{v}}_{t} = \mathbf{D}^{t} \hat{\mathbf{v}}_{0} + \sum_{j=0}^{t-1} \mathbf{D}^{j} \mathbf{g} y_{t - j} , \tag{16.3} \end{equation}\] where \(\mathbf{D}=\mathbf{F} - \mathbf{g}\mathbf{w}'\). Formula (16.3) shows that the most recent value of the state vector depends on the initial value \(\hat{\mathbf{v}}_{0}\) and on a linear combination of the actual values. Note that in this part we assume that the matrix \(\mathbf{D}\) is known, i.e. the smoothing parameters are not estimated. Although this is an unrealistic assumption, it helps in showing how the variance of the initial state influences the variance at the end of the sample. If we now take the variance conditional on the actual values \(y_{t - j}\) for all \(j=\{0, \dots, t-1 \}\), then, due to the independence of the two terms in (16.3), we will have: \[\begin{equation} \mathrm{V}(\hat{\mathbf{v}}_{t} | y_1, y_2, \dots y_t) = \mathrm{V}\left( \mathbf{D}^{t} \hat{\mathbf{v}}_{0} \right) + \mathrm{V}\left(\sum_{j=0}^{t-1} \mathbf{D}^{j} \mathbf{g} y_{t - j} | y_1, y_2, \dots y_t \right) . \tag{16.4} \end{equation}\] The reason why we condition the variance on the actual values is that they are given to us: we want to see how different initial states would change the model fit given these values, and thus how the uncertainty propagates from the first observation to observation \(t\). The second term on the right-hand side of (16.4) is equal to zero, because all actual values are known and, by the assumption above, \(\mathbf{D}\) carries no uncertainty. This leads to the following covariance matrix of the states on observation \(t\): \[\begin{equation} \mathrm{V}(\hat{\mathbf{v}}_{t} | y_1, y_2, \dots y_t) = \mathbf{D}^{t} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t}\right)' . \tag{16.5} \end{equation}\] Inserting the specific values of the matrix \(\mathbf{D}\), we can then calculate the variance (16.5). For example, for ETS(A,N,N), the conditional variance of the state on observation \(t\) is: \[\begin{equation} \mathrm{V}(\hat{l}_{t} | y_1, y_2, \dots y_t) = (1-\alpha)^{2t} \mathrm{V}\left( \hat{l}_{0} \right) . \tag{16.6} \end{equation}\] As the formula above shows, if the smoothing parameter lies between zero and one, then \((1-\alpha)^{2t}\) declines with \(t\), and the uncertainty of the initial level has a diminishing impact on the uncertainty on observation \(t\). If we use admissible bounds (see Section 4.6), then the smoothing parameter might lie in the region (1, 2), in which case \(1-\alpha\) is negative but still smaller than one in absolute value, so the impact of the variance of the initial state still declines with the sample size \(t\), although in an oscillating manner and more slowly the closer \(\alpha\) is to two.
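To make the propagation in (16.5) more tangible, here is a minimal R sketch for an ETS(A,A,N) model. All numbers below (smoothing parameters, the covariance matrix of the initial state, and the sample size) are illustrative assumptions, not estimates from data, and `matpow()` is a small helper defined for this example:

```r
# Propagation of the initial state uncertainty via formula (16.5),
# illustrated for ETS(A,A,N) with hypothetical parameter values
Fmat <- matrix(c(1, 0,
                 1, 1), 2, 2)              # transition matrix F
w <- matrix(c(1, 1), 2, 1)                 # measurement vector w
g <- matrix(c(0.3, 0.1), 2, 1)             # persistence vector (alpha, beta)
Dmat <- Fmat - g %*% t(w)                  # D = F - g w'
V0 <- diag(c(10, 1))                       # assumed covariance matrix of v_0

# Matrix power via repeated multiplication (identity for the zero power)
matpow <- function(A, k) {
  Reduce(`%*%`, replicate(k, A, simplify = FALSE), diag(nrow(A)))
}

obsInSample <- 100
Dt <- matpow(Dmat, obsInSample)
# V(v_t | y_1, ..., y_t) from formula (16.5)
Dt %*% V0 %*% t(Dt)
```

With these values of \(\alpha\) and \(\beta\), the elements of the resulting matrix are close to zero, illustrating how the effect of the initial uncertainty dies out over a long sample.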
Now that we have the variance of the state, we can also calculate the variance of the fitted values (the one step ahead in-sample forecasts). In the pure additive model, the fitted values are calculated as: \[\begin{equation} \hat{y}_t = \mu_{t|t-1} = \mathbf{w}' \hat{\mathbf{v}}_{t-1}. \tag{16.7} \end{equation}\] The variance conditional on all actual observations will then be: \[\begin{equation} \mathrm{V}(\hat{y}_t | y_1, y_2, \dots y_t) = \mathrm{V}\left( \mathbf{w}' \hat{\mathbf{v}}_{t-1} \right) , \tag{16.8} \end{equation}\] which after inserting (16.5) in (16.8) leads to: \[\begin{equation} \mathrm{V}(\hat{y}_t | y_1, y_2, \dots y_t) = \mathbf{w}' \mathbf{D}^{t-1} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t-1}\right)' \mathbf{w} . \tag{16.9} \end{equation}\] This variance can then be used in the calculation of the confidence interval for the fitted values, assuming that the estimates of the initial state follow a normal distribution (which can be motivated via the CLT).
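Continuing the sketch above, the variance (16.9) and a confidence interval around the fitted value can be obtained as follows; the fitted value itself is a made-up number here, used only to show the mechanics:

```r
# Variance of the fitted value on a given observation, formula (16.9),
# reusing w, Dmat, V0 and matpow() from the previous sketch
t_obs <- 10
Dt1 <- matpow(Dmat, t_obs - 1)
vFitted <- drop(t(w) %*% Dt1 %*% V0 %*% t(Dt1) %*% w)

# 95% confidence interval, relying on asymptotic normality
# of the initial state estimates
yFitted <- 100                             # hypothetical fitted value
yFitted + qnorm(c(0.025, 0.975)) * sqrt(vFitted)
```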
Finally, the variance of the initial states will also impact the conditional h steps ahead variance from the model. This can be seen from the recursion (5.19), which in the case of non-seasonal models simplifies to: \[\begin{equation} y_{t+h} = \mathbf{w}' \mathbf{F}^{h-1} \hat{\mathbf{v}}_{t} + \mathbf{w}' \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \mathbf{g} \epsilon_{t+h-j} + \epsilon_{t+h} . \tag{16.10} \end{equation}\] Taking the variance of \(y_{t+h}\) conditional on all the information up to observation \(t\) (all actual values) and inserting (16.5) leads to: \[\begin{equation} \begin{aligned} \mathrm{V}( y_{t+h} | y_1, y_2, \dots y_t) = & \mathbf{w}' \mathbf{F}^{h-1} \mathbf{D}^{t} \mathrm{V}\left( \hat{\mathbf{v}}_{0} \right) \left(\mathbf{D}^{t}\right)' (\mathbf{F}')^{h-1} \mathbf{w} + \\ & \left( \left(\mathbf{w}' \sum_{j=1}^{h-1} \mathbf{F}^{j-1} \mathbf{g} \mathbf{g}' (\mathbf{F}')^{j-1} \mathbf{w} \right) + 1 \right) \sigma^2 . \end{aligned} \tag{16.11} \end{equation}\] This formula can then be used for the construction of prediction intervals of the model, for example using formula (5.12). Note that the construction of prediction intervals will be discussed later in Section ??.
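Putting the pieces of (16.11) together, the sketch below computes the conditional h steps ahead variance, again reusing the hypothetical objects defined earlier and an assumed value of \(\sigma^2\):

```r
# Conditional h steps ahead variance, formula (16.11),
# reusing Fmat, w, g, Dmat, V0, matpow() and obsInSample from above
h <- 5
sigma2 <- 1                                # assumed variance of the error term
Fh1 <- matpow(Fmat, h - 1)
Dt <- matpow(Dmat, obsInSample)

# Part coming from the uncertainty of the initial state
vInitial <- drop(t(w) %*% Fh1 %*% Dt %*% V0 %*% t(Dt) %*% t(Fh1) %*% w)
# Part coming from the future error terms
vError <- (sum(vapply(seq_len(h - 1),
                      function(j) drop(t(w) %*% matpow(Fmat, j - 1) %*% g)^2,
                      numeric(1))) + 1) * sigma2
vInitial + vError
```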
As a final note, it is also possible to derive the variances for seasonal models; the only thing that changes in this situation is that the matrices \(\mathbf{F}\), \(\mathbf{w}\), and \(\mathbf{g}\) are split into submatrices, similarly to how it was done in Section 5.2.
16.3.2 Estimated parameters of ADAM model
Now we discuss the case when the initial states are either known or not estimated directly. This corresponds, for example, to the situation with backcasted initials. Continuing our non-seasonal example, we can use the same recursion (16.3), keeping in mind that now the value of the initial state vector \(\mathbf{v}_0\) is known, while the parameters are estimated. This results in the following variance: \[\begin{equation} \mathrm{V}(\mathbf{v}_{t} | y_1, y_2, \dots y_t) = \mathrm{V}\left( \hat{\mathbf{D}}^{t} \mathbf{v}_{0} + \sum_{j=0}^{t-1} \hat{\mathbf{D}}^{j} \hat{\mathbf{g}} y_{t - j} | y_1, y_2, \dots y_t \right) . \tag{16.12} \end{equation}\] Unfortunately, formula (16.12) has no closed form for a general state space model, because the uncertainty comes from \(\hat{\mathbf{D}}\), which is then exponentiated, and the variance of \(\hat{\mathbf{D}}^{j}\) has no analytical solution in the general case. Furthermore, we cannot assume that \(\hat{\mathbf{D}}^{j}\) is independent of \(\hat{\mathbf{D}}^{i}\) for any \(i\) and \(j\). However, what we can calculate is the conditional h steps ahead variance of \(y_{t+h}\), given the value of \(l_t\), for a special case: the ETS(A,N,N) model. This is based on the recursion (5.19): \[\begin{equation} \mathrm{V}(y_{t+h} | l_t) = \sigma^2_h = \mathrm{V} \left(\sum_{j=1}^{h-1} \hat{\alpha} \epsilon_{t+h-j} \right) + \sigma^2. \tag{16.13} \end{equation}\] The variance of the sum in (16.13) can be expanded as: \[\begin{equation} \mathrm{V} \left(\sum_{j=1}^{h-1} \hat{\alpha} \epsilon_{t+h-j} \right) = \sum_{j=1}^{h-1} \mathrm{V} \left(\hat{\alpha} \epsilon_{t+h-j}\right) + 2 \sum_{j=2}^{h-1} \sum_{i=1}^{j-1} \mathrm{cov}(\hat{\alpha} \epsilon_{t+h-j},\hat{\alpha} \epsilon_{t+h-i}). \tag{16.14} \end{equation}\] Given that \(\hat{\alpha}\) is independent of the future error terms, each variance on the right-hand side of (16.14) can be expressed via: \[\begin{equation} \mathrm{V} \left(\hat{\alpha} \epsilon_{t+h-j}\right) = \mathrm{V} (\hat{\alpha}) \mathrm{V}(\epsilon_{t+h-j}) + \mathrm{V} (\hat{\alpha}) \mathrm{E}(\epsilon_{t+h-j})^2 + \mathrm{E} (\hat{\alpha})^2 \mathrm{V}(\epsilon_{t+h-j}). \tag{16.15} \end{equation}\] Given that the expectation of the error term is assumed to be zero, this simplifies to: \[\begin{equation} \mathrm{V} \left(\hat{\alpha} \epsilon_{t+h-j}\right) = \left(\mathrm{V} (\hat{\alpha}) + \hat{\alpha}^2 \right) \sigma^2. \tag{16.16} \end{equation}\] As for the covariances in (16.14), after the expansion it can be shown that each of them is equal to: \[\begin{equation} \begin{aligned} \mathrm{cov}(\hat{\alpha} \epsilon_{t+h-j},\hat{\alpha} \epsilon_{t+h-i}) = & \mathrm{V}(\hat{\alpha}) \mathrm{cov}(\epsilon_{t+h-i},\epsilon_{t+h-j}) + \hat{\alpha}^2 \mathrm{cov}(\epsilon_{t+h-i},\epsilon_{t+h-j}) \\ & + \mathrm{E}(\epsilon_{t+h-i}) \mathrm{E}(\epsilon_{t+h-j}) \mathrm{V}(\hat{\alpha}). \end{aligned} \tag{16.17} \end{equation}\] Given the assumptions of the model, the autocovariances of the error terms are all equal to zero, and so is the expectation of the error term, which means that the value in (16.17) is equal to zero as well. Based on that, the variance of the sum in (16.14) reduces to: \[\begin{equation} \mathrm{V} \left(\sum_{j=1}^{h-1} \hat{\alpha} \epsilon_{t+h-j} \right) = (h-1) \left(\mathrm{V} (\hat{\alpha}) + \hat{\alpha}^2 \right) \sigma^2 . \tag{16.18} \end{equation}\] Inserting (16.18) in (16.13), we get the final conditional h steps ahead variance of the model: \[\begin{equation} \sigma^2_h = \left((h-1) \left(\mathrm{V} (\hat{\alpha}) + \hat{\alpha}^2 \right) + 1\right) \sigma^2, \tag{16.19} \end{equation}\] which looks similar to formula (5.14) from Section 5.3, but now contains the variance of the smoothing parameter.
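As a sanity check of (16.19), one can simulate the forecast errors of ETS(A,N,N) with an uncertain smoothing parameter; the values of \(\alpha\), \(\mathrm{V}(\hat{\alpha})\), and \(\sigma^2\) below are assumed purely for illustration:

```r
# Monte Carlo check of formula (16.19) for ETS(A,N,N)
set.seed(41)
alpha <- 0.3; vAlpha <- 0.01; sigma2 <- 1  # assumed values
h <- 5; nsim <- 1e5

forecastErrors <- replicate(nsim, {
  alphaHat <- rnorm(1, alpha, sqrt(vAlpha))  # uncertain smoothing parameter
  eps <- rnorm(h, 0, sqrt(sigma2))           # future error terms
  # h steps ahead error implied by the recursion (16.10) for ETS(A,N,N)
  sum(alphaHat * eps[1:(h - 1)]) + eps[h]
})

c(empirical = var(forecastErrors),
  analytical = ((h - 1) * (vAlpha + alpha^2) + 1) * sigma2)
```

The two numbers should be close, with the empirical variance converging to the analytical one as the number of simulation runs grows.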
Unfortunately, the conditional variances for the other models are more complicated due to the convolutions of parameters involved. Furthermore, formula (16.19) focuses only on the conditional variance given the known \(l_t\) and does not take into account the uncertainty of \(l_t\) for the in-sample fitted values. Given the complexity of the problem, in the next section we introduce a technique that allows correctly propagating the uncertainty of the parameters and initial values to the forecasts of the model.