
## 5.4 Stability and forecastability conditions

Another important aspect of the pure additive model (5.5) is the restriction on the smoothing parameters, which relates to the stability and forecastability conditions defined by Hyndman et al. (2008) in Chapter 10. Stability implies that the weights of observations in a dynamic model decay over time (see the example with SES in Section 3.4.2). This guarantees that newer observations receive higher weights than older ones, so the impact of older information on forecasts gradually disappears as the sample size increases. Forecastability does not guarantee that the weights decay, but it does guarantee that the initial value of the state vector has a constant impact on forecasts, i.e. its weight does not increase with the sample size. An example of a non-stable but forecastable model is ETS(A,N,N) with $$\alpha=0$$. In this case, it reverts to the global level model (Section 3.3.2), where the initial value impacts the final forecast in the same way as it does the first observation.

In order to derive both conditions for ADAM, we need to use the reduced form of the model, obtained by inserting the measurement equation into the transition equation via $$\epsilon_t= {y}_{t} -\mathbf{w}^\prime \mathbf{v}_{t-\boldsymbol{l}}$$: \begin{aligned} \mathbf{v}_{t} = &\mathbf{F} \mathbf{v}_{t-\boldsymbol{l}} + \mathbf{g} \left({y}_{t} -\mathbf{w}^\prime \mathbf{v}_{t-\boldsymbol{l}} \right)\\ = & \left(\mathbf{F} -\mathbf{g}\mathbf{w}^\prime \right) \mathbf{v}_{t-\boldsymbol{l}} + \mathbf{g} {y}_{t} \\ = & \mathbf{D} \mathbf{v}_{t-\boldsymbol{l}} + \mathbf{g} {y}_{t} \end{aligned}. \tag{5.16} The matrix $$\mathbf{D}=\mathbf{F} -\mathbf{g}\mathbf{w}^\prime$$ is called the discount matrix; it shows how the weights diminish over time and is the main part of the model that determines whether the model will be stable/forecastable or not.
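The reduced form (5.16) can be sketched numerically. The following is a minimal illustration for a non-seasonal, single-lag case (an ETS(A,A,N)-like setup with level and trend); the matrices and parameter values are hypothetical, chosen only to show that $$\mathbf{D} \mathbf{v}_{t-1} + \mathbf{g} y_t$$ reproduces the original pair of equations:

```python
import numpy as np

# A minimal sketch of the reduced form (5.16) for a single-lag model
# with level and trend; all numbers are hypothetical.
alpha, beta = 0.3, 0.1
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # transition matrix
w = np.array([[1.0], [1.0]])     # measurement vector
g = np.array([[alpha], [beta]])  # persistence vector

D = F - g @ w.T                  # discount matrix D = F - g w'

# One step of the reduced form: v_t = D v_{t-1} + g y_t ...
v_prev = np.array([[100.0], [2.0]])
y_t = 105.0
v_t = D @ v_prev + g * y_t
# ... which is identical to the original formulation,
# v_t = F v_{t-1} + g (y_t - w' v_{t-1}):
v_t_check = F @ v_prev + g * (y_t - (w.T @ v_prev).item())
print(np.allclose(v_t, v_t_check))  # True
```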

### 5.4.1 Example with ETS(A,N,N)

In order to better understand what we plan to discuss in this section, consider an example of an ETS(A,N,N) model, for which $$\mathbf{F}=1$$, $$\mathbf{w}=1$$, $$\mathbf{g}=\alpha$$, $$\mathbf{v}_t=l_t$$ and $$\boldsymbol{l}=1$$. Inserting these values in (5.16), we get: \begin{aligned} l_{t} = & \left(1 -\alpha \right) {l}_{t-1} + \alpha {y}_{t}, \end{aligned} \tag{5.17} which corresponds to the formula of SES from Section 3.4. The discount matrix in this case is $$\mathbf{D}=1-\alpha$$. If we now substitute the level on the right-hand side of equation (5.17) with its previous values, we obtain the recursion that we have already discussed in Section 3.4.2, but now in terms of the “true” components and parameters: \begin{aligned} l_{t} = & {\alpha} \sum_{j=0}^{t-1} (1 -{\alpha})^j {y}_{t-j} + (1 -{\alpha})^t l_0 \end{aligned}. \tag{5.18} The stability condition for ETS(A,N,N) is that the discount scalar $$1-\alpha$$ is less than one in absolute value. This way, the weights will decay over time because of the exponentiation in (5.18) to the power of $$j$$. This condition is satisfied when $$\alpha \in(0, 2)$$, which is the admissible bound discussed in Section 4.7.
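The decay of the weights in (5.18) is easy to verify numerically. In the snippet below the $$\alpha$$ values are hypothetical; the second one lies in the admissible part $$(1, 2)$$, where the weights oscillate in sign but still shrink in absolute value:

```python
# Weights that SES (5.18) assigns to past observations: alpha * (1 - alpha)^j.
alpha = 0.4
weights = [alpha * (1 - alpha) ** j for j in range(10)]
print([round(w, 4) for w in weights[:4]])  # [0.4, 0.24, 0.144, 0.0864]

# An admissible alpha in (1, 2) also satisfies |1 - alpha| < 1:
# the weights alternate in sign but still decay in absolute value.
alpha = 1.5
weights_osc = [alpha * (1 - alpha) ** j for j in range(10)]
print([round(w, 4) for w in weights_osc[:4]])  # [1.5, -0.75, 0.375, -0.1875]
```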

As for the forecastability condition, in this case it implies that $$\lim\limits_{t\rightarrow\infty}(1 -{\alpha})^t l_0 = \text{const}$$, which means that the effect of the initial state on future values stays the same. This is achievable, for example, when $$\alpha=0$$, but is violated when $$\alpha<0$$ or $$\alpha\geq 2$$. So, the bounds for the smoothing parameter in the ETS(A,N,N) model, guaranteeing the forecastability of the model (i.e. making it useful), are: $$$\alpha \in [0, 2) . \tag{5.19}$$$
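The distinction between the two conditions can be illustrated by tracking the weight $$(1-\alpha)^t l_0$$ that the initial level carries after $$t$$ observations (the numbers below are hypothetical):

```python
# Impact of the initial level l_0 on l_t in ETS(A,N,N): (1 - alpha)^t * l_0.
l0 = 10.0

def initial_effect(alpha, t):
    """Weight carried by the initial level after t observations."""
    return (1 - alpha) ** t * l0

print(initial_effect(0.5, 100))  # ~0: stable, the initial value is forgotten
print(initial_effect(0.0, 100))  # 10.0: not stable, but forecastable (constant effect)
# alpha = 2 is excluded from (5.19): the effect oscillates between +l0 and -l0
print(initial_effect(2.0, 100), initial_effect(2.0, 101))  # 10.0 -10.0
```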

### 5.4.2 Coming back to the general case

In the general case, the logic is the same as with ETS(A,N,N), but it requires linear algebra. Due to the lagged formulation, the recursion becomes more complicated, because the discount matrix $$\mathbf{D}$$ needs to be split into submatrices, similar to how we did it in Section 5.2: \begin{aligned} \mathbf{v}_{t} = & \mathbf{D}_{m_1}^{\lceil\frac{t}{m_1}\rceil} \mathbf{v}_{0} + \sum_{j=0}^{\lceil\frac{t}{m_1}\rceil-1} \mathbf{D}_{m_1}^{j} \mathbf{g}_{m_1} y_{t -j m_1} + \\ & \mathbf{D}_{m_2}^{\lceil\frac{t}{m_2}\rceil} \mathbf{v}_{0} + \sum_{j=0}^{\lceil\frac{t}{m_2}\rceil-1} \mathbf{D}_{m_2}^j \mathbf{g}_{m_2} y_{t -j m_2} + \\ & \dots + \\ & \mathbf{D}_{m_d}^{\lceil\frac{t}{m_d}\rceil} \mathbf{v}_{0} + \sum_{j=0}^{\lceil\frac{t}{m_d}\rceil-1} \mathbf{D}_{m_d}^j \mathbf{g}_{m_d} y_{t -j m_d} \end{aligned}, \tag{5.20} where $$\mathbf{D}_{m_i} = \mathbf{F}_{m_i} -\mathbf{g}_{m_i} \mathbf{w}_{m_i}^\prime$$ is the discount matrix for each lag of the model. The stability condition in this case is that the absolute values of all the non-zero eigenvalues of the discount matrices $$\mathbf{D}_{m_i}$$ are lower than one. This condition can be checked at the model construction stage, ensuring that the selected parameters guarantee the stability of the model. As for the forecastability, as discussed earlier, it holds if the initial value of the state vector does not have an increasing impact on the last observed value.
This is obtained by inserting (5.20) in the measurement equation of the pure additive model: \begin{aligned} y_t = & \mathbf{w}_{m_1}^\prime \mathbf{D}_{m_1}^{\lceil\frac{t-1}{m_1}\rceil} \mathbf{v}_{0} + \mathbf{w}_{m_1}^\prime \sum_{j=0}^{\lceil\frac{t-1}{m_1}\rceil-1} \mathbf{D}_{m_1}^{j} \mathbf{g}_{m_1} y_{t-1 -j m_1} + \\ & \mathbf{w}_{m_2}^\prime \mathbf{D}_{m_2}^{\lceil\frac{t-1}{m_2}\rceil} \mathbf{v}_{0} + \mathbf{w}_{m_2}^\prime \sum_{j=0}^{\lceil\frac{t-1}{m_2}\rceil-1} \mathbf{D}_{m_2}^j \mathbf{g}_{m_2} y_{t-1 -j m_2} + \\ & \dots + \\ & \mathbf{w}_{m_d}^\prime \mathbf{D}_{m_d}^{\lceil\frac{t-1}{m_d}\rceil} \mathbf{v}_{0} + \mathbf{w}_{m_d}^\prime \sum_{j=0}^{\lceil\frac{t-1}{m_d}\rceil-1} \mathbf{D}_{m_d}^j \mathbf{g}_{m_d} y_{t-1 -j m_d} + \epsilon_t \end{aligned}. \tag{5.21} In our case the forecastability condition implies that: $$$\lim\limits_{t\rightarrow\infty}\left(\mathbf{w}_{m_i}^\prime\mathbf{D}_{m_i}^{\lceil\frac{t-1}{m_i}\rceil} \mathbf{v}_{0}\right) = \text{const for all } i=1, \dots, d. \tag{5.22}$$$ These conditions are general and apply to any model formulated in the pure additive form (5.5).
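The eigenvalue-based stability check can be sketched as follows. For simplicity, the example below uses a single-lag, non-seasonal model with hypothetical matrices and parameters (a level-and-trend setup), so there is only one discount matrix to inspect; in the general case the same check would be applied to every $$\mathbf{D}_{m_i}$$:

```python
import numpy as np

def is_stable(F, g, w):
    """Check the stability condition: all non-zero eigenvalues of
    D = F - g w' lie strictly inside the unit circle."""
    D = F - g @ w.T
    eigvals = np.linalg.eigvals(D)
    nonzero = eigvals[np.abs(eigvals) > 1e-12]  # ignore structural zeros
    return bool(np.all(np.abs(nonzero) < 1))

# Hypothetical level-and-trend example
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
w = np.array([[1.0], [1.0]])
g_ok  = np.array([[0.3], [0.1]])  # typical smoothing parameters -> stable
g_bad = np.array([[2.5], [0.1]])  # alpha far outside the admissible region

print(is_stable(F, g_ok, w))   # True
print(is_stable(F, g_bad, w))  # False
```

This kind of check is cheap enough to run inside the optimiser, rejecting parameter combinations that violate the condition during model construction.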

### References

• Hyndman, R.J., Koehler, A.B., Ord, J.K., Snyder, R.D., 2008. Forecasting with Exponential Smoothing: The State Space Approach. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-71918-2