## 5.1 Model formulation

The pure additive case is interesting because this is the group of models that have closed forms for the conditional moments (mean and variance) and support parametric predictive distributions for several steps ahead. To understand how we can arrive at the general model form, we consider the example of the ETS(A,A,A) model, which, as discussed in Section 4.2, is formulated as:

\begin{equation} \begin{aligned} & y_t = l_{t-1} + b_{t-1} + s_{t-m} + \epsilon_{t} \\ & l_{t} = l_{t-1} + b_{t-1} + \alpha \epsilon_{t} \\ & b_{t} = b_{t-1} + \beta \epsilon_{t} \\ & s_{t} = s_{t-m} + \gamma \epsilon_{t} \end{aligned}. \tag{5.1} \end{equation}

This model can be written in the following way:

\begin{equation} \begin{aligned} y_{t} = & l_{t-1} & + & b_{t-1} & + & s_{t-m} & + & \epsilon_t \\ l_t = & l_{t-1} & + & b_{t-1} & + & 0 & + & \alpha \epsilon_t \\ b_t = & 0 & + & b_{t-1} & + & 0 & + & \beta \epsilon_t \\ s_t = & 0 & + & 0 & + & s_{t-m} & + & \gamma \epsilon_t \end{aligned} \tag{5.2} \end{equation}

This layout shows how the elements of the model can be represented in the matrix form:

\begin{equation} \begin{aligned} y_t & = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} l_{t-1} \\ b_{t-1} \\ s_{t-m} \end{pmatrix} + \epsilon_t \\ \begin{pmatrix} l_t \\ b_t \\ s_t \end{pmatrix} & = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} l_{t-1} \\ b_{t-1} \\ s_{t-m} \end{pmatrix} + \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} \epsilon_t \end{aligned}. \tag{5.3} \end{equation}

I use tabulation in equation (5.2) to show how the matrix form relates to the system of equations (5.1). The positions of $$l_{t-1}$$, $$b_{t-1}$$ and $$s_{t-m}$$ in (5.2) correspond to the non-zero values in (5.3): the first row gives the measurement vector, and the remaining rows give the transition matrix.
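To check that the matrix form (5.3) describes the same model as (5.1), the two can be simulated side by side with the same errors and initial values. Below is a minimal Python/NumPy sketch. The parameter values, the seasonal period $$m=4$$, the initial states and the random errors are arbitrary illustrative assumptions. The helper names `scalar_recursion()` and `matrix_form()` are hypothetical. The matrix version keeps a history of past state vectors so that $$s_{t-m}$$ can be picked up next to $$l_{t-1}$$ and $$b_{t-1}$$.

```python
import numpy as np

# Illustrative values only (assumptions, not from the text)
alpha, beta, gamma = 0.3, 0.1, 0.2
m = 4
l0, b0 = 100.0, 1.0
s_init = [5.0, -2.0, -4.0, 1.0]          # seasonal initials s_{1-m}, ..., s_0

rng = np.random.default_rng(42)
n = 20
eps = rng.normal(0.0, 1.0, n)            # simulated errors epsilon_t


def scalar_recursion():
    """ETS(A,A,A) written as the scalar equations (5.1)."""
    l, b = l0, b0
    s = list(s_init)
    y = []
    for t in range(n):
        # Loop index t = 0 is time 1, so s[t] is s_{t+1-m}, the value lagged by m
        s_lag = s[t]
        y.append(l + b + s_lag + eps[t])
        # Level and trend are updated together, both from l_{t-1} and b_{t-1}
        l, b = l + b + alpha * eps[t], b + beta * eps[t]
        s.append(s_lag + gamma * eps[t])
    return np.array(y)


def matrix_form():
    """The same model written in the matrix form (5.3)."""
    w = np.array([1.0, 1.0, 1.0])
    F = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    g = np.array([alpha, beta, gamma])
    # Rows are past state vectors (l, b, s) for times 1-m, ..., 0.
    # Only the last row's level and trend are used, because they are lagged by 1.
    history = np.zeros((m, 3))
    history[:, 2] = s_init
    history[-1, :2] = [l0, b0]
    y = []
    for t in range(n):
        # Lagged components (l_{t-1}, b_{t-1}, s_{t-m})
        v_lag = np.array([history[-1, 0], history[-1, 1], history[-m, 2]])
        y.append(w @ v_lag + eps[t])
        history = np.vstack([history, F @ v_lag + g * eps[t]])
    return np.array(y)


# The two formulations agree up to floating-point rounding
print(np.allclose(scalar_recursion(), matrix_form()))   # True
```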
Now we can define each matrix and vector:

\begin{equation} \begin{aligned} \mathbf{w} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, & \mathbf{F} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, & \mathbf{g} = \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}, \\ \mathbf{v}_{t} = \begin{pmatrix} l_t \\ b_t \\ s_t \end{pmatrix}, & \mathbf{v}_{t-\boldsymbol{l}} = \begin{pmatrix} l_{t-1} \\ b_{t-1} \\ s_{t-m} \end{pmatrix}, & \boldsymbol{l} = \begin{pmatrix} 1 \\ 1 \\ m \end{pmatrix} \end{aligned}. \tag{5.4} \end{equation}

Substituting (5.4) into (5.3), we get the general pure additive ADAM ETS model:

\begin{equation} \begin{aligned} {y}_{t} = &\mathbf{w}^\prime \mathbf{v}_{t-\boldsymbol{l}} + \epsilon_t \\ \mathbf{v}_{t} = &\mathbf{F} \mathbf{v}_{t-\boldsymbol{l}} + \mathbf{g} \epsilon_t \end{aligned}, \tag{5.5} \end{equation}

where $$\mathbf{w}$$ is the measurement vector, $$\mathbf{F}$$ is the transition matrix, $$\mathbf{g}$$ is the persistence vector, $$\mathbf{v}_{t-\boldsymbol{l}}$$ is the vector of lagged components and $$\boldsymbol{l}$$ is the vector of lags. The important thing to note is that ADAM is based on the model discussed in Section 4.6.1, but it is formulated using lags of components rather than their transition over time. This is reflected in the elements of the vector $$\boldsymbol{l}$$: the level and trend enter with lag 1, while the seasonal component enters with lag $$m$$. For comparison, the conventional ETS(A,A,A), formulated according to (4.20) with the state vector $$(l_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1})^\prime$$, would have the following transition matrix (instead of the one in (5.4)):

\begin{equation} \mathbf{F} = \begin{pmatrix} 1 & 1 & \mathbf{0}^\prime_{m-1} & 0 \\ 0 & 1 & \mathbf{0}^\prime_{m-1} & 0 \\ 0 & 0 & \mathbf{0}^\prime_{m-1} & 1 \\ \mathbf{0}_{m-1} & \mathbf{0}_{m-1} & \mathbf{I}_{m-1} & \mathbf{0}_{m-1} \end{pmatrix}, \tag{5.6} \end{equation}

where $$\mathbf{I}_{m-1}$$ is the identity matrix of size $$(m-1) \times (m-1)$$ and $$\mathbf{0}_{m-1}$$ is the vector of zeroes of size $$m-1$$.
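The following Python/NumPy sketch illustrates that the lagged form (5.5) and the conventional form with the transition matrix (5.6) are two representations of the same model. It is not the implementation used in ADAM. It builds (5.6) for a given $$m$$ and simulates both forms with the same errors and initial values. The conventional measurement and persistence vectors, $$(1, 1, \mathbf{0}^\prime_{m-1}, 1)^\prime$$ and $$(\alpha, \beta, \gamma, \mathbf{0}^\prime_{m-1})^\prime$$, are not given in the text. They are derived here to match (5.6) with the state ordering $$(l_t, b_t, s_t, \dots, s_{t-m+1})^\prime$$. All numeric values and function names are illustrative assumptions.

```python
import numpy as np


def conventional_matrices(alpha, beta, gamma, m):
    """w, F, g of the conventional ETS(A,A,A) with state (l, b, s_t, ..., s_{t-m+1}).
    F follows (5.6); w and g are derived to match it (assumption of this sketch)."""
    k = m + 2
    F = np.zeros((k, k))
    F[0, 0] = F[0, 1] = 1.0          # level: l_{t-1} + b_{t-1}
    F[1, 1] = 1.0                    # trend: b_{t-1}
    F[2, k - 1] = 1.0                # new seasonal s_t takes s_{t-m}, the last element
    F[3:, 2:k - 1] = np.eye(m - 1)   # shift the other seasonal values down by one
    w = np.zeros(k)
    w[[0, 1, k - 1]] = 1.0           # y_t = l_{t-1} + b_{t-1} + s_{t-m} + e_t
    g = np.zeros(k)
    g[:3] = [alpha, beta, gamma]
    return w, F, g


def simulate_conventional(w, F, g, v0, eps):
    """Conventional form: every component is lagged by one period."""
    v, y = v0.copy(), []
    for e in eps:
        y.append(w @ v + e)
        v = F @ v + g * e
    return np.array(y)


def simulate_lagged(alpha, beta, gamma, m, l0, b0, s_init, eps):
    """Lagged form (5.5) with the 3x3 matrices from (5.4) and lags l = (1, 1, m)."""
    w = np.ones(3)
    F = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    g = np.array([alpha, beta, gamma])
    lags = np.array([1, 1, m])
    history = np.zeros((m, 3))       # past states for times 1-m, ..., 0
    history[:, 2] = s_init
    history[-1, :2] = [l0, b0]
    y = []
    for e in eps:
        # v_{t-l}: component j is taken from the state lagged by lags[j]
        v_lag = history[-lags, np.arange(3)]
        y.append(w @ v_lag + e)
        history = np.vstack([history, F @ v_lag + g * e])
    return np.array(y)


alpha, beta, gamma, m = 0.3, 0.1, 0.2, 12   # illustrative values only
rng = np.random.default_rng(1)
eps = rng.normal(0.0, 1.0, 50)
s_init = rng.normal(0.0, 3.0, m)            # s_{1-m}, ..., s_0
l0, b0 = 100.0, 0.5

w, F, g = conventional_matrices(alpha, beta, gamma, m)
v0 = np.concatenate([[l0, b0], s_init[::-1]])   # (l_0, b_0, s_0, ..., s_{1-m})
y_conv = simulate_conventional(w, F, g, v0, eps)
y_lag = simulate_lagged(alpha, beta, gamma, m, l0, b0, s_init, eps)
print(F.shape, np.allclose(y_conv, y_lag))      # (14, 14) True
```

The lagged version indexes the stored history with the vector of lags instead of shifting the seasonal values through an $$(m+2) \times (m+2)$$ matrix. This is the difference between (5.4) and (5.6).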
The main benefit of using the vector of lags $$\boldsymbol{l}$$ instead of the conventional mechanism in the transition equation is the reduction in the dimensions of the matrices: the transition matrix contains $$3\times 3$$ elements in the case of (5.5) instead of $$(2+m)\times (2+m)$$ for the conventional ETS model. The model (5.5) is therefore more parsimonious in its representation than the conventional one and simplifies some of the calculations, making it feasible, for example, to apply models to high frequency data with a large seasonal period $$m$$ (e.g. 24, 48, 52, 365). The main disadvantage of this approach is that the derivation of the conditional expectation and variance becomes more complicated: they still have closed forms, but these are more cumbersome. They are discussed later in this chapter, in Section 5.3.
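The difference in size can be made concrete with a short calculation. The sketch below counts the elements of the transition matrix in the lagged form (5.5) and in the conventional form (5.6) for the seasonal periods mentioned above. The function name `transition_sizes()` is hypothetical. For a dense matrix-vector product, which the transition equation needs at every time step, the number of multiplications grows at the same rate as these counts.

```python
def transition_sizes(m):
    """Element counts of the ETS(A,A,A) transition matrix for seasonal period m:
    (lagged form (5.5), conventional form (5.6))."""
    lagged = 3 * 3                 # F in (5.4) does not depend on m
    conventional = (m + 2) ** 2    # F in (5.6) is (m + 2) x (m + 2)
    return lagged, conventional


for m in (24, 48, 52, 365):
    lagged, conventional = transition_sizes(m)
    print(f"m = {m:>3}: lagged {lagged}, conventional {conventional}")
```

For example, with $$m=365$$ the conventional matrix has $$367^2 = 134689$$ elements, compared with 9 in (5.5).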