
10.3 Dynamic X in ADAMX

Note: the model discussed in this section assumes very specific dynamics for the parameters, aligning with what the conventional ETS assumes: parameters are correlated with the states of the model. It does not treat parameters as independent as, for example, MSOE state space models do, which makes this model restrictive in its application. But this type of model works well with categorical variables, as I show later in this section.

As discussed in Section 10.1, the parameters of the explanatory variables in ADAMX can be assumed to be constant over time or can be assumed to vary according to some mechanism. The most reasonable one in the SSOE framework is the one relying on the same error for different components of the model, because this mechanism aligns with the model itself. Osman and King (2015) proposed one such mechanism, relying on the differences of the data. The main motivation of their approach was to make the dynamic ADAMX model stable, which is a challenging task. However, this mechanism relies on the assumption of non-stationarity of the explanatory variables, which does not always make sense (for example, it is unreasonable in the case of promotional data). An alternative approach that we will discuss in this section is the one originally proposed by Svetunkov (1985), based on a stochastic approximation mechanism and further developed in Svetunkov and Svetunkov (2014).

We start with the following linear regression model: \[\begin{equation} y_{t} = a_{0,t-1} + a_{1,t-1} x_{1,t} + \dots + a_{n,t-1} x_{n,t} + \epsilon_t , \tag{10.16} \end{equation}\] where all parameters vary over time and \(a_{0,t}\) represents the value from the conventional additive error ETS model. The updating mechanism for the parameters is straightforward and relies on the ratio of the error term and the respective explanatory variables: \[\begin{equation} a_{i,t} = a_{i,t-1} + \left \lbrace \begin{aligned} &\delta_i \frac{\epsilon_t}{x_{i,t}} \text{ for each } i \in \{1, \dots, n\}, \text{ if } x_{i,t}\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. , \tag{10.17} \end{equation}\] where \(\delta_i\) is the smoothing parameter of the \(i\)-th explanatory variable. The same model can be represented in the state space form, based on the equations, similar to (10.4): \[\begin{equation} \begin{aligned} & {y}_{t} = \mathbf{w}'_t \mathbf{v}_{t-\mathbf{l}} + \epsilon_t \\ & \mathbf{v}_t = \mathbf{F} \mathbf{v}_{t-\mathbf{l}} + \mathbf{z}_t \mathbf{g} \epsilon_t \end{aligned} \tag{10.18} \end{equation}\] where \(\mathbf{z}_t = \mathrm{diag}\left(\mathbf{w}_t\right)^{-1}=\left(\mathbf{I}_{k+n} \odot (\mathbf{w}_t \mathbf{1}'_{k+n})\right)^{-1}\) is the diagonal matrix consisting of inverses of explanatory variables, \(\mathbf{I}_{k+n}\) is the identity matrix for \(k\) ADAM components and \(n\) explanatory variables, \(\mathbf{1}_{k+n}\) is the vector of ones and \(\odot\) is the Hadamard product for element-wise multiplication. This is the inverse of the diagonal matrix based on the measurement vector, for which those values that cannot be inverted (due to division by zero) are substituted by zeroes in order to reflect the condition in (10.17). In addition to what (10.4) contained, we add smoothing parameters \(\delta_i\) in the persistence vector \(\mathbf{g}\) for each of the explanatory variables.
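To make the mechanism in (10.16) and (10.17) more tangible, here is a minimal sketch of one update step in Python. It assumes that \(a_{0,t}\) is the level of ETS(A,N,N) with the smoothing parameter `alpha`. The function name `dynamic_step` and the numbers in the example are hypothetical, introduced only for illustration; this is not the implementation used in any package.

```python
def dynamic_step(level, coefs, x, y, alpha, deltas):
    """One update of a local level model with dynamic coefficients (illustrative sketch)."""
    # One step ahead fitted value, as in the measurement equation (10.16)
    fitted = level + sum(a * xi for a, xi in zip(coefs, x))
    error = y - fitted
    # Assumption: a_0 is the level of ETS(A,N,N), updated with smoothing parameter alpha
    new_level = level + alpha * error
    # Update (10.17): shift by delta_i * error / x_i, or leave unchanged when x_i is zero
    new_coefs = [
        a + (d * error / xi if xi != 0 else 0.0)
        for a, d, xi in zip(coefs, deltas, x)
    ]
    return new_level, new_coefs, error


# Hypothetical values: the second variable equals zero, so its coefficient does not move
new_level, new_coefs, error = dynamic_step(
    level=10.0, coefs=[2.0, -1.0], x=[4.0, 0.0], y=20.0, alpha=0.5, deltas=[0.2, 0.3]
)
```

In this example the coefficient of the second variable stays intact, because the variable is equal to zero on that observation. This is the reason why the mechanism is convenient for dummy variables.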

If the error term is multiplicative, then the model changes to: \[\begin{equation} \begin{aligned} & y_{t} = \exp \left(a_{0,t-1} + a_{1,t-1} x_{1,t} + \dots + a_{n,t-1} x_{n,t} + \log(1+ \epsilon_t) \right) \\ & a_{i,t} = a_{i,t-1} + \left \lbrace \begin{aligned} &\delta_i \frac{\log(1+\epsilon_t)}{x_{i,t}} \text{ for each } i \in \{1, \dots, n\}, \text{ if } x_{i,t}\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. \end{aligned} . \tag{10.19} \end{equation}\] The formulation (10.19) differs from the conventional pure multiplicative ETS model because the smoothing parameter \(\delta_i\) is not included inside the error term \(1+\epsilon_t\), which simplifies some derivations and makes the model easier to work with. Mixed ETS models can also have explanatory variables, but I suggest aligning the type of the explanatory variables model with the type of the error term.
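The coefficient update in (10.19) can be sketched in the same way. The snippet below only illustrates how \(\log(1+\epsilon_t)\) is recovered from the data and how it enters the update of \(a_{i,t}\); the value of \(a_{0,t-1}\) is taken as given and its own update is left out. The function name and the numbers are hypothetical.

```python
import math


def multiplicative_coefficient_update(level, coefs, x, y, deltas):
    """Update the coefficients as in (10.19), given a_0 = level (illustrative sketch)."""
    # Fitted value on the logarithmic scale
    log_fitted = level + sum(a * xi for a, xi in zip(coefs, x))
    # The residual on the logarithmic scale equals log(1 + epsilon_t)
    log_error = math.log(y) - log_fitted
    epsilon = math.exp(log_error) - 1.0
    # delta_i multiplies log(1 + epsilon_t) directly, it is not placed inside the error term
    new_coefs = [
        a + (d * log_error / xi if xi != 0 else 0.0)
        for a, d, xi in zip(coefs, deltas, x)
    ]
    return new_coefs, epsilon


# Hypothetical values chosen so that log(1 + epsilon_t) is approximately 0.1
new_coefs, epsilon = multiplicative_coefficient_update(
    level=2.0, coefs=[0.5], x=[2.0], y=math.exp(3.1), deltas=[0.4]
)
```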

Finally, in order to distinguish the ADAMX with static parameters from the ADAMX with dynamic ones, we will use the letters “S” and “D” in the names of models. So, the model (10.9) can be called ETSX(A,N,N){S}, while the model (10.19), assuming that \(a_{0,t-1}=l_{t-1}\), would be called ETSX(M,N,N){D}. We use curly brackets in order to split the ETS states from the type of X. Furthermore, given that the model with static regressors is assumed in many contexts to be the default one, the ETSX(*,*,*){S} model can also be denoted as just ETSX(*,*,*).

10.3.1 Recursion for dynamic ADAMX

Similar to how it was discussed in Section 10.2.2, we can have two cases in the dynamic model: (1) deterministic explanatory variables, (2) stochastic explanatory variables. For illustrative purposes, we will use a non-seasonal model for which the lag vector \(\mathbf{l}\) contains ones only, keeping in mind that other pure additive models can be easily used instead. The cases of non-additive ETS models are not discussed in this part in detail: the moments for these models need to be calculated based on simulations. So, as discussed previously, the model can be written in the following general way, assuming that all elements of \(\mathbf{l}\) are equal to one: \[\begin{equation} \begin{aligned} & {y}_{t} = \mathbf{w}'_t \mathbf{v}_{t-1} + \epsilon_t \\ & \mathbf{v}_t = \mathbf{F} \mathbf{v}_{t-1} + \mathbf{z}_t \mathbf{g} \epsilon_t \end{aligned} . \tag{10.20} \end{equation}\] Based on this model, we can get the recursive relation for \(h\) steps ahead, similar to how it was done in Section 5.2: \[\begin{equation} \begin{aligned} & {y}_{t+h} = \mathbf{w}'_{t+h} \mathbf{v}_{t+h-1} + \epsilon_{t+h} \\ & \mathbf{v}_{t+h-1} = \mathbf{F} \mathbf{v}_{t+h-2} + \mathbf{z}_{t+h-1} \mathbf{g} \epsilon_{t+h-1} \end{aligned} , \tag{10.21} \end{equation}\] where the second equation can be represented based on the values available on observation \(t\): \[\begin{equation} \mathbf{v}_{t+h-1} = \mathbf{F}^{h-1} \mathbf{v}_{t} + \sum_{j=1}^{h-1} \mathbf{F}^{h-1-j} \mathbf{z}_{t+j} \mathbf{g} \epsilon_{t+j} . \tag{10.22} \end{equation}\] Substituting the equation (10.22) in the measurement equation of (10.21) leads to the final recursion: \[\begin{equation} {y}_{t+h} = \mathbf{w}'_{t+h} \mathbf{F}^{h-1} \mathbf{v}_{t} + \mathbf{w}'_{t+h} \sum_{j=1}^{h-1} \mathbf{F}^{h-1-j} \mathbf{z}_{t+j} \mathbf{g} \epsilon_{t+j} + \epsilon_{t+h} . \tag{10.23} \end{equation}\]
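The recursion (10.23) can be checked numerically. The sketch below assumes a local level model with dynamic coefficients, for which the transition matrix \(\mathbf{F}\) is the identity matrix, so its powers drop out of the formula. It iterates (10.20) step by step and compares the result with the closed form. All function names and numbers are hypothetical.

```python
def measurement_vector(x):
    """w_t: one for the level, followed by the explanatory variables."""
    return [1.0] + list(x)


def inverse_diagonal(w):
    """Diagonal of z_t: inverses of the elements of w_t, zeroes where inversion is impossible."""
    return [1.0 / wi if wi != 0 else 0.0 for wi in w]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def simulate_path(v, xs, errors, g):
    """Iterate (10.20) with F = I and return y_{t+1}, ..., y_{t+h}."""
    v = list(v)
    ys = []
    for x, e in zip(xs, errors):
        w = measurement_vector(x)
        ys.append(dot(w, v) + e)
        z = inverse_diagonal(w)
        v = [vi + zi * gi * e for vi, zi, gi in zip(v, z, g)]
    return ys


def closed_form(v, xs, errors, g):
    """Evaluate (10.23) with F = I for the horizon h = len(xs)."""
    h = len(xs)
    w_h = measurement_vector(xs[-1])
    value = dot(w_h, v)
    for j in range(h - 1):
        z = inverse_diagonal(measurement_vector(xs[j]))
        value += dot(w_h, [zi * gi for zi, gi in zip(z, g)]) * errors[j]
    return value + errors[-1]


# Hypothetical example: level 10, one coefficient equal to 2, horizon h = 3
v_t, g = [10.0, 2.0], [0.3, 0.1]
xs, errors = [[1.0], [0.0], [2.0]], [1.0, -2.0, 0.5]
y_iterated = simulate_path(v_t, xs, errors, g)[-1]
y_closed = closed_form(v_t, xs, errors, g)
```

Both routes give the same value of \(y_{t+3}\) in this example, which is what (10.22) and (10.23) imply.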

10.3.2 Conditional moments for deterministic explanatory variables in ADAMX{D}

Based on this recursion, we can calculate the conditional mean and variance for the model. First, we assume that the explanatory variables are controlled by an analyst, and are known for \(j=1, \dots, h\): \[\begin{equation} \begin{aligned} \mu_{y,t+h} = \text{E}(y_{t+h}|t) = & \mathbf{w}'_{t+h} \mathbf{F}^{h-1} \mathbf{v}_{t} \\ \text{V}(y_{t+h}|t) = & \sum_{j=1}^{h-1} \left(\mathbf{w}'_{t+h} \mathbf{F}^{h-1-j} \mathbf{z}_{t+j} \mathbf{g} \right)^2 \sigma^2 + \sigma^2 \end{aligned} , \tag{10.24} \end{equation}\] where \(\sigma^2\) is the variance of the error term. The squares are summed over \(j\) because the error terms on different observations in (10.23) are uncorrelated, so each of them contributes to the variance separately. The formulae for conditional moments in this case look similar to the ones from the pure additive ETS model in Section 5.3, with the only difference being the interaction with the time-varying measurement vector.
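As an illustration, the conditional moments can be calculated directly from the recursion (10.23). The sketch below again assumes a local level model with dynamic coefficients, so that \(\mathbf{F}\) is the identity matrix. It sums the squared coefficients of the individual errors in (10.23), which relies on the errors being uncorrelated and having a common variance. The function name and the numbers are hypothetical.

```python
def conditional_moments(v, future_xs, g, sigma2):
    """Conditional mean and variance of y_{t+h} for known future x, assuming F = I."""
    # Measurement vector on observation t+h: one for the level, then the regressors
    w_h = [1.0] + list(future_xs[-1])
    mean = sum(wi * vi for wi, vi in zip(w_h, v))
    # Contribution of epsilon_{t+h}
    variance = sigma2
    # Contributions of epsilon_{t+1}, ..., epsilon_{t+h-1}
    for x in future_xs[:-1]:
        w = [1.0] + list(x)
        z = [1.0 / wi if wi != 0 else 0.0 for wi in w]
        # Scalar coefficient w'_{t+h} z_{t+j} g of the error on observation t+j
        c = sum(wi * zi * gi for wi, zi, gi in zip(w_h, z, g))
        variance += c ** 2 * sigma2
    return mean, variance


# Hypothetical example with h = 3 and the error variance equal to 4
mean, variance = conditional_moments(
    v=[10.0, 2.0], future_xs=[[1.0], [0.0], [2.0]], g=[0.3, 0.1], sigma2=4.0
)
```

Note that the variance depends on the future values of the explanatory variables both via \(\mathbf{w}_{t+h}\) and via \(\mathbf{z}_{t+j}\), so it needs to be recalculated for every scenario of the explanatory variables.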

10.3.3 Conditional mean for stochastic explanatory variables in ADAMX{D}

In the case of stochastic explanatory variables, the conditional expectation is straightforward and is similar to the one in the static ADAMX model: \[\begin{equation} \mu_{y,t+h} = \text{E}(y_{t+h}|t) = \boldsymbol{\mu}'_{w,t+h} \mathbf{F}^{h-1} \mathbf{v}_{t} , \tag{10.25} \end{equation}\] where \(\boldsymbol{\mu}'_{w,t+h}\) is the vector of conditional \(h\) steps ahead expectations for each element in \(\mathbf{w}_{t+h}\). For the ETS components, the respective elements of the vector are equal to one. However, the conditional variance is more complicated, because it involves complex interactions between the variances of different variables and the error term. As a result, it is easier to get the correct variance based on simulations, assuming that the explanatory variables and the error term change according to some assumed models.
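A minimal sketch of such a simulation is shown below. It relies on several assumptions made purely for illustration: the model is a local level one with a single dynamic coefficient, the explanatory variable is a dummy (for example, a promotion) that equals one with probability `p` independently on each observation, and the error term is Normal with zero mean. All names and numbers are hypothetical.

```python
import random


def simulate_horizon(level, coef, alpha, delta, p, sigma, h, rng):
    """Draw one value of y_{t+h} for a local level model with one dynamic dummy variable."""
    for _ in range(h - 1):
        # Assumed model for the explanatory variable: independent Bernoulli(p)
        x = 1.0 if rng.random() < p else 0.0
        # Assumed model for the error term: Normal with zero mean
        e = rng.gauss(0.0, sigma)
        level += alpha * e
        # The coefficient is updated only when the dummy is switched on
        if x != 0:
            coef += delta * e / x
    x = 1.0 if rng.random() < p else 0.0
    return level + coef * x + rng.gauss(0.0, sigma)


def simulated_moments(n_draws=20000, seed=42, **params):
    """Sample mean and variance of y_{t+h} over many simulated paths."""
    rng = random.Random(seed)
    draws = [simulate_horizon(rng=rng, **params) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    variance = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return mean, variance


mean, variance = simulated_moments(
    level=10.0, coef=2.0, alpha=0.3, delta=0.1, p=0.5, sigma=1.0, h=3
)
```

With these values, the simulated mean should be close to \(10 + 0.5 \times 2 = 11\), which is what (10.25) gives, while the simulated variance is noticeably larger than \(\sigma^2=1\) because of the uncertainty about the future values of the dummy variable.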


• Osman, A.F., King, M.L., 2015. A new approach to forecasting based on exponential smoothing with independent regressors.
• Svetunkov, I., Svetunkov, S., 2014. Forecasting methods. Textbook for universities. Urait, Moscow.
• Svetunkov, S., 1985. Adaptive methods in the process of optimisation of regimes of electricity consumption.