$$\newcommand{\mathbbm}[1]{\boldsymbol{\mathbf{#1}}}$$

Remark. The model discussed in this section assumes particular dynamics for the parameters, in line with what the conventional ETS assumes: the regression parameters are correlated with the states of the model. It does not treat the parameters as independent, as, for example, the MSOE state space models do, which makes this model restrictive in its application. However, this type of model works well with categorical variables, as I show later in this section.

As discussed in Section 10.1, the parameters of the explanatory variables in ADAMX can either be assumed to be constant over time or be assumed to vary according to some mechanism. The most natural mechanism in the SSOE framework relies on the same error term for the different components of the model, because it aligns with the model itself. Osman and King (2015) proposed one such mechanism, relying on the differences of the data. The primary motivation of their approach was to make the dynamic ETSX model stable, which is a challenging task. However, their mechanism relies on the assumption of non-stationarity of the explanatory variables, which does not always make sense (for example, it is unreasonable in the case of promotional data). An alternative approach, discussed in this section, is the one initially proposed by Svetunkov (1985), based on the stochastic approximation mechanism and further developed by Svetunkov and Svetunkov (2014).

We start with the following linear regression model: $$$y_{t} = a_{0,t-1} + a_{1,t-1} x_{1,t} + \dots + a_{n,t-1} x_{n,t} + \epsilon_t , \tag{10.16}$$$ where all parameters vary over time and $$a_{0,t}$$ represents the value from the conventional additive error ETS model (e.g. the level of the series, i.e. $$a_{0,t}=l_t$$). The updating mechanism for the parameters in this case is straightforward and relies on the ratio of the error term to the respective explanatory variable: $$$a_{i,t} = a_{i,t-1} + \left \lbrace \begin{aligned} &\delta_i \frac{\epsilon_t}{x_{i,t}} \text{ for each } i \in \{1, \dots, n\}, \text{ if } x_{i,t}\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. , \tag{10.17}$$$ where $$\delta_i$$ is the smoothing parameter of the $$i$$-th explanatory variable. The same model can be represented in the state space form, based on equations similar to (10.4): $$$\begin{aligned} & {y}_{t} = \mathbf{w}'_t \mathbf{v}_{t-\boldsymbol{l}} + \epsilon_t \\ & \mathbf{v}_t = \mathbf{F} \mathbf{v}_{t-\boldsymbol{l}} + \mathbf{z}_t \mathbf{g} \epsilon_t \end{aligned} , \tag{10.18}$$$ where $$\mathbf{z}_t = \mathrm{diag}\left(\mathbf{w}_t\right)^{-1}= \left(\mathbf{I}_{k+n} \odot (\mathbf{w}_t \mathbf{1}^{\prime}_{k+n})\right)^{-1}$$ is the diagonal matrix consisting of the inverses of the explanatory variables, $$\mathbf{I}_{k+n}$$ is the identity matrix of size $$k+n$$ (for the $$k$$ ETS components and $$n$$ explanatory variables), and $$\odot$$ is the Hadamard product, used for element-wise multiplication. This is the inverse of the diagonal matrix based on the measurement vector, in which the values that cannot be inverted (due to division by zero) are substituted by zeroes in order to reflect the condition in (10.17). In addition to what (10.4) contained, we add the smoothing parameters $$\delta_i$$ to the persistence vector $$\mathbf{g}$$, one for each of the explanatory variables.
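To make the mechanism in (10.16) and (10.17) concrete, here is a minimal Python sketch of the one-step-ahead fitting loop for an ETSX(A,N,N)-style model with dynamic parameters. The function name and interface are illustrative assumptions, not part of any package:

```python
import numpy as np

def fit_etsx_ann_dynamic(y, X, alpha, deltas, level0, a0):
    """Sketch of ETSX(A,N,N) with dynamic regression parameters:
    the level is updated with alpha, while each a_{i,t} follows the
    stochastic approximation update delta_i * e_t / x_{i,t} from (10.17)."""
    T, n = X.shape
    level = float(level0)
    a = np.array(a0, dtype=float)
    fitted = np.empty(T)
    for t in range(T):
        fitted[t] = level + X[t] @ a
        e = y[t] - fitted[t]
        level += alpha * e
        for i in range(n):
            if X[t, i] != 0:            # skip the update when x_{i,t} = 0
                a[i] += deltas[i] * e / X[t, i]
    return fitted, level, a
```

Setting all `deltas` to zero collapses this to the static ETSX(A,N,N), which gives a quick sanity check of the implementation.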

If the error term is multiplicative, then the model changes to: $$$\begin{aligned} & y_{t} = \exp \left(a_{0,t-1} + a_{1,t-1} x_{1,t} + \dots + a_{n,t-1} x_{n,t} + \log(1+ \epsilon_t) \right) \\ & a_{i,t} = a_{i,t-1} + \left \lbrace \begin{aligned} &\delta_i \frac{\log(1+\epsilon_t)}{x_{i,t}} \text{ for each } i \in \{1, \dots, n\}, \text{ if } x_{i,t}\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. \end{aligned} . \tag{10.19}$$$ The formulation (10.19) differs from the conventional pure multiplicative ETS model because the smoothing parameter $$\delta_i$$ is not included inside the error term $$1+\epsilon_t$$, which simplifies some derivations and makes the model easier to work with (it has some similarities to the logARIMA from Subsection 9.1.4).
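Under the same illustrative conventions as the additive sketch above, the multiplicative-error update (10.19) mainly changes the error used in the recursion: it becomes $$\log(1+\epsilon_t) = \log(y_t / \hat{y}_t)$$. A hypothetical Python version, which treats $$a_{0,t}$$ as a log-level updated with its own smoothing parameter (an assumption for this sketch):

```python
import numpy as np

def fit_etsx_mnn_dynamic(y, X, alpha, deltas, a0_init, a_init):
    """Sketch of the multiplicative-error model (10.19): the fitted value is
    exp(a_{0,t-1} + sum_i a_{i,t-1} x_{i,t}) and the states are updated with
    the logarithmic error log(1 + e_t) = log(y_t / fitted_t)."""
    T, n = X.shape
    a0 = float(a0_init)                 # a_{0,t}, treated here as the log-level
    a = np.array(a_init, dtype=float)
    fitted = np.empty(T)
    for t in range(T):
        fitted[t] = np.exp(a0 + X[t] @ a)
        e_log = np.log(y[t] / fitted[t])   # log(1 + epsilon_t)
        a0 += alpha * e_log
        for i in range(n):
            if X[t, i] != 0:            # condition from (10.19)
                a[i] += deltas[i] * e_log / X[t, i]
    return fitted, a0, a
```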

Note that if it is suspected that the explanatory variables exhibit non-stationarity and are not cointegrated with the response variable, then their differences can be used instead of $$x_{i,t}$$ in (10.18) and (10.19). In this case, the additive model would coincide with the one proposed by Osman and King (2015). However, the decision to take differences should be made for each variable separately, based on the specific situation, rather than applied across the whole set of variables. Here is an example of the ETSX(A,N,N) model with differenced explanatory variables: $$$\begin{aligned} & y_{t} = a_{0,t-1} + a_{1,t-1} \Delta x_{1,t} + \dots + a_{n,t-1} \Delta x_{n,t} + \epsilon_t , \\ & a_{i,t} = a_{i,t-1} + \left \lbrace \begin{aligned} &\delta_i \frac{\epsilon_t}{\Delta x_{i,t}} \text{ for each } i \in \{1, \dots, n\}, \text{ if } \Delta x_{i,t}\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. , \end{aligned} \tag{10.20}$$$ where $$\Delta x_{i,t} = x_{i,t} -x_{i,t-1}$$ is the difference of the $$i$$-th explanatory variable.
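The selective differencing described above amounts to a small preprocessing step. The helper below is hypothetical; which columns to difference has to be decided per variable (e.g. via stationarity and cointegration tests), not by this function:

```python
import numpy as np

def difference_selected(X, to_diff):
    """Difference only the columns listed in to_diff, keeping the rest as-is.
    The first row is dropped so that all columns stay aligned in time."""
    X = np.asarray(X, dtype=float)
    Xd = X[1:].copy()
    for i in to_diff:
        Xd[:, i] = np.diff(X[:, i])
    return Xd
```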

Finally, to distinguish the ADAMX with static parameters from the one with dynamic parameters, we will use the letters “S” and “D” in the names of the models. So, the model (10.9) can be called ETSX(A,N,N){S}, while the model (10.19), assuming that $$a_{0,t-1}=l_{t-1}$$, would be called ETSX(M,N,N){D}. We use curly brackets to separate the ETS states from the type of the explanatory variables part. Furthermore, given that the model with static regressors is assumed in many contexts to be the default one, the ETSX(*,*,*){S} model can also be denoted as just ETSX(*,*,*).

10.3.1 Recursion for dynamic ADAMX

As discussed in Subsection 10.2.2, we can have two cases in the dynamic model: (1) deterministic explanatory variables, (2) stochastic explanatory variables. For illustrative purposes, we will use a non-seasonal model, for which the lag vector $$\boldsymbol{l}$$ contains ones only, keeping in mind that other pure additive models can be easily used instead with slight changes in the formulae. The cases of non-additive ETS models are complicated and are not discussed in this monograph. So, as discussed previously, the model can be written in the following general way: $$$\begin{aligned} & {y}_{t} = \mathbf{w}'_t \mathbf{v}_{t-1} + \epsilon_t \\ & \mathbf{v}_t = \mathbf{F} \mathbf{v}_{t-1} + \mathbf{z}_t \mathbf{g} \epsilon_t \end{aligned} . \tag{10.21}$$$ Based on this model, we can obtain the recursive relation for $$h$$ steps ahead, similar to how it was done in Section 5.2: $$$\begin{aligned} & {y}_{t+h} = \mathbf{w}'_{t+h} \mathbf{v}_{t+h-1} + \epsilon_{t+h} \\ & \mathbf{v}_{t+h-1} = \mathbf{F} \mathbf{v}_{t+h-2} + \mathbf{z}_{t+h-1} \mathbf{g} \epsilon_{t+h-1} \end{aligned} , \tag{10.22}$$$ where the second equation can be expressed via matrices and vectors using the values available on observations from $$t$$ to $$t+h-1$$: $$$\mathbf{v}_{t+h-1} = \mathbf{F}^{h-1} \mathbf{v}_{t} + \sum_{j=1}^{h-1} \mathbf{F}^{h-1-j} \mathbf{z}_{t+j} \mathbf{g} \epsilon_{t+j} . \tag{10.23}$$$ Substituting equation (10.23) in the measurement equation of (10.22) leads to the final recursion: $$${y}_{t+h} = \mathbf{w}'_{t+h} \mathbf{F}^{h-1} \mathbf{v}_{t} + \mathbf{w}'_{t+h} \sum_{j=1}^{h-1} \mathbf{F}^{h-1-j} \mathbf{z}_{t+j} \mathbf{g} \epsilon_{t+j} + \epsilon_{t+h} , \tag{10.24}$$$ which can be used for the derivation of the moments of ADAMX{D}.
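The first term of the recursion (10.24), $$\mathbf{w}'_{t+h} \mathbf{F}^{h-1} \mathbf{v}_{t}$$, can be computed for several horizons at once by accumulating powers of $$\mathbf{F}$$. An illustrative Python sketch (the names are assumptions, not package functions):

```python
import numpy as np

def point_forecasts(w_future, F, v_t):
    """Deterministic part of (10.24) for h = 1..H: w'_{t+h} F^{h-1} v_t.
    w_future has one row per future measurement vector w_{t+h}."""
    H, k = w_future.shape
    forecasts = np.empty(H)
    F_power = np.eye(k)                 # F^{h-1}, starting from F^0 = I
    for h in range(H):
        forecasts[h] = w_future[h] @ F_power @ v_t
        F_power = F @ F_power
    return forecasts
```

For an ETSX(A,N,N) with one regressor, $$\mathbf{F}$$ is the identity, so the forecasts are simply the last level plus the parameter times the future regressor values.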

10.3.2 Conditional moments for deterministic explanatory variables in ADAMX{D}

Based on the recursion (10.24), we can calculate the conditional mean and variance for the model. First, we assume that the explanatory variables are controlled by an analyst and are known for all $$j=1, \dots, h$$, which leads to: $$$\begin{aligned} & \mu_{y,t+h} = \mathrm{E}(y_{t+h}|t) = \mathbf{w}'_{t+h} \mathbf{F}^{h-1} \mathbf{v}_{t} \\ & \mathrm{V}(y_{t+h}|t) = \left( \sum_{j=1}^{h-1} \left(\mathbf{w}'_{t+h} \mathbf{F}^{h-1-j} \mathbf{z}_{t+j} \mathbf{g} \right)^2 + 1 \right) \sigma^2 \end{aligned} . \tag{10.25}$$$ The formulae for the conditional moments look similar to the ones for the pure additive ETS model in Section 5.3, with the only difference being that the element $$\mathbf{z}_{t+j} \mathbf{g}$$ is in general not equal to zero for the parameters of the explanatory variables.
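These moments can be computed directly from the matrices. The sketch below treats the error terms $$\epsilon_{t+j}$$ as independent, so the variance accumulates the squared scalar contributions $$\mathbf{w}'_{t+h} \mathbf{F}^{h-1-j} \mathbf{z}_{t+j} \mathbf{g}$$ plus the variance of $$\epsilon_{t+h}$$ itself; the function and argument names are illustrative:

```python
import numpy as np

def conditional_moments(w_future, z_future, F, g, v_t, sigma2):
    """Conditional mean and variance of y_{t+h} with deterministic regressors.
    w_future[j] is w_{t+j+1}; z_future[j] is the diagonal matrix z_{t+j+1}."""
    h = len(w_future)
    w_h = w_future[h - 1]
    mu = w_h @ np.linalg.matrix_power(F, h - 1) @ v_t
    var = sigma2                        # contribution of epsilon_{t+h}
    for j in range(1, h):               # contributions of epsilon_{t+j}, j < h
        c = w_h @ np.linalg.matrix_power(F, h - 1 - j) @ z_future[j - 1] @ g
        var += (c ** 2) * sigma2
    return mu, var
```

For $$h=1$$ the sum is empty, and the variance reduces to $$\sigma^2$$, as in the conventional one-step-ahead case.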

10.3.3 Conditional mean for stochastic explanatory variables in ADAMX{D}

In the case of stochastic explanatory variables, the conditional expectation is straightforward and is similar to the one in the static ADAMX model: $$$\mu_{y,t+h} = \mathrm{E}(y_{t+h}|t) = \boldsymbol{\mu}'_{w,t+h} \mathbf{F}^{h-1} \mathbf{v}_{t} , \tag{10.26}$$$ where $$\boldsymbol{\mu}'_{w,t+h}$$ is the vector of conditional $$h$$ steps ahead expectations of each element of $$\mathbf{w}_{t+h}$$. For the ETS components, the respective elements of this vector would contain ones. However, when it comes to the conditional variance, things become more complicated because of the complex interactions between the variances of the different variables and the error term of the model. As a result, it is easier to obtain the correct variance via simulations, assuming that the explanatory variables and the error term change according to some assumed models, instead of deriving analytical expressions.
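Such a simulation can be sketched as follows. Everything here is an assumption for illustration: a single regressor following a stationary AR(1) around a non-zero mean `mu_x` (which keeps the ratio in (10.17) away from divisions by values close to zero), Gaussian errors, and hypothetical function and argument names:

```python
import numpy as np

def simulate_moments(l0, a0, x_last, mu_x, phi, sigma_x, sigma, alpha, delta,
                     h=4, n_sims=20000, seed=42):
    """Monte Carlo conditional mean and variance of y_{t+h} for an
    ETSX(A,N,N){D}-style model with one stochastic regressor, assumed
    to follow a stationary AR(1) process around mu_x."""
    rng = np.random.default_rng(seed)
    y_h = np.empty(n_sims)
    for s in range(n_sims):
        l, a, x = l0, a0, x_last
        for j in range(1, h + 1):
            x = mu_x + phi * (x - mu_x) + rng.normal(0.0, sigma_x)
            e = rng.normal(0.0, sigma)
            y = l + a * x + e
            if j < h:                    # states evolve with simulated errors
                l += alpha * e
                if x != 0:
                    a += delta * e / x   # update (10.17)
        y_h[s] = y
    return y_h.mean(), y_h.var()
```

Comparing the simulated variance with the deterministic-regressor formula shows how much additional uncertainty the stochastic regressor contributes.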

References

• Osman, A.F., King, M.L., 2015. A New Approach to Forecasting Based on Exponential Smoothing with Independent Regressors. Department of Econometrics and Business Statistics. http://econpapers.repec.org/paper/mshebswps/2015-2.htm
• Svetunkov, I., Svetunkov, S., 2014. Forecasting Methods. Textbook for Universities. Urait, Moscow.
• Svetunkov, S., 1985. Adaptive Methods in the Process of Optimisation of Regimes of Electricity Consumption. Leningrad Engineering Economic Institute.