5.5 Distributional assumptions in pure additive ETS

While the conventional ETS assumes that the error term follows a Normal distribution, ADAM ETS introduces some flexibility, implementing the following options for the error term distribution in the additive error models:

  1. Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma^2)\), meaning that \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{N}(\mu_{y,t}, \sigma^2)\);
  2. Laplace: \(\epsilon_t \sim \mathcal{Laplace}(0, s)\), so that \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{Laplace}(\mu_{y,t}, s)\);
  3. S: \(\epsilon_t \sim \mathcal{S}(0, s)\), implying that \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{S}(\mu_{y,t}, s)\);
  4. Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s, \beta)\), leading to \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{GN}(\mu_{y,t}, s, \beta)\).

The conditional moments and the stability/forecastability conditions do not change for the model under these new assumptions. The main elements that change are the scale and the width of prediction intervals. Given that the scales of these distributions are directly related to the variance, one can calculate the conditional variance as discussed in Section 5.3 and then use it to obtain the respective scales. Having the scales, it becomes straightforward to calculate the quantiles needed for the prediction intervals. Here are the formulae for the scales of the distributions mentioned above:

  1. Normal: the scale coincides with the variance \(\sigma^2_h\);
  2. Laplace: \(s_h = \sigma_h \sqrt{\frac{1}{2}}\);
  3. S: \(s_h = \sqrt{\sigma_h}\sqrt[4]{\frac{1}{120}}\);
  4. Generalised Normal: \(s_h = \sigma_h \sqrt{\frac{\Gamma(1/\beta)}{\Gamma(3/\beta)}}\).
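To illustrate the mechanics, here is a minimal sketch in Python (the function names are mine, not part of any package): it converts the conditional variance \(\sigma^2_h\) into the scales above and then produces quantile-based prediction intervals via scipy. Note that scipy's `gennorm` uses the same density kernel \(\exp(-|x/s|^\beta)\) as the Generalised Normal here, while the S distribution is not implemented in scipy and is omitted.

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as Gamma

def scales_from_variance(sigma2_h, beta=1.5):
    """Translate the h-steps-ahead conditional variance into the scales above."""
    sigma_h = np.sqrt(sigma2_h)
    return {
        "normal": sigma2_h,                            # scale is the variance itself
        "laplace": sigma_h * np.sqrt(1 / 2),           # s_h = sigma_h / sqrt(2)
        "s": np.sqrt(sigma_h) * (1 / 120) ** 0.25,     # s_h = sqrt(sigma_h) * (1/120)^{1/4}
        "gn": sigma_h * np.sqrt(Gamma(1 / beta) / Gamma(3 / beta)),
    }

def prediction_interval(mu_h, sigma2_h, level=0.95, dist="normal", beta=1.5):
    """Quantile-based prediction interval around the point forecast mu_h."""
    sc = scales_from_variance(sigma2_h, beta)
    lo, hi = (1 - level) / 2, (1 + level) / 2
    if dist == "normal":
        d = stats.norm(mu_h, np.sqrt(sc["normal"]))
    elif dist == "laplace":
        d = stats.laplace(mu_h, sc["laplace"])
    elif dist == "gn":
        d = stats.gennorm(beta, mu_h, sc["gn"])
    else:
        raise ValueError("the S distribution is not available in scipy")
    return d.ppf(lo), d.ppf(hi)

# Example: 95% interval around a point forecast of 100 with conditional variance 4
lower, upper = prediction_interval(100.0, 4.0, level=0.95, dist="laplace")
```

By construction, each scale reproduces the target variance: for instance, a Laplace distribution with scale \(s_h = \sigma_h/\sqrt{2}\) has variance \(2 s_h^2 = \sigma_h^2\).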

The estimation of pure additive ETS models can be done via the maximisation of the likelihood of the assumed distribution (see Chapter 13 of Svetunkov, 2022a), which in some cases coincides with the minimisation of popular losses (e.g. Normal and MSE, or Laplace and MAE).
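This equivalence is easy to verify numerically: once the scale is concentrated out of the likelihood, the Normal negative log-likelihood becomes a monotone function of MSE, and the Laplace one of MAE, so each pair shares the same optimum. A small illustrative sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(10.0, 2.0, size=1000)
grid = np.linspace(8.0, 12.0, 4001)  # candidate values of the location parameter

mse = np.array([np.mean((y - m) ** 2) for m in grid])
mae = np.array([np.mean(np.abs(y - m)) for m in grid])

# With the scale concentrated out, the negative log-likelihoods reduce
# (up to additive constants) to monotone transformations of the losses:
nll_normal = 0.5 * len(y) * np.log(mse)
nll_laplace = len(y) * np.log(mae)

assert grid[np.argmin(mse)] == grid[np.argmin(nll_normal)]
assert grid[np.argmin(mae)] == grid[np.argmin(nll_laplace)]
```

The same logic is why the MLE of a location under the Normal distribution is the sample mean (the MSE minimiser), while under the Laplace distribution it is the sample median (the MAE minimiser).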

In addition, the following more exotic options for the additive error models are available in ADAM ETS:

  1. Log-normal: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \text{log}\mathcal{N}\left(-\frac{\sigma^2}{2}, \sigma^2\right)\), implying that \(y_t = \mu_{y,t} \left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \text{log}\mathcal{N}\left(\log\mu_{y,t} -\frac{\sigma^2}{2}, \sigma^2\right)\). Here, \(\sigma^2\) is the variance of the error term in logarithms and the \(-\frac{\sigma^2}{2}\) appears due to the restriction \(\text{E}(\epsilon_t)=0\).
  2. Inverse Gaussian: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{IG}(1, \sigma^2)\) with \(y_t=\mu_{y,t} \left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{IG}\left(\mu_{y,t}, \frac{\sigma^2}{\mu_{y,t}}\right)\);
  3. Gamma: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{\Gamma}(\sigma^{-2}, \sigma^2)\), so that \(y_t = \mu_{y,t} \left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{\Gamma}(\sigma^{-2}, \sigma^2 \mu_{y,t})\);

where \(\mu_{y,t} = \mathbf{w}^\prime \mathbf{v}_{t-\mathbf{l}}\) is the one-step-ahead point forecast.
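A quick Monte Carlo sanity check of these restrictions (illustrative Python, not part of any package): the log-normal term with mean \(-\frac{\sigma^2}{2}\) in logarithms and the Gamma term with shape \(\sigma^{-2}\) and scale \(\sigma^2\) both have expectation one, which is exactly what guarantees \(\text{E}(\epsilon_t)=0\) in the multiplicative reformulation.

```python
import numpy as np

sigma2 = 0.09  # variance parameter of the error term
rng = np.random.default_rng(7)
n = 1_000_000

# Log-normal: log(1 + eps/mu) ~ N(-sigma2/2, sigma2), hence E[1 + eps/mu] = 1
ln = rng.lognormal(mean=-sigma2 / 2, sigma=np.sqrt(sigma2), size=n)

# Gamma: shape sigma^{-2}, scale sigma^2 gives mean 1 and variance sigma^2
ga = rng.gamma(shape=1 / sigma2, scale=sigma2, size=n)

assert abs(ln.mean() - 1.0) < 1e-3
assert abs(ga.mean() - 1.0) < 1e-3
assert abs(ga.var() - sigma2) < 1e-3
```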

The possibility of applying these distributions arises from the reformulation of the original pure additive model (5.4) into: \[\begin{equation} \begin{aligned} {y}_{t} = &\mathbf{w}^\prime \mathbf{v}_{t-\mathbf{l}}\left(1 + \frac{\epsilon_t}{\mathbf{w}^\prime \mathbf{v}_{t-\mathbf{l}}}\right) \\ \mathbf{v}_{t} = &\mathbf{F} \mathbf{v}_{t-\mathbf{l}} + \mathbf{g} \epsilon_t \end{aligned}. \tag{5.22} \end{equation}\] The connection between the two formulations becomes apparent when opening the brackets in the measurement equation of (5.22). Note that in this case the model assumes that the data is strictly positive; while it might be possible to fit the model to data containing negative values, the calculation of the scale and the likelihood might become impossible. Using alternative losses (e.g. MSE) is a possible solution in this case.
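The equivalence of the two formulations can be checked on the simplest special case, the local level model ETS(A,N,N), where \(\mathbf{w}=\mathbf{F}=1\) and \(\mathbf{g}=\alpha\): feeding the same error series through both measurement equations yields identical observations. A minimal sketch, assuming a Normal error for simplicity:

```python
import numpy as np

alpha, level0 = 0.3, 100.0
rng = np.random.default_rng(0)
eps = rng.normal(0.0, 5.0, size=50)

def simulate_additive(eps):
    l, y = level0, []
    for e in eps:
        y.append(l + e)            # y_t = w'v_{t-l} + eps_t
        l = l + alpha * e          # v_t = F v_{t-l} + g eps_t
    return np.array(y)

def simulate_reformulated(eps):
    l, y = level0, []
    for e in eps:
        y.append(l * (1 + e / l))  # y_t = w'v_{t-l} (1 + eps_t / (w'v_{t-l}))
        l = l + alpha * e          # transition equation is unchanged
    return np.array(y)

assert np.allclose(simulate_additive(eps), simulate_reformulated(eps))
```

Opening the brackets in `simulate_reformulated` recovers `l + e`, which is exactly the additive measurement equation, confirming the connection stated above.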


• Svetunkov, I., 2022a. Statistics for business analytics. https://openforecast.org/sba/ (version: 31.03.2022)