7.4 Distributional assumptions in pure multiplicative ETS

The conventional assumption for the error term in ETS is that \(\epsilon_t\sim\mathcal{N}(0,\sigma^2)\), which guarantees that the conditional expectation of the model equals the point forecasts when the trend and seasonal components are not multiplicative. ETS works well in many cases under this assumption, mainly when the data is strictly positive and the level of the series is high (e.g. thousands of units). However, for lower level data this assumption may become unhelpful, because the models may start generating non-positive values, which contradicts the idea of pure multiplicative ETS models. Akram, Hyndman, and Ord (2009) studied ETS models with multiplicative error and suggested that applying ETS to data in logarithms is a better approach than using ETS(M,Y,Y) models (here "Y" stands for non-additive components). However, this approach sidesteps the ETS taxonomy, creating a new group of models. An alternative (also discussed in Akram, Hyndman, and Ord (2009)) is to assume that \(1+\epsilon_t\) follows some distribution for positive data. The authors mentioned log Normal, truncated and Gamma distributions, but did not explore them further.
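To make the problem with non-positive values concrete: under \(\epsilon_t\sim\mathcal{N}(0,\sigma^2)\), the probability that the multiplier \(1+\epsilon_t\) is non-positive is \(\Phi(-1/\sigma)\), where \(\Phi\) is the standard Normal CDF. The sketch below evaluates it for a few values of \(\sigma\). These values are arbitrary and chosen only for illustration; the function name is hypothetical as well.

```python
import math


def prob_non_positive(sigma: float) -> float:
    """P(1 + eps <= 0) for eps ~ N(0, sigma^2), i.e. Phi(-1 / sigma)."""
    # Phi(-x) = 0.5 * erfc(x / sqrt(2)), here with x = 1 / sigma.
    return 0.5 * math.erfc(1.0 / (sigma * math.sqrt(2.0)))


# Illustrative values of sigma only (an assumption, not taken from the text).
for sigma in (0.05, 0.3, 0.5, 1.0):
    print(f"sigma = {sigma:4.2f}: P(1 + eps <= 0) = {prob_non_positive(sigma):.3e}")
```

When the relative variability is small, the probability is negligible, which is in line with the remark that the Normal assumption is workable on high level data. It grows quickly as \(\sigma\) increases.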

Svetunkov and Boylan (2020) discussed several options for the distribution of \(1+\epsilon_t\) in ETS and came to the conclusion that the most suitable distribution in this case is the Inverse Gaussian. Other distributions for positive data can be applied as well, but their usage might become complicated, because they need to meet the condition \(\text{E}(1+\epsilon_t)=1\) in order for the expectation to coincide with the point forecasts for models with non-multiplicative trend and seasonality. For example, if \(1+\epsilon_t\) follows the log Normal distribution, then this restriction implies that the location of the distribution should be non-zero: \(1+\epsilon_t\sim\text{log}\mathcal{N}\left(-\frac{\sigma^2}{2},\sigma^2\right)\), because the mean of \(\text{log}\mathcal{N}(\mu,\sigma^2)\) is \(\exp\left(\mu+\frac{\sigma^2}{2}\right)\), which equals one only when \(\mu=-\frac{\sigma^2}{2}\). Based on that, the following distributions are supported by ADAM:

  1. Inverse Gaussian: \(\left(1+\epsilon_t \right) \sim \mathcal{IG}(1, s)\);
  2. Log Normal: \(\left(1+\epsilon_t \right) \sim \text{log}\mathcal{N}\left(-\frac{\sigma^2}{2}, \sigma^2\right)\).

The MLE of \(s\) in \(\mathcal{IG}\) is straightforward: \[\begin{equation} \hat{s} = \frac{1}{T} \sum_{t=1}^{T} \frac{e_{t}^2}{1+e_t} , \tag{7.17} \end{equation}\] where \(e_t\) is the estimate of the error term \(\epsilon_t\). The MLE of the scale parameter for the log Normal distribution with the aforementioned restriction is more complicated (Svetunkov and Boylan 2020): \[\begin{equation} \hat{\sigma}^2 = 2\left(1-\sqrt{ 1-\frac{1}{T} \sum_{t=1}^{T} \log^2(1+e_{t})}\right). \tag{7.18} \end{equation}\]
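The two scale formulae can be transcribed directly into code. The following is a minimal sketch in plain Python, not the code used in ADAM: the function names are hypothetical, and the errors are simulated from a log Normal distribution with location \(-\sigma^2/2\) and an arbitrarily chosen \(\sigma=0.1\).

```python
import math
import random


def ig_scale(errors):
    """Scale of IG(1, s) as in (7.17): the mean of e_t^2 / (1 + e_t)."""
    return sum(e * e / (1.0 + e) for e in errors) / len(errors)


def lognormal_scale(errors):
    """sigma^2 as written in (7.18), for 1 + e_t ~ logN(-sigma^2 / 2, sigma^2)."""
    m = sum(math.log1p(e) ** 2 for e in errors) / len(errors)
    if m > 1.0:
        raise ValueError("mean of log^2(1 + e_t) exceeds 1: square root in (7.18) is not real")
    return 2.0 * (1.0 - math.sqrt(1.0 - m))


# Simulated multipliers 1 + e_t (illustration only; sigma = 0.1 is an assumption).
random.seed(42)
sigma = 0.1
errors = [random.lognormvariate(-sigma**2 / 2, sigma) - 1.0 for _ in range(10_000)]

print("mean of 1 + e_t:", 1.0 + sum(errors) / len(errors))
print("IG scale (7.17):", ig_scale(errors))
print("log Normal scale (7.18):", lognormal_scale(errors))
```

With the location set to \(-\sigma^2/2\), the sample mean of \(1+e_t\) should be close to one, which is the restriction \(\text{E}(1+\epsilon_t)=1\) discussed above.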

Even if we assume that we deal with strictly positive high level data and that \(\epsilon_t\) can be non-positive, it is not necessary to restrict the distribution to the Normal only. The following distributions can be applied as well:

  1. Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma^2)\), implying that \(y_t = \mu_t (1+\epsilon_t) \sim \mathcal{N}(\mu_t, \mu_t^2 \sigma^2)\);
  2. Laplace: \(\epsilon_t \sim \mathcal{Laplace}(0, s)\), meaning that \(y_t = \mu_t (1+\epsilon_t) \sim \mathcal{Laplace}(\mu_t, \mu_t s)\);
  3. S: \(\epsilon_t \sim \mathcal{S}(0, s)\), so that \(y_t = \mu_t (1+\epsilon_t) \sim \mathcal{S}(\mu_t, \sqrt{\mu_t} s)\);
  4. Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s, \beta)\) and \(y_t = \mu_t (1+\epsilon_t) \sim \mathcal{GN}(\mu_t, \mu_t^\beta s, \beta)\);
  5. Logistic: \(\epsilon_t \sim \mathcal{Logis}(0, s)\);
  6. Student's t: \(\epsilon_t \sim \mathcal{t}(\nu)\);
  7. Asymmetric Laplace: \(\epsilon_t \sim \mathcal{ALaplace}(0, s, \alpha)\) with \(y_t = \mu_t (1+\epsilon_t) \sim \mathcal{ALaplace}(\mu_t, \mu_t s, \alpha)\).

Note that the MLE of the scale parameters for these distributions is calculated differently than in the case of pure additive models. For example, for the Normal distribution it is: \[\begin{equation} \hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^T \left(\frac{y_t-\hat{\mu}_t}{\hat{\mu}_t}\right)^2 , \tag{7.19} \end{equation}\] where the main difference from the additive error case arises from the measurement equation of the multiplicative error models: \[\begin{equation} y_t = \mu_t (1+\epsilon_t), \tag{7.20} \end{equation}\] implying that \[\begin{equation} e_t = \frac{y_t-\hat{\mu}_t}{\hat{\mu}_t}. \tag{7.21} \end{equation}\]
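As a small worked example, the relative errors of (7.21) and the Normal scale estimate built from them (the mean of the squared \(e_t\)) can be computed as follows. The actual and fitted values are made-up numbers and the function names are hypothetical.

```python
def relative_errors(actuals, fitted):
    """e_t = (y_t - mu_t) / mu_t, as in (7.21); fitted values must be non-zero."""
    return [(y - mu) / mu for y, mu in zip(actuals, fitted)]


def normal_scale(actuals, fitted):
    """Estimate of sigma^2 for the Normal case: mean of squared relative errors."""
    e = relative_errors(actuals, fitted)
    return sum(x * x for x in e) / len(e)


# Hypothetical actuals and one-step-ahead fitted values, for illustration only.
y = [105.0, 98.0, 110.0, 102.0]
mu = [100.0, 100.0, 105.0, 104.0]

print(relative_errors(y, mu))
print(normal_scale(y, mu))
```

Because the errors are relative, the estimate does not change if both the actuals and the fitted values are multiplied by the same constant, unlike the variance of additive residuals.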

The estimates of scale can then be used in the estimation phase, when parameters are optimised via the maximisation of the respective log-likelihood function.
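To illustrate how a scale estimate enters the likelihood, here is a sketch for the Normal case only, assuming positive fitted values and \(y_t \sim \mathcal{N}(\mu_t, \mu_t^2 \sigma^2)\). It plugs the mean of squared relative errors into the log-likelihood, so that only the fitted values remain to be optimised. This is an illustration of the idea with hypothetical function names and made-up numbers, not the code used in ADAM.

```python
import math


def normal_loglik(actuals, fitted, sigma2):
    """Log-likelihood of y_t ~ N(mu_t, mu_t^2 * sigma2), assuming mu_t > 0."""
    total = 0.0
    for y, mu in zip(actuals, fitted):
        var = mu * mu * sigma2
        total += -0.5 * math.log(2.0 * math.pi * var) - (y - mu) ** 2 / (2.0 * var)
    return total


def concentrated_loglik(actuals, fitted):
    """Log-likelihood with sigma2 replaced by the mean of squared relative errors."""
    e2 = [((y - mu) / mu) ** 2 for y, mu in zip(actuals, fitted)]
    sigma2 = sum(e2) / len(e2)
    return normal_loglik(actuals, fitted, sigma2), sigma2


# Hypothetical actuals and fitted values, for illustration only.
y = [105.0, 98.0, 110.0, 102.0]
mu = [100.0, 100.0, 105.0, 104.0]

loglik, sigma2 = concentrated_loglik(y, mu)
print(sigma2, loglik)
```

In an actual estimation routine, the fitted values would depend on the smoothing parameters and initial states, and an optimiser would search over those to maximise the concentrated log-likelihood.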

References

Akram, Muhammad, Rob J. Hyndman, and J. Keith Ord. 2009. “Exponential Smoothing and Non-negative Data.” Australian & New Zealand Journal of Statistics 51 (4): 415–32. doi:10.1111/j.1467-842X.2009.00555.x.

Svetunkov, Ivan, and John E. Boylan. 2020. “Dealing with Positive Data Using Pure Multiplicative ETS Models.” Department of Management Science, Lancaster University.