## 6.5 Distributional assumptions in pure multiplicative ETS

The conventional assumption for the error term in ETS is that \(\epsilon_t\sim\mathcal{N}(0,\sigma^2)\), which guarantees that the conditional expectation of the model equals the point forecast when the trend and seasonal components are not multiplicative. ETS works well under this assumption in many cases, mainly when the data is strictly positive and the level of the series is high (e.g. thousands of units). However, the assumption becomes unhelpful for lower-level data, because the model may start generating non-positive values, which contradicts the idea of pure multiplicative ETS models. Akram et al. (2009) studied ETS models with multiplicative error and suggested that applying ETS to data in logarithms is a better approach than using ETS(M,Y,Y) models (here “Y” stands for non-additive components). However, this approach sidesteps the ETS taxonomy, creating a new group of models. An alternative (also discussed in Akram et al., 2009) is to assume that the error term \(1+\epsilon_t\) follows some distribution for positive data. The authors mentioned log-normal, truncated and Gamma distributions but never explored them further.
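To give a rough sense of when the Normal assumption starts producing non-positive values, the sketch below computes \(\mathrm{P}(1+\epsilon_t \leq 0)\) for \(\epsilon_t\sim\mathcal{N}(0,\sigma^2)\). The values of \(\sigma\) are purely illustrative, and the function name is hypothetical. The link to low-level data rests on the assumption that such series tend to have a larger relative error.

```python
from statistics import NormalDist


def prob_non_positive(sigma):
    """P(1 + eps <= 0) when eps ~ N(0, sigma^2), i.e. P(eps <= -1)."""
    return NormalDist(mu=0.0, sigma=sigma).cdf(-1.0)


# Illustrative values of sigma (standard deviation of the relative error)
probabilities = {sigma: prob_non_positive(sigma) for sigma in (0.1, 0.3, 0.5, 1.0)}

for sigma, p in probabilities.items():
    print(f"sigma = {sigma:.1f}: P(1 + eps <= 0) = {p:.3g}")
```

With a small \(\sigma\) the probability is negligible, which is why the Normal assumption is usually harmless for high-level data. It grows quickly as \(\sigma\) approaches one.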

Svetunkov and Boylan (2022) discussed several options for the distribution of \(1+\epsilon_t\) in ETS, including log-normal, Gamma and Inverse Gaussian. Other distributions for positive data can be applied as well, but their usage might become complicated, because they need to meet the condition \(\mathrm{E}(1+\epsilon_t)=1\) in order for the expectation to coincide with the point forecasts for models with non-multiplicative trend and seasonality. For example, if the error term follows the log-normal distribution, then this restriction implies that the location of the distribution should be non-zero: \(1+\epsilon_t\sim\mathrm{log}\mathcal{N}\left(-\frac{\sigma^2}{2},\sigma^2\right)\). Using this principle, the following distributions can be used for ADAM ETS:

- Inverse Gaussian: \(\left(1+\epsilon_t \right) \sim \mathcal{IG}(1, \sigma^2)\), so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{IG}(\mu_{y,t}, \sigma^2)\);
- Gamma: \(\left(1+\epsilon_t \right) \sim \Gamma (\sigma^{-2}, \sigma^2)\), so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \Gamma (\sigma^{-2}, \sigma^2 \mu_{y,t})\);
- Log-normal: \(\left(1+\epsilon_t \right) \sim \mathrm{log}\mathcal{N}\left(-\frac{\sigma^2}{2}, \sigma^2\right)\) so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathrm{log}\mathcal{N}(\log \mu_{y,t} -\frac{\sigma^2}{2}, \sigma^2)\).
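
As a quick check of the restriction \(\mathrm{E}(1+\epsilon_t)=1\), the first two moments of \(1+\epsilon_t\) follow from standard results for each distribution. The Inverse Gaussian line assumes that \(\sigma^2\) acts as the dispersion parameter, i.e. the reciprocal of the shape in the mean-shape parameterisation. This is an assumption about the notation rather than something stated above.

\[\begin{aligned}
\text{Inverse Gaussian:}\quad & \mathrm{E}(1+\epsilon_t) = 1, & \mathrm{V}(1+\epsilon_t) &= \sigma^2; \\
\text{Gamma:}\quad & \mathrm{E}(1+\epsilon_t) = \sigma^{-2} \cdot \sigma^2 = 1, & \mathrm{V}(1+\epsilon_t) &= \sigma^{-2} \cdot \left(\sigma^2\right)^2 = \sigma^2; \\
\text{Log-normal:}\quad & \mathrm{E}(1+\epsilon_t) = \exp\left(-\frac{\sigma^2}{2} + \frac{\sigma^2}{2}\right) = 1, & \mathrm{V}(1+\epsilon_t) &= \exp\left(\sigma^2\right) - 1.
\end{aligned}\]

So \(\sigma^2\) is the variance of the error term in the Gamma case (and, under the stated assumption, in the Inverse Gaussian case), while in the log-normal case it is the variance of \(\log(1+\epsilon_t)\).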

The MLE of \(\sigma^2\) in \(\mathcal{IG}\) is straightforward: \[\begin{equation} \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \frac{e_{t}^2}{1+e_t} , \tag{6.16} \end{equation}\] where \(e_t\) is the estimate of the error term \(\epsilon_t\). The MLE of the scale parameter for the log-normal distribution with the aforementioned restrictions is more complicated (Svetunkov and Boylan, 2022): \[\begin{equation} \hat{\sigma}^2 = 2\left(1-\sqrt{ 1-\frac{1}{T} \sum_{t=1}^{T} \log^2(1+e_{t})}\right). \tag{6.17} \end{equation}\] Finally, the MLE of \(\sigma^2\) in \(\Gamma\) does not have a closed form. Luckily, the method of moments can be used to obtain its value (Svetunkov and Boylan, 2022): \[\begin{equation} \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} e_{t}^2 . \tag{6.18} \end{equation}\] This value will coincide with the variance of the error term, given the imposed restrictions on the \(\Gamma\) distribution.
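
The estimators (6.16) and (6.18) are simple averages, so they can be checked by simulation. The following is a minimal sketch in plain Python, not the implementation used in any package. The function names are hypothetical, the true value \(\sigma^2=0.04\) is arbitrary, and the Inverse Gaussian sampler is the standard transformation method of Michael, Schucany and Haas, which is swapped in here only to generate test data. It also assumes that \(\sigma^2\) is the reciprocal of the Inverse Gaussian shape parameter.

```python
import math
import random


def scale_inverse_gaussian(errors):
    """Equation (6.16): average of e_t^2 / (1 + e_t)."""
    return sum(e * e / (1.0 + e) for e in errors) / len(errors)


def scale_gamma(errors):
    """Equation (6.18): method-of-moments estimate, average of e_t^2."""
    return sum(e * e for e in errors) / len(errors)


def draw_inverse_gaussian(rng, mean, shape):
    """One draw from IG(mean, shape) via the transformation method (assumed sampler)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (
        mean
        + mean**2 * y / (2.0 * shape)
        - mean / (2.0 * shape) * math.sqrt(4.0 * mean * shape * y + mean**2 * y**2)
    )
    # Accept the smaller root with probability mean / (mean + x)
    if rng.random() <= mean / (mean + x):
        return x
    return mean**2 / x


rng = random.Random(42)
sigma2 = 0.04  # arbitrary "true" value used for the simulation
n = 50_000

# 1 + eps ~ Gamma(shape = 1/sigma^2, scale = sigma^2), so its mean is 1
gamma_errors = [rng.gammavariate(1.0 / sigma2, sigma2) - 1.0 for _ in range(n)]
# 1 + eps ~ IG(mean = 1, shape = 1/sigma^2), assuming sigma^2 is the dispersion
ig_errors = [draw_inverse_gaussian(rng, 1.0, 1.0 / sigma2) - 1.0 for _ in range(n)]

sigma2_gamma = scale_gamma(gamma_errors)
sigma2_ig = scale_inverse_gaussian(ig_errors)
print(f"Gamma: {sigma2_gamma:.4f}, Inverse Gaussian: {sigma2_ig:.4f}, true: {sigma2}")
```

Both estimates should land close to the value used to generate the data, up to sampling noise.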

Even if we deal with strictly positive high-level data, it is not necessary to restrict the distribution to the Normal one. The following distributions can be applied as well:

- Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma^2)\), implying that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{N}(\mu_{y,t}, \mu_{y,t}^2 \sigma^2)\);
- Laplace: \(\epsilon_t \sim \mathcal{Laplace}(0, s)\), meaning that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{Laplace}(\mu_{y,t}, \mu_{y,t} s)\);
- S: \(\epsilon_t \sim \mathcal{S}(0, s)\), so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{S}(\mu_{y,t}, \sqrt{\mu_{y,t}} s)\);
- Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s, \beta)\) and \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{GN}(\mu_{y,t}, \mu_{y,t}^\beta s)\).

The MLE of scale parameters for these distributions is calculated differently than in the case of pure additive models. For example, for the Normal distribution it is: \[\begin{equation} \hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^T \left(\frac{y_t-\hat{\mu}_{y,t}}{\hat{\mu}_{y,t}}\right)^2 , \tag{6.19} \end{equation}\] where the main difference from the additive error case arises from the measurement equation of the multiplicative error models: \[\begin{equation} y_t = \mu_{y,t} (1+\epsilon_t), \tag{6.20} \end{equation}\] implying that \[\begin{equation} e_t = \frac{y_t-\hat{\mu}_{y,t}}{\hat{\mu}_{y,t}}. \tag{6.21} \end{equation}\] The estimates of scale can then be used in the next phase, when the parameters are optimised via the maximisation of the respective log-likelihood function. The maximum likelihood approach for ADAM is discussed in detail in Section 11.1.
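
A minimal sketch of this calculation for the Normal case is given below. It computes the relative errors from (6.21), averages their squares to obtain the scale, and evaluates the log-likelihood of \(y_t \sim \mathcal{N}(\mu_{y,t}, \mu_{y,t}^2 \sigma^2)\). The function names and the toy data are hypothetical, and the fitted values are taken as given rather than produced by an actual ETS model.

```python
import math


def relative_errors(actuals, fitted):
    """Equation (6.21): e_t = (y_t - mu_t) / mu_t."""
    return [(y - mu) / mu for y, mu in zip(actuals, fitted)]


def scale_normal(actuals, fitted):
    """Average of squared relative errors (scale of the Normal distribution)."""
    errors = relative_errors(actuals, fitted)
    return sum(e * e for e in errors) / len(errors)


def log_likelihood_normal(actuals, fitted, sigma2):
    """Log-likelihood of y_t ~ N(mu_t, mu_t^2 * sigma2)."""
    total = 0.0
    for y, mu in zip(actuals, fitted):
        variance = mu * mu * sigma2
        total += -0.5 * math.log(2.0 * math.pi * variance) - (y - mu) ** 2 / (2.0 * variance)
    return total


# Toy data: fitted values stand in for the one-step-ahead conditional means
actuals = [102.0, 190.0, 330.0, 396.0]
fitted = [100.0, 200.0, 300.0, 400.0]

sigma2_hat = scale_normal(actuals, fitted)
loglik = log_likelihood_normal(actuals, fitted, sigma2_hat)
print(f"sigma^2 = {sigma2_hat:.5f}, log-likelihood = {loglik:.3f}")
```

Because the variance of \(y_t\) is proportional to \(\mu_{y,t}^2\), an error of 30 units at a level of 300 contributes to the scale exactly as much as an error of 10 units at a level of 100.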

The distributional assumptions impact both the estimation of models and the prediction intervals. In the case of asymmetric distributions (such as log-normal, Gamma and Inverse Gaussian), the intervals will typically be asymmetric, with the upper bound being further away from the point forecast than the lower one. Furthermore, even with comparable estimates of the scale, the Inverse Gaussian distribution will typically produce wider bounds than the log-normal and Gamma ones. The width of intervals relates to the kurtosis of distributions, which is discussed in Chapter 3 of Svetunkov (2022a).
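
The asymmetry can be illustrated for the log-normal case, where quantiles are available in closed form. The sketch below compares one-step-ahead 95% intervals under the Normal and log-normal assumptions for a given conditional mean. The function names, the values \(\mu_{y,t}=100\) and \(\sigma=0.3\), and the use of the same numerical \(\sigma\) in both distributions are assumptions made only for illustration.

```python
import math
from statistics import NormalDist


def interval_normal(mu, sigma, level=0.95):
    """Bounds for y = mu * (1 + eps), eps ~ N(0, sigma^2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mu * (1.0 - z * sigma), mu * (1.0 + z * sigma)


def interval_lognormal(mu, sigma, level=0.95):
    """Bounds for y ~ logN(log(mu) - sigma^2 / 2, sigma^2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    location = math.log(mu) - sigma**2 / 2.0
    return math.exp(location - z * sigma), math.exp(location + z * sigma)


mu, sigma = 100.0, 0.3  # illustrative conditional mean and scale

normal_bounds = interval_normal(mu, sigma)
lognormal_bounds = interval_lognormal(mu, sigma)

for name, (lower, upper) in (("Normal", normal_bounds), ("Log-normal", lognormal_bounds)):
    print(f"{name}: lower = {lower:.1f}, upper = {upper:.1f}, "
          f"distances from the mean = {mu - lower:.1f} and {upper - mu:.1f}")
```

The Normal bounds are equidistant from the point forecast. The log-normal ones stay positive and have an upper bound further from the mean than the lower one.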