## 6.5 Distributional assumptions in pure multiplicative ETS

The conventional assumption for the error term in ETS is that \(\epsilon_t \sim \mathcal{N}(0,\sigma^2)\). The condition \(\mathrm{E}(\epsilon_t)=0\) guarantees that the conditional expectation of the model equals the point forecast when the trend and seasonal components are not multiplicative. ETS works well with this assumption in many cases, mainly when the data is strictly positive and the level of the series is high (e.g. thousands of units). However, the assumption becomes unhelpful for lower-level data, because the model may start generating non-positive values, which contradicts the idea of pure multiplicative ETS models. Akram et al. (2009) studied ETS models with the multiplicative error and suggested that applying ETS to data in logarithms is a better approach than just using ETS(M,Y,Y) models (here “Y” stands for a non-additive component). However, this approach sidesteps the ETS taxonomy, creating a new group of models. An alternative (also discussed in Akram et al., 2009) is to assume that \(1+\epsilon_t\) follows some distribution for positive data. The authors mentioned the Log-Normal, truncated Normal and Gamma distributions but never explored them further.

Svetunkov and Boylan (2022) discussed several options for the distribution of \(1+\epsilon_t\) in ETS, including Log-Normal, Gamma and Inverse Gaussian. Other distributions for positive data can be applied as well, but their usage might become complicated, because they need to meet the condition \(\mathrm{E}(1+\epsilon_t)=1\) in order for the expectation to coincide with the point forecasts for models with non-multiplicative trend and seasonality. For example, if the error term follows the Log-Normal distribution, then this restriction implies that the location of the distribution should be non-zero: \(1+\epsilon_t\sim\mathrm{log}\mathcal{N}\left(-\frac{\sigma^2}{2},\sigma^2\right)\). Using this principle, the following distributions can be used for ADAM ETS:

- Inverse Gaussian: \(\left(1+\epsilon_t \right) \sim \mathcal{IG}(1, \sigma^2)\), so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{IG}(\mu_{y,t}, \sigma^2)\);
- Gamma: \(\left(1+\epsilon_t \right) \sim \Gamma (\sigma^{-2}, \sigma^2)\), so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \Gamma (\sigma^{-2}, \sigma^2 \mu_{y,t})\);
- Log-Normal: \(\left(1+\epsilon_t \right) \sim \mathrm{log}\mathcal{N}\left(-\frac{\sigma^2}{2}, \sigma^2\right)\) so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathrm{log}\mathcal{N}(\log \mu_{y,t} -\frac{\sigma^2}{2}, \sigma^2)\).
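The restriction \(\mathrm{E}(1+\epsilon_t)=1\) can be checked numerically for the parametrisations listed above. The following is a minimal sketch in Python using only the standard library; the function names and the value \(\sigma^2=0.09\) are illustrative assumptions rather than anything prescribed by ADAM. It assumes that the Gamma distribution is parametrised via shape and scale, and it leaves out the Inverse Gaussian distribution because the standard library has no sampler for it.

```python
import math
import random


def gamma_moments(sigma2):
    """Mean and variance of Gamma(shape=1/sigma2, scale=sigma2)."""
    shape, scale = 1.0 / sigma2, sigma2
    return shape * scale, shape * scale ** 2


def lognormal_moments(sigma2):
    """Mean and variance of logN(-sigma2/2, sigma2)."""
    location = -sigma2 / 2.0
    mean = math.exp(location + sigma2 / 2.0)
    variance = (math.exp(sigma2) - 1.0) * math.exp(2.0 * location + sigma2)
    return mean, variance


def simulated_mean(draw, n=100_000, seed=42):
    """Monte Carlo estimate of E(1 + epsilon_t) for a given sampler."""
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(n)) / n


sigma2 = 0.09  # illustrative value, not taken from the text
sigma = math.sqrt(sigma2)

# Analytical moments: both means should be equal to one
print(gamma_moments(sigma2))
print(lognormal_moments(sigma2))

# Simulated means: both should be close to one
# random.gammavariate(alpha, beta) takes shape and scale
print(simulated_mean(lambda rng: rng.gammavariate(1.0 / sigma2, sigma2)))
# random.lognormvariate(mu, sigma) takes the standard deviation, not the variance
print(simulated_mean(lambda rng: rng.lognormvariate(-sigma2 / 2.0, sigma)))
```

Under these parametrisations the variance of \(1+\epsilon_t\) is exactly \(\sigma^2\) for Gamma, while for Log-Normal it is \(\exp(\sigma^2)-1\), which is only approximately \(\sigma^2\) when the scale is small.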

The MLE of \(\sigma^2\) in Inverse Gaussian is straightforward (see Section 11.1) and is: \[\begin{equation} \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \frac{e_{t}^2}{1+e_t} , \tag{6.16} \end{equation}\] where \(e_t\) is the estimate of the error term \(\epsilon_t\). The MLE of the scale parameter for the Log-Normal distribution with the aforementioned restrictions is more complicated (Svetunkov and Boylan, 2022): \[\begin{equation} \hat{\sigma}^2 = 2\left(1-\sqrt{ 1-\frac{1}{T} \sum_{t=1}^{T} \log^2(1+e_{t})}\right). \tag{6.17} \end{equation}\] Finally, the MLE of \(\sigma^2\) in Gamma does not have a closed form. Luckily, the method of moments can be used to obtain its value (Svetunkov and Boylan, 2022): \[\begin{equation} \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} e_{t}^2 . \tag{6.18} \end{equation}\] This value will coincide with the variance of the error term, given the imposed restrictions on the Gamma distribution.
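To make the difference between the estimators more tangible, here is a small sketch of (6.16) and (6.18) in Python. The function names and the error values are made up for illustration, and the code assumes that the relative errors \(e_t\) have already been extracted from a fitted model and satisfy \(1+e_t>0\).

```python
def _check_errors(errors):
    """Relative errors must keep 1 + e_t strictly positive."""
    if not errors:
        raise ValueError("errors must not be empty")
    if any(1.0 + e <= 0.0 for e in errors):
        raise ValueError("every error must satisfy 1 + e_t > 0")


def scale_inverse_gaussian(errors):
    """Equation (6.16): mean of e_t^2 / (1 + e_t)."""
    _check_errors(errors)
    return sum(e ** 2 / (1.0 + e) for e in errors) / len(errors)


def scale_gamma(errors):
    """Equation (6.18): method-of-moments estimate, mean of e_t^2."""
    _check_errors(errors)
    return sum(e ** 2 for e in errors) / len(errors)


errors = [0.10, -0.10, 0.20, -0.05]  # made-up relative errors
print(scale_inverse_gaussian(errors))
print(scale_gamma(errors))
```

The two estimates differ only in the weight \(1/(1+e_t)\), so negative errors contribute more to the Inverse Gaussian scale than positive errors of the same size.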

Even if we deal with strictly positive, high-level data, there is no need to restrict the choice to distributions for positive data only. The following distributions can be applied as well:

- Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma^2)\), implying that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{N}(\mu_{y,t}, \mu_{y,t}^2 \sigma^2)\);
- Laplace: \(\epsilon_t \sim \mathcal{L}(0, s)\), meaning that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{L}(\mu_{y,t}, \mu_{y,t} s)\);
- Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s, \beta)\) and \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{GN}(\mu_{y,t}, \mu_{y,t}^\beta s, \beta)\);
- S: \(\epsilon_t \sim \mathcal{S}(0, s)\), so that \(y_t = \mu_{y,t} (1+\epsilon_t) \sim \mathcal{S}(\mu_{y,t}, \sqrt{\mu_{y,t}} s)\).

The MLE of scale parameters for these distributions will be calculated differently from the case of pure additive models (these are provided in Section 11.1). For example, for the Normal distribution it is: \[\begin{equation} \hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^T \left(\frac{y_t-\hat{\mu}_{y,t}}{\hat{\mu}_{y,t}}\right)^2 , \tag{6.19} \end{equation}\] where the main difference from the additive error case arises from the measurement equation of the multiplicative error models: \[\begin{equation} y_t = \mu_{y,t} (1+\epsilon_t), \tag{6.20} \end{equation}\] implying that \[\begin{equation} e_t = \frac{y_t-\hat{\mu}_{y,t}}{\hat{\mu}_{y,t}}. \tag{6.21} \end{equation}\]
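The relation between the relative errors and the scale of the Normal distribution can be sketched as follows. This is only an illustration in Python: the function names and the numbers are hypothetical, and the fitted values \(\hat{\mu}_{y,t}\) are assumed to be strictly positive and already available from an estimated model. The variance estimate is computed as the mean of the squared relative errors.

```python
def relative_errors(actuals, fitted):
    """Equation (6.21): e_t = (y_t - mu_t) / mu_t for a multiplicative error model."""
    if len(actuals) != len(fitted):
        raise ValueError("actuals and fitted must have the same length")
    return [(y - mu) / mu for y, mu in zip(actuals, fitted)]


def scale_normal(actuals, fitted):
    """Variance estimate for the Normal case: mean of squared relative errors."""
    errors = relative_errors(actuals, fitted)
    return sum(e ** 2 for e in errors) / len(errors)


actuals = [110.0, 90.0, 120.0]   # made-up observations
fitted = [100.0, 100.0, 100.0]   # made-up one-step-ahead fitted values
print(relative_errors(actuals, fitted))
print(scale_normal(actuals, fitted))
```

Because the errors are relative, the estimate does not depend on the units of the data: multiplying both the actuals and the fitted values by the same constant leaves it unchanged.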

The distributional assumptions impact both the estimation of models and the prediction intervals. In the case of asymmetric distributions (such as Log-Normal, Gamma and Inverse Gaussian), the intervals will typically be asymmetric, with the upper bound being further away from the point forecast than the lower one. Furthermore, even with comparable estimates of the scale, the Inverse Gaussian distribution will typically produce wider bounds than Log-Normal and Gamma, making it a viable option for data with higher uncertainty. The width of intervals for these distributions relates to their kurtoses (Svetunkov and Boylan, 2022).
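The asymmetry can be illustrated with one-step-ahead intervals under the Normal and Log-Normal assumptions. The sketch below is a simplified Python example: it treats \(\mu_{y,t}\) and \(\sigma^2\) as known, ignores the uncertainty of parameter estimates, and uses hypothetical function names and values. Gamma and Inverse Gaussian are left out because the Python standard library does not provide their quantile functions.

```python
import math
from statistics import NormalDist


def normal_interval(mu_y, sigma2, level=0.95):
    """Interval for y_t ~ N(mu_y, mu_y^2 * sigma2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    sigma = math.sqrt(sigma2)
    return mu_y * (1.0 - z * sigma), mu_y * (1.0 + z * sigma)


def lognormal_interval(mu_y, sigma2, level=0.95):
    """Interval for y_t ~ logN(log(mu_y) - sigma2/2, sigma2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    sigma = math.sqrt(sigma2)
    location = math.log(mu_y) - sigma2 / 2.0
    return math.exp(location - z * sigma), math.exp(location + z * sigma)


mu_y = 100.0  # illustrative point forecast
for sigma2 in (0.09, 0.36):  # illustrative scales
    print(sigma2, normal_interval(mu_y, sigma2), lognormal_interval(mu_y, sigma2))
```

With the larger scale, the lower bound of the Normal interval becomes negative, while the Log-Normal bounds stay positive and the upper one lies further from the point forecast than the lower one.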