4.5 ETS assumptions, estimation and selection

Several assumptions need to hold for the conventional ETS models to work properly. Some of them have already been discussed in Section 1.4.1, and we will not discuss them here again. What is important in our context is that the conventional ETS assumes that the error term \(\epsilon_t\) follows the Normal distribution with zero mean and variance \(\sigma^2\). There are several points that need to be discussed related to this:

1. If the mean were not equal to zero, then, for example, the level models would act as models with drift (see Subsection 3.3.4). This implies that the architecture of the model should change, and the conventional ETS models cannot be efficiently applied to such data. Furthermore, correctly estimating such models would not be straightforward, because ETS exhibits a “pull to centre” effect, where the predicted value gets closer to the actual one based on the forecast error of the model. As a result, it would be challenging to capture the non-zero mean of the error term. So, the zero mean assumption is essential for such dynamic models as ETS.
2. The Normal distribution is defined for positive, negative and zero values. This is not an issue for additive models, which assume that the actual value can be anything. It is not an issue for the multiplicative models either when we deal with high-level positive data (e.g. thousands of units): the variance of the error term will be small enough for \(\epsilon_t\) to fall below minus one only with negligible probability. However, if the level of the data is low, then the variance of the error term can be large enough for the normally distributed error to take values below minus one with noticeable probability. This implies that the multiplier \(1+\epsilon_t\) can become negative, and the model will break. This is a potential flaw in the conventional ETS model with the multiplicative error term. So, what the standard multiplicative error ETS model assumes, in fact, is that the data we work with is strictly positive and has high-level values.
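To get a feeling for how quickly the second issue becomes relevant, the following minimal sketch evaluates \(\mathrm{P}(1+\epsilon_t \leq 0)\) for \(\epsilon_t \sim \mathcal{N}(0, \sigma^2)\). The function names and the values of \(\sigma\) are illustrative assumptions, not something taken from an existing package.

```python
from math import erfc, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of the Normal distribution, written via erfc to stay accurate in the far tail."""
    return 0.5 * erfc(-(x - mean) / (sd * sqrt(2.0)))


def prob_negative_multiplier(sigma):
    """P(1 + eps <= 0) for eps ~ N(0, sigma^2), i.e. P(eps <= -1)."""
    return normal_cdf(-1.0, 0.0, sigma)


# Illustrative (assumed) standard deviations of the multiplicative error term
for sigma in (0.05, 0.2, 0.5, 1.0):
    p = prob_negative_multiplier(sigma)
    print(f"sigma = {sigma:4.2f}: P(1 + eps <= 0) = {p:.3g}")
```

For a small \(\sigma\) the probability is negligible, while for \(\sigma\) close to one roughly one error in six would make the multiplier non-positive.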

Based on the assumption of normality of the error term, the ETS model can be estimated via the maximisation of likelihood (see Chapter 16 of Svetunkov, 2022a), which is equivalent to the minimisation of the mean squared one-step-ahead forecast error \(e_t\). Note that in order to apply the ETS models to the data, we also need to know the initial values of components, \(\hat{l}_0, \hat{b}_0, \hat{s}_{-m+2}, \hat{s}_{-m+3}, \dots, \hat{s}_{0}\). The conventional approach is to estimate these values together with the smoothing parameters during likelihood maximisation. As a result, the optimisation might involve a large number of parameters. In addition, the variance of the error term is considered as an additional parameter in the maximum likelihood estimation, so the number of parameters for different models is (here “*” stands for any type):
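The equivalence between likelihood maximisation and MSE minimisation can be illustrated on ETS(A,N,N). The sketch below is only an illustration under stated assumptions: the series `y` is made up, the crude grid search stands in for a proper numerical optimiser, and the variance is replaced by its estimate \(\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^T e_t^2\), which gives the concentrated Normal log-likelihood \(-\frac{T}{2}\left(\log(2\pi\hat{\sigma}^2) + 1\right)\).

```python
from math import log, pi


def ann_errors(y, alpha, l0):
    """One-step-ahead forecast errors of ETS(A,N,N) in the error-correction form."""
    level = l0
    errors = []
    for obs in y:
        e = obs - level      # the forecast for this observation is the previous level
        errors.append(e)
        level += alpha * e   # update the level using the forecast error
    return errors


def mse(y, alpha, l0):
    """Mean squared one-step-ahead forecast error."""
    errors = ann_errors(y, alpha, l0)
    return sum(e * e for e in errors) / len(errors)


def concentrated_loglik(y, alpha, l0):
    """Normal log-likelihood with sigma^2 replaced by its estimate (the MSE)."""
    return -0.5 * len(y) * (log(2 * pi * mse(y, alpha, l0)) + 1)


# Made-up series, used purely for illustration
y = [102, 98, 105, 101, 99, 104, 107, 103, 108, 106]

# Crude grid over (alpha, l0); a real implementation would use an optimiser
grid = [(i / 20, 95 + 0.5 * j) for i in range(21) for j in range(31)]
best_by_mse = min(grid, key=lambda p: mse(y, *p))
best_by_loglik = max(grid, key=lambda p: concentrated_loglik(y, *p))
print(best_by_mse, best_by_loglik)
```

Because the concentrated log-likelihood is a decreasing function of the MSE, both criteria point to the same pair of \(\hat{\alpha}\) and \(\hat{l}_0\).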

ETS(*,N,N) – 3 parameters: \(\hat{l}_0\), \(\hat{\alpha}\) and \(\hat{\sigma}^2\);
ETS(*,*,N) – 5 parameters: \(\hat{l}_0\), \(\hat{b}_0\), \(\hat{\alpha}\), \(\hat{\beta}\) and \(\hat{\sigma}^2\);
ETS(*,*d,N) – 6 parameters: \(\hat{l}_0\), \(\hat{b}_0\), \(\hat{\alpha}\), \(\hat{\beta}\), \(\hat{\phi}\) and \(\hat{\sigma}^2\);
ETS(*,N,*) – 4+m-1 parameters: \(\hat{l}_0\), \(\hat{s}_{-m+2}, \hat{s}_{-m+3}, \dots, \hat{s}_{0}\), \(\hat{\alpha}\), \(\hat{\gamma}\) and \(\hat{\sigma}^2\);
ETS(*,*,*) – 6+m-1 parameters: \(\hat{l}_0\), \(\hat{b}_0\), \(\hat{s}_{-m+2}, \hat{s}_{-m+3}, \dots, \hat{s}_{0}\), \(\hat{\alpha}\), \(\hat{\beta}\), \(\hat{\gamma}\) and \(\hat{\sigma}^2\);
ETS(*,*d,*) – 7+m-1 parameters: \(\hat{l}_0\), \(\hat{b}_0\), \(\hat{s}_{-m+2}, \hat{s}_{-m+3}, \dots, \hat{s}_{0}\), \(\hat{\alpha}\), \(\hat{\beta}\), \(\hat{\gamma}\), \(\hat{\phi}\) and \(\hat{\sigma}^2\).
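The counting rule behind this list can be written down as a small helper. This is a sketch that simply mirrors the list above; the function name and its arguments are hypothetical.

```python
def ets_n_params(trend="N", seasonal="N", damped=False, m=1):
    """Number of estimated parameters in a conventional ETS model.

    trend and seasonal are "N" for none or any other letter (e.g. "A", "M")
    for an existing component; m is the seasonal period.
    """
    k = 3                    # initial level, alpha and sigma^2
    if trend != "N":
        k += 2               # initial trend and beta
        if damped:
            k += 1           # damping parameter phi
    if seasonal != "N":
        k += 1 + (m - 1)     # gamma and m-1 free initial seasonal indices
    return k


print(ets_n_params("N", "N"))                       # ETS(*,N,N)
print(ets_n_params("A", "A", damped=True, m=12))    # ETS(*,*d,*) on monthly data
```

For monthly data (\(m=12\)) the damped trend seasonal model already requires \(7+12-1=18\) parameters, which shows why the optimisation can become demanding.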

Remark. In the case of seasonal models, we typically make sure that the initial seasonal indices are normalised, so we only need to estimate \(m-1\) of them, while the last one is calculated as a linear combination of the others. For example, for the additive seasonality, it is equal to \(-\sum_{j=1}^{m-1} s_j\) because the sum of all the indices should be equal to zero.
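For the additive case, the normalisation from the remark can be sketched as follows. The function name and the quarterly indices are made up for illustration.

```python
def complete_additive_indices(free_indices):
    """Append the last additive seasonal index so that all m indices sum to zero."""
    return list(free_indices) + [-sum(free_indices)]


# Three estimated indices for quarterly data (m = 4); the values are illustrative
indices = complete_additive_indices([1.5, -0.5, -2.0])
print(indices, sum(indices))
```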

When it comes to selecting the most appropriate model, the conventional approach involves the application of all models to the data and then selecting the most appropriate of them based on an information criterion (see Section 16.4 of Svetunkov, 2022a). This approach was first proposed by Hyndman et al. (2002). In the case of the conventional ETS model, it relies on the likelihood value of the Normal distribution used in the estimation of the model.
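As an illustration of the selection step, the sketch below uses AIC, \(\mathrm{AIC} = 2k - 2\ell\), where \(k\) is the number of estimated parameters and \(\ell\) is the maximised log-likelihood. The log-likelihood values are hypothetical placeholders rather than results of fitting models to any real data, and AIC is only one of the criteria that could be used here.

```python
def aic(loglik, k):
    """Akaike Information Criterion: lower values are preferred."""
    return 2 * k - 2 * loglik


# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only
candidates = {
    "ETS(A,N,N)": (-250.0, 3),
    "ETS(A,A,N)": (-246.0, 5),
    "ETS(A,Ad,N)": (-245.8, 6),
}

scores = {name: aic(loglik, k) for name, (loglik, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print("Selected:", best)
```

In this made-up example, the damped trend model has the highest likelihood, but the improvement is too small to justify the extra parameter, so the model with the non-damped trend is selected.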

Finally, the assumption of normality is used to generate prediction intervals from the model. There are typically two ways of doing that:

1. Calculating the variance of multiple steps ahead forecast error and then using it for the interval construction (see Chapter 6 of Hyndman et al., 2008);
2. Generating thousands of possible paths for the components of the series and the actual values and then taking the necessary quantiles for the prediction intervals.

Typically, (1) is applied for pure additive models, where the closed forms for the variances are known, and the assumption of normality holds for several steps ahead. In some special cases of mixed models, approximations for variances work on short horizons (see Section 6.4 of Hyndman et al., 2008). But in all the other cases, (2) should be used, despite being typically slower than (1) and producing bounds that differ slightly from run to run due to randomness.
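Both approaches can be compared on ETS(A,N,N), for which the variance of the \(h\) steps ahead forecast error has the known closed form \(\sigma_h^2 = \sigma^2 \left(1 + (h-1)\alpha^2\right)\). The code below is a minimal sketch: the level, \(\alpha\), \(\sigma\) and the number of paths are assumed values, and the quantiles are taken with a simple index-based rule.

```python
import random
from statistics import NormalDist


def analytical_interval(level, alpha, sigma, h, coverage=0.95):
    """Approach (1): closed-form variance of the h steps ahead error for ETS(A,N,N)."""
    sd_h = sigma * (1 + (h - 1) * alpha ** 2) ** 0.5
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return level - z * sd_h, level + z * sd_h


def simulated_interval(level, alpha, sigma, h, coverage=0.95,
                       n_paths=10000, seed=42):
    """Approach (2): simulate future paths and take empirical quantiles."""
    rng = random.Random(seed)   # fixed seed, so the bounds are reproducible
    finals = []
    for _ in range(n_paths):
        l = level
        for _ in range(h):
            eps = rng.gauss(0.0, sigma)
            y = l + eps          # simulated actual value
            l += alpha * eps     # simulated level update
        finals.append(y)         # keep the value at horizon h
    finals.sort()
    lower = finals[round((0.5 - coverage / 2) * (n_paths - 1))]
    upper = finals[round((0.5 + coverage / 2) * (n_paths - 1))]
    return lower, upper


# Assumed state and parameters of a fitted ETS(A,N,N) model
print(analytical_interval(100.0, 0.3, 10.0, h=5))
print(simulated_interval(100.0, 0.3, 10.0, h=5))
```

The two intervals should be close to each other, and changing `seed` shows the run-to-run variation of the simulated bounds.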

References

• Hyndman, R.J., Koehler, A.B., Ord, J.K., Snyder, R.D., 2008. Forecasting with Exponential Smoothing. Springer Berlin Heidelberg.

• Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting. 18, 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8

• Svetunkov, I., 2022a. Statistics for business analytics. https://openforecast.org/sba/ (version: 31.10.2022)