## 4.4 Several examples of ETS and related exponential smoothing methods

There are other exponential smoothing methods, which include more components, as discussed in Section 3.1. This includes but is not limited to: Holt’s (Holt, 2004, originally proposed in 1957), Holt-Winter’s (Winters, 1960), multiplicative trend (Pegels, 1969), Damped trend (originally proposed by Roberts, 1982 and then picked up by @Gardner1985a), Damped trend Holt-Winters (Gardner and McKenzie, 1989) and damped multiplicative trend (James W Taylor, 2003) methods. We will not discuss them here one by one, as we will not use them further in this monograph. Instead, we will focus on the ETS models underlying them.

We already understand that there can be different components in time series and that they can interact either in an additive or a multiplicative way, which gives us the taxonomy above, discussed in Section 4.1. This section considers several examples of ETS models and their relations to the conventional exponential smoothing methods.

### 4.4.1 ETS(A,A,N)

This is also sometimes known as local trend model and is formulated similar to ETS(A,N,N), but with addition of the trend equation. It underlies Holt’s method (Ord et al., 1997): \begin{aligned} & y_{t} = l_{t-1} + b_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + b_{t-1} + \alpha \epsilon_t \\ & b_t = b_{t-1} + \beta \epsilon_t \end{aligned} , \tag{4.12} where $$\beta$$ is the smoothing parameter for the trend component. It has a similar idea as ETS(A,N,N): the states evolve, and the speed of their change depends on the values of $$\alpha$$ and $$\beta$$. The trend is not deterministic in this model: both the intercept and the slope change over time. The higher the smoothing parameters are, the more uncertain the level and the slope will be; thus, the higher the uncertainty about the future values is.

Here is an example of the data that corresponds to the ETS(A,A,N) model:

y <- sim.es("AAN", 120, 1, 12, persistence=c(0.3,0.1),
initial=c(1000,20), mean=0, sd=20)
plot(y)

The series in Figure 4.7 demonstrates a trend that changes over time. If we needed to produce forecasts for this data, we would capture the dynamics of the trend component and then use the last values for the several steps ahead prediction.

The point forecast h steps ahead from this model is a straight line with a slope $$b_t$$ (as shown in Table 4.1 from Section 4.2): $$$\mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} + h b_t. \tag{4.13}$$$ This becomes apparent if one takes the conditional expectations E$$(l_{t+h}|t)$$ and E$$(b_{t+h}|t)$$ in the second and third equations of (4.12) and then inserts them in the measurement equation. Graphically it will look as shown in Figure 4.8:

esModel <- es(y, h=10, silent=FALSE)

If you want to experiment with the model and see how its parameters influence the fit and forecast, you can use the following R code:

esModel <- es(y\$data, h=10, silent=FALSE, persistence=c(0.2,0.1))

where persistence is the vector of smoothing parameters (first $$\hat\alpha$$, then $$\hat\beta$$). By changing their values, we will make the model less/more responsive to the changes in the data.

In a special case, ETS(A,A,N) corresponds to the Random Walk with drift (Subsection 3.3.4), when $$\beta=0$$ and $$\alpha=1$$: \begin{equation*} \begin{aligned} & y_{t} = l_{t-1} + b_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + b_{t-1} + \epsilon_t \\ & b_t = b_{t-1} \end{aligned} , \end{equation*} or \begin{aligned} & y_{t} = l_{t-1} + b_{0} + \epsilon_t \\ & l_t = l_{t-1} + b_{0} + \epsilon_t \end{aligned} . \tag{4.14} The connection between the two models becomes apparent, when substituting the first equation into the second one in (4.14) to obtain: \begin{aligned} & y_{t} = l_{t-1} + b_{0} + \epsilon_t \\ & l_t = y_{t} \end{aligned} or after inserting the second equation into the first one: $$$y_{t} = y_{t-1} + b_{0} + \epsilon_t .$$$

Finally, in another special case, when both $$\alpha$$ and $$\beta$$ are zero, the model reverts to the Global Trend discussed in Subsection 3.3.5 and becomes: \begin{equation*} \begin{aligned} & y_{t} = l_{t-1} + b_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + b_{t-1} \\ & b_t = b_{t-1} \end{aligned} , \end{equation*} or \begin{aligned} & y_{t} = l_{t-1} + b_{0} + \epsilon_t \\ & l_t = l_{t-1} + b_{0} \end{aligned} , \tag{4.15} The main difference of (4.15) with the Global Trend model (3.13) is that the latter explicitly includes the trend component $$t$$ and is formulated in one equation, while the former splits equation into two parts and has an explicit level component. They both fit the data in the same way and produce the same trajectories when $$a_0 = l_0$$ and $$a_1 = b_0$$ in the Global Trend model.

Due to its flexibility, the ETS(A,A,N) model is considered as one of the good benchmarks in the case of trended time series.

This is the model that underlies Damped trend method (Roberts, 1982, p. @Gardner1985a): \begin{aligned} & y_{t} = l_{t-1} + \phi b_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + \phi b_{t-1} + \alpha \epsilon_t \\ & b_t = \phi b_{t-1} + \beta \epsilon_t \end{aligned} , \tag{4.16} where $$\phi$$ is the dampening parameter, typically lying between 0 and 1. If it is equal to zero, the model reduces to ETS(A,N,N), (4.3). If it is equal to one, it becomes equivalent to ETS(A,A,N), (4.12). The dampening parameter slows down the trend, making it non-linear. An example of data that corresponds to ETS(A,Ad,N) is provided in Figure 4.9.

y <- sim.es("AAdN", 120, 1, 12, persistence=c(0.3,0.1),
initial=c(1000,20), phi=0.95, mean=0, sd=20)
plot(y)

Visually it is typically challenging to distinguish ETS(A,A,N) from ETS(A,Ad,N) data. So, some other model selection techniques are recommended (see Section 15.1).

The point forecast from this model is a bit more complicated (see Section 4.2): $$$\mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} + \sum_{j=1}^h \phi^j b_t. \tag{4.17}$$$ It corresponds to the slowing down trajectory, as shown in Figure 4.10.

As can be seen in Figure 4.10, the forecast trajectory from the ETS(A,Ad,N) has a slowing down element in it. This is because of the $$\phi=0.95$$ in our example.

### 4.4.3 ETS(A,A,M)

Finaly, this is an exotic model with additive error and trend, but multiplicative seasonality. Still, we list it here, because it underlies the Holt-Winters method (Winters, 1960): \begin{aligned} & y_{t} = (l_{t-1} + b_{t-1}) s_{t-m} + \epsilon_t \\ & l_t = l_{t-1} + b_{t-1} + \alpha \frac{\epsilon_t}{s_{t-m}} \\ & b_t = b_{t-1} + \beta \frac{\epsilon_t}{s_{t-m}} \\ & s_t = s_{t-m} + \gamma \frac{\epsilon_t}{l_{t-1}+b_{t-1}} \end{aligned} , \tag{4.18} where $$s_t$$ is the seasonal component and $$\gamma$$ is its smoothing parameter. This is one of the potentially unstable models, which due to the mix of components, might produce unreasonable forecasts because the seasonal component might become negative, while in the multiplicative model, it should always be positive. Still, it might work on the strictly positive high-level data. Figure 4.11 shows how the data for this model can look like.

y <- sim.es("AAM", 120, 1, 4, persistence=c(0.3,0.05,0.2),
initial=c(1000,20), initialSeason=c(0.9,1.1,0.8,1.2),
mean=0, sd=20)
plot(y)

The data in Figure 4.11 exhibits an additive trend with increasing seasonal amplitude, which are the two characteristics of the model.

Finally, the point forecast from this model build upon the ETS(A,A,N), introducing seasonal component: $$$\hat{y}_{t+h} = (l_{t} + h b_t) s_{t+h-m\lceil\frac{h}{m}\rceil}, \tag{4.19}$$$ where $$\lceil\frac{h}{m}\rceil$$ is the rounded up value of the fraction in the brackets. The point forecast from this model is shown in Figure 4.12.

Remark. The point forecasts produced from this model do not correspond to the conditional expectations. This will be discussed in Section 7.3.

Hyndman et al. (2008) argue that in ETS models, the error term should be aligned with the seasonal component because it is difficult to motivate why the amplitude of seasonality increases with the increase of level, while the variability of the error term stays the same. So, they recommend using ETS(M,A,M) instead of ETS(A,A,M) if you deal with positive high volume data. This is a reasonable recommendation, but keep in mind that both models might break if you deal with low volume data.

### References

• Gardner, E.S., McKenzie, E., 1989. Seasonal Exponential Smoothing with Damped Trends. Management Science. 35, 372–376. https://doi.org/10.1287/mnsc.35.3.372

• Holt, C.C., 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting. 20, 5–10. https://doi.org/10.1016/j.ijforecast.2003.09.015

• Hyndman, R.J., Koehler, A.B., Ord, J.K., Snyder, R.D., 2008. Forecasting with Exponential Smoothing.. Springer Berlin Heidelberg.

• Ord, J.K., Koehler, A.B., Snyder, R.D., 1997. Estimation and Prediction for a Class of Dynamic Nonlinear Statistical Models. Journal of the American Statistical Association. 92, 1621–1629. https://doi.org/10.1080/01621459.1997.10473684

• Pegels, C.C., 1969. Exponential Forecasting : Some New Variations. Management Science. 15, 311–315. https://www.jstor.org/stable/2628137

• Roberts, S.A., 1982. A General Class of Holt-Winters Type Forecasting Models. Management Science. 28, 808–820. https://doi.org/10.1287/mnsc.28.7.808

• Taylor, J.W., 2003. Exponential smoothing with a damped multiplicative trend. International Journal of Forecasting. 19, 715–725. https://doi.org/10.1016/S0169-2070(03)00003-7

• Winters, P.R., 1960. Forecasting Sales by Exponentially Weighted Moving Averages. Management Science. 6, 324–342. https://doi.org/10.1287/mnsc.6.3.324