4.3 ETS and SES

4.3.1 ETS(A,N,N)

There have been several tries to develop statistical models underlying SES, and we know now that the model has underlying ARIMA(0,1,1) (Muth, 1960), local level MSOE (Multiple Source of Error, Muth, 1960) and SSOE (Single Source of Error, Snyder, 1985) models. According to Hyndman et al. (2002), the ETS(A,N,N) model also underlies the SES method. To see the connection and to get to it from SES, we need to recall two things: how in general, the actual value relates to the forecast error and the fitted value, and the error correction form of SES from Subsection 3.4.3: \[\begin{equation} \begin{aligned} & y_t = \hat{y}_{t} + e_t \\ & \hat{y}_{t+1} = \hat{y}_{t} + \hat{\alpha} e_{t} \end{aligned} . \tag{4.1} \end{equation}\] In order to get to the SSOE state space model for SES, we need to substitute \(\hat{y}_t=\hat{l}_{t-1}\), implying that the fitted value is equal to the level of the series: \[\begin{equation} \begin{aligned} & y_t = \hat{l}_{t-1} + e_t \\ & \hat{l}_{t} = \hat{l}_{t-1} + \hat{\alpha} e_{t} \end{aligned} . \tag{4.2} \end{equation}\] If we now substitute the sample estimates of level, smoothing parameter and forecast error by their population values, we will get the ETS(A,N,N), which was discussed in Section 4.2: \[\begin{equation} \begin{aligned} & y_{t} = l_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + \alpha \epsilon_t \end{aligned} , \tag{4.3} \end{equation}\] where, as we know from Section 3.1, \(l_t\) is the level of the data, \(\epsilon_t\) is the error term, and \(\alpha\) is the smoothing parameter. Note that we use \(\alpha\) without the “hat” symbol, which implies that there is a “true” value of the parameter (which could be obtained if we had all the data in the world or just knew it for some reason). The main benefit of having the model (4.3) instead of just the method (3.21) is in having a flexible framework, which allows adding other components, selecting the most appropriate ones, consistently estimating parameters (see Section 4.3 of Svetunkov, 2022a), producing prediction intervals etc.

In order to see the data that corresponds to the ETS(A,N,N) we can use sim.es() function from smooth package. Here are several examples with different smoothing parameters values:

# list with generated data
y <- vector("list",6)
# Parameters for DGP
initial <- 1000
meanValue <- 0
sdValue <- 20
alphas <- c(0.1,0.3,0.5,0.75,1,1.5)
# Go through all alphas and generate respective data
for(i in 1:length(alphas)){
  y[[i]] <- sim.es("ANN", 120, 1, 12, persistence=alphas[i],
                   initial=initial, mean=meanValue, sd=sdValue)
}

The generated data can be plotted the following way:

par(mfrow=c(3,2), mar=c(2,2,2,1))
for(i in 1:6){
  plot(y[[i]], main=paste0("alpha=",y[[i]]$persistence),
       ylim=initial+c(-500,500))
}
Local level data corresponding to ETS(A,N,N) model with different smoothing parameters.

Figure 4.3: Local level data corresponding to ETS(A,N,N) model with different smoothing parameters.

This simple simulation shows that the smoothing parameter in ETS(A,N,N) controls the variability in the data (Figure 4.3): the higher \(\alpha\) is, the higher variability is and less predictable the data becomes. With the higher values of \(\alpha\), the level changes faster, leading to increased uncertainty about the future values of the level in the data.

When it comes to the application of this model to the data, the conditional h steps ahead mean corresponds to the point forecast and is equal to the last observed level: \[\begin{equation} \mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} , \tag{4.4} \end{equation}\] this holds because it is assumed (see Section 1.4.1) that \(\mathrm{E}(\epsilon_t)=0\), which implies that the conditional h steps ahead expectation of the level in the model is (from the second equation in (4.3)): \[\begin{equation} \mathrm{E}(l_{t+h}|t) = l_t + \mathrm{E}(\alpha\sum_{j=1}^{h-1}\epsilon_{t+j}|t) = l_t . \tag{4.5} \end{equation}\]

Here is an example of a forecast from ETS(A,N,N) with automatic parameter estimation using es() function from smooth package:

# Generate the data
y <- sim.es("ANN", 120, 1, 12, persistence=0.3, initial=1000)
# Apply ETS(A,N,N) model
es(y$data, "ANN", h=12, holdout=TRUE, silent=FALSE)
An example of ETS(A,N,N) applied to the data generated from the same model.

Figure 4.4: An example of ETS(A,N,N) applied to the data generated from the same model.

## Time elapsed: 0.01 seconds
## Model estimated using es() function: ETS(ANN)
## Distribution assumed in the model: Normal
## Loss function type: likelihood; Loss function value: 517.3871
## Persistence vector g:
##  alpha 
## 0.3425 
## 
## Sample size: 108
## Number of estimated parameters: 3
## Number of degrees of freedom: 105
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 1040.774 1041.005 1048.821 1049.361 
## 
## Forecast errors:
## ME: 15.678; MAE: 28.663; RMSE: 30.377
## sCE: 19.103%; Asymmetry: 60.9%; sMAE: 2.91%; sMSE: 0.095%
## MASE: 1.023; RMSSE: 0.856; rMAE: 1.407; rRMSE: 1.155

As we see from Figure 4.4, the true smoothing parameter is 0.3, but the estimated one is not exactly 0.3, which is expected because we deal with an in-sample estimation. Also, notice that with such a smoothing parameter, the prediction interval widens with the increase of the forecast horizon. If the smoothing parameter were lower, the bounds would not increase, but this might not reflect the uncertainty about the level correctly. Here is an example with \(\alpha=0.01\) on the same data (Figure 4.5)

ourModel <- es(y$data, "ANN", h=12,
               holdout=TRUE, silent=FALSE, persistence=0.01)
ETS(A,N,N) with $\hat{\alpha}=0.01$ applied to the data generated from the same model with $\alpha=0.3$.

Figure 4.5: ETS(A,N,N) with \(\hat{\alpha}=0.01\) applied to the data generated from the same model with \(\alpha=0.3\).

Figure 4.5 shows that the prediction interval does not expand, but at the same time is wider than needed, and the forecast is biased – the model does not keep up to the fast-changing time series. So, it is essential to correctly estimate the smoothing parameters not only to approximate the data but also to produce a less biased point forecast and a more appropriate prediction interval.

4.3.2 ETS(M,N,N)

Hyndman et al. (2008) demonstrate that there is another ETS model, underlying SES. It is the model with multiplicative error, which is formulated in the following way, as mentioned in Chapter 4.2: \[\begin{equation} \begin{aligned} & y_{t} = l_{t-1}(1 + \epsilon_t) \\ & l_t = l_{t-1}(1 + \alpha \epsilon_t) \end{aligned} , \tag{4.6} \end{equation}\] where \((1+\epsilon_t)\) corresponds to the \(\varepsilon_t\) discussed in Section 3.1. In order to see the connection of this model with SES, we need to revert to the estimation of the model on the data again: \[\begin{equation} \begin{aligned} & y_{t} = \hat{l}_{t-1}(1 + e_t) \\ & \hat{l}_t = \hat{l}_{t-1}(1 + \hat{\alpha} e_t) \end{aligned} . \tag{4.7} \end{equation}\] where one step ahead forecast is (Section 4.2) \(\hat{y}_t = \hat{l}_{t-1}\) and \(e_t=\frac{y_t -\hat{y}_t}{\hat{y}_t}\). Substituting these values in second equation of (4.7) we obtain: \[\begin{equation} \hat{y}_{t+1} = \hat{y}_t \left(1 + \hat{\alpha} \frac{y_t -\hat{y}_t}{\hat{y}_t} \right) \tag{4.8} \end{equation}\] Finally, opening the brackets, we get the SES in the form similar to (3.21): \[\begin{equation} \hat{y}_{t+1} = \hat{y}_t + \hat{\alpha} (y_t -\hat{y}_t). \tag{4.9} \end{equation}\]

This example again demonstrates the difference between a forecasting method and a model. When we use SES, we ignore the distributional assumptions, which restricts the usage of the method. When we use a model, we assume a specific structure, which on the one hand, makes it more restrictive, but on the other hand, gives it additional features. The main ones in the case of ETS(M,N,N) in comparison with ETS(A,N,N) are:

  1. The variance of the actual values in ETS(M,N,N) increases with the increase of the level \(l_{t}\). This allows modelling heteroscedasticity situation in the data;
  2. If \((1+\epsilon_t)\) is always positive, then the ETS(M,N,N) model will always produce only positive forecasts (both point and interval). This makes this model applicable in principle to the data with low levels.

An alternative to (4.6) would be the model (4.3) applied to the data in logarithms (assuming that the data we work with is always positive), implying that: \[\begin{equation} \begin{aligned} & \log y_{t} = l_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + \alpha \epsilon_t \end{aligned} . \tag{4.10} \end{equation}\] However, to produce forecasts from (4.10), exponentiation is needed, making the application of the model more difficult than needed. The ETS(M,N,N), on the other hand, does not rely on exponentiation, making it safe in cases when the model produces very high values (e.g. exp(1000) returns infinity in R).

Finally, the conditional h steps ahead mean of ETS(M,N,N) corresponds to the point forecast and is equal to the last observed level, but only if \(\mathrm{E}(1+\epsilon_t)=1\): \[\begin{equation} \mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} . \tag{4.11} \end{equation}\]

And here is an example with the ETS(M,N,N) data (Figure 4.6):

y <- sim.es("MNN", 120, 1, 12, persistence=0.3, initial=1000)
ourModel <- es(y$data, "MNN", h=12, holdout=TRUE,
               silent=FALSE)
ETS(M,N,N) model applied to the data generated from the same model.

Figure 4.6: ETS(M,N,N) model applied to the data generated from the same model.

ourModel
## Time elapsed: 0.01 seconds
## Model estimated using es() function: ETS(MNN)
## Distribution assumed in the model: Normal
## Loss function type: likelihood; Loss function value: 695.7947
## Persistence vector g:
##  alpha 
## 0.4222 
## 
## Sample size: 108
## Number of estimated parameters: 3
## Number of degrees of freedom: 105
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 1397.589 1397.820 1405.636 1406.176 
## 
## Forecast errors:
## ME: 103.68; MAE: 172.808; RMSE: 208.158
## sCE: 82.601%; Asymmetry: 67.7%; sMAE: 11.473%; sMSE: 1.91%
## MASE: 1.157; RMSSE: 1.094; rMAE: 0.834; rRMSE: 0.87

Conceptually, the data in Figure 4.6 looks very similar to the one from ETS(A,N,N) (Figure 4.4), but demonstrating the changing variance of the error term with the change of the level.

References

• Hyndman, R.J., Koehler, A.B., Ord, J.K., Snyder, R.D., 2008. Forecasting with Exponential Smoothing. Springer Berlin Heidelberg.
• Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting. 18, 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8
• Muth, J.F., 1960. Optimal Properties of Exponentially Weighted Forecasts. Journal of the American Statistical Association. 55, 299–306. https://doi.org/10.2307/2281742
• Snyder, R.D., 1985. Recursive Estimation of Dynamic Linear Models. Journal of the Royal Statistical Society, Series B (Methodological). 47, 272–276. https://doi.org/10.1111/j.2517-6161.1985.tb01355.x
• Svetunkov, I., 2022a. Statistics for business analytics. https://openforecast.org/sba/ (version: 31.03.2022)