
4.2 SES and ETS

4.2.1 ETS(A,N,N)

There have been several attempts to develop statistical models underlying SES, and we now know that it has underlying ARIMA(0,1,1), local level MSOE (Multiple Source of Error; Muth, 1960) and SSOE (Single Source of Error; Snyder, 1985) models. According to Hyndman et al. (2002), the ETS(A,N,N) model also underlies the SES method. It can be formulated in the following way, as discussed in Section 3.5: \[\begin{equation} \begin{aligned} & y_{t} = l_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + \alpha \epsilon_t \end{aligned} , \tag{4.6} \end{equation}\] where, as we know from Section 3.1, \(l_t\) is the level of the data, \(\epsilon_t\) is the error term and \(\alpha\) is the smoothing parameter. Note that we use \(\alpha\) without the “hat” symbol, which implies that there is a “true” value of the parameter (which could be obtained if we had all the data in the world or just knew it for some reason). It is easy to show that ETS(A,N,N) underlies SES. In order to see this, we need to move to the estimation phase, using \(\hat{l}_{t-1}\) instead of \(l_{t-1}\), \(\hat{\alpha}\) instead of \(\alpha\) and \(e_t\) as an estimate of \(\epsilon_t\): \[\begin{equation} \begin{aligned} & y_{t} = \hat{l}_{t-1} + e_t \\ & \hat{l}_t = \hat{l}_{t-1} + \hat{\alpha} e_t \end{aligned} . \tag{4.7} \end{equation}\] Inserting the second equation of (4.7), shifted one step back, into the first one, we get: \[\begin{equation} y_{t} = \hat{l}_{t-2} + \hat{\alpha} e_{t-1} + e_t . \tag{4.8} \end{equation}\] The one step ahead forecast from ETS(A,N,N) is \(\hat{y}_t=\hat{l}_{t-1}\) (see Section 3.5), while the actual value can be represented as \(y_t = \hat{y}_t + e_t\), leading to: \[\begin{equation} \hat{y}_t + e_t = \hat{y}_{t-1} + \hat{\alpha} e_{t-1} + e_t . \tag{4.9} \end{equation}\] Cancelling out \(e_t\) and shifting everything one step ahead, we obtain the error correction form (4.5) of SES. The main benefit of having the model (4.6) instead of just the method (4.5) is the flexible framework it provides, which allows adding other components, selecting the most appropriate ones, estimating parameters in a consistent way (see Section 4.3 of Svetunkov, 2021c) and producing prediction intervals.
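To make the recursion (4.7) more tangible, here is a minimal sketch of it in R (the function sesRecursion and its arguments are purely illustrative and are not part of any package):

# Produce one step ahead in-sample forecasts following (4.7) for a numeric
# vector y, given a smoothing parameter alpha and an initial level
sesRecursion <- function(y, alpha, initial){
  level <- initial
  yFitted <- numeric(length(y))
  for(t in 1:length(y)){
    yFitted[t] <- level           # the point forecast is the previous level
    e <- y[t] - yFitted[t]        # the one step ahead error
    level <- level + alpha * e    # update the level
  }
  return(yFitted)
}

The last value of level after the loop corresponds to \(\hat{l}_t\) on the last observation and produces the same forecast as the error correction form (4.5).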

In order to see what kind of data corresponds to ETS(A,N,N), we can use the sim.es() function from the smooth package. Here are several examples with different values of the smoothing parameter:

# Load the smooth package, which provides sim.es() and es()
library(smooth)
# List that will contain the generated data
y <- vector("list",6)
# Parameters for DGP
initial <- 1000
meanValue <- 0
sdValue <- 20
alphas <- c(0.1,0.3,0.5,0.75,1,1.5)
# Go through all alphas and generate respective data
for(i in 1:length(alphas)){
  y[[i]] <- sim.es("ANN", 120, 1, 12, persistence=alphas[i],
                   initial=initial, mean=meanValue, sd=sdValue)
}

The generated data can be plotted in the following way:

par(mfrow=c(3,2), mar=c(2,2,2,1))
for(i in 1:6){
  plot(y[[i]], main=paste0("alpha=",y[[i]]$persistence),
       ylim=initial+c(-500,500))
}

Figure 4.7: Local level data corresponding to ETS(A,N,N) model with different smoothing parameters.

This simple simulation shows that the smoothing parameter in ETS(A,N,N) controls the variability in the data (Figure 4.7): the higher \(\alpha\) is, the higher the variability is and the less predictable the data becomes. With higher values of \(\alpha\), the level changes faster, which also increases the uncertainty about the future values of the level in the data.

When it comes to the application of this model to the data, the conditional h steps ahead mean corresponds to the point forecast and is equal to the last observed level: \[\begin{equation} \mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} . \tag{4.10} \end{equation}\] This holds because it is assumed (see Section 1.4.1) that \(\text{E}(\epsilon_t)=0\), which implies that the conditional h steps ahead expectation of the level in the model is (from the second equation in (4.6)): \[\begin{equation} \text{E}(l_{t+h}|t) = l_t + \alpha\sum_{j=1}^{h}\text{E}(\epsilon_{t+j}|t) = l_t . \tag{4.11} \end{equation}\]
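This can also be checked numerically. The following small simulation sketch (the parameter values here are arbitrary and serve only the illustration) generates many possible paths of the level h steps ahead according to the transition equation in (4.6) and compares their average with the last known level:

# Numerical check that the expectation of the level h steps ahead is l_t
set.seed(41)
alpha <- 0.3
lt <- 1000
h <- 12
nsim <- 10000
# l_{t+h} = l_t + alpha * (sum of the h future error terms)
levelPaths <- lt + alpha * rowSums(matrix(rnorm(nsim*h, 0, 20), nsim, h))
# The average over the simulated paths should be close to lt=1000
mean(levelPaths)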

Here is an example of a forecast from ETS(A,N,N) with the parameters estimated automatically, using the es() function from the smooth package:

# Generate the data
y <- sim.es("ANN", 120, 1, 12, persistence=0.3, initial=1000)
# Apply ETS(A,N,N) model
es(y$data, "ANN", h=12, interval=TRUE, holdout=TRUE, silent=FALSE)

Figure 4.8: An example of ETS(A,N,N) applied to the data generated from the same model.

## Time elapsed: 0.04 seconds
## Model estimated: ETS(ANN)
## Persistence vector g:
##  alpha 
## 0.3955 
## Initial values were optimised.
## 
## Loss function type: likelihood; Loss function value: 506.2266
## Error standard deviation: 26.6404
## Sample size: 108
## Number of estimated parameters: 3
## Number of degrees of freedom: 105
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 1018.453 1018.684 1026.500 1027.040 
## 
## 95% parametric prediction interval was constructed
## 92% of values are in the prediction interval
## Forecast errors:
## MPE: 3.9%; sCE: 49.5%; Asymmetry: 100%; MAPE: 3.9%
## MASE: 1.957; sMAE: 4.1%; sMSE: 0.2%; rMAE: 0.515; rRMSE: 0.58

The true smoothing parameter used in the data generation is 0.3, but the estimated one is not exactly 0.3, which is expected, because we deal with estimation on a limited sample. Also, notice in Figure 4.8 that with such a smoothing parameter, the prediction interval widens with the increase of the forecast horizon. If the smoothing parameter were lower, then the bounds would not increase, but this might not reflect the uncertainty about the level correctly. Here is an example with \(\hat{\alpha}=0.01\) on the same data (Figure @ref(fig:ETSANNExamplealpha0.1)):

ourModel <- es(y$data, "ANN", h=12, interval=TRUE,
               holdout=TRUE, silent=FALSE, persistence=0.01)

(#fig:ETSANNExamplealpha0.1)ETS(A,N,N) with \(\hat{\alpha}=0.01\) applied to the data generated from the same model with \(\alpha=0.3\).

Figure @ref(fig:ETSANNExamplealpha0.1) shows that the prediction interval does not expand with the horizon, but at the same time it is wider than needed, and the point forecast is biased: the model does not keep up with the fast changing time series. So, it is important to estimate the smoothing parameters correctly, not only to approximate the data, but also to produce less biased point forecasts and more appropriate prediction intervals.

4.2.2 ETS(M,N,N)

Hyndman et al. (2008) demonstrate that there is another ETS model underlying SES. It is the model with the multiplicative error, which is formulated in the following way, as mentioned in Section 3.5: \[\begin{equation} \begin{aligned} & y_{t} = l_{t-1}(1 + \epsilon_t) \\ & l_t = l_{t-1}(1 + \alpha \epsilon_t) \end{aligned} , \tag{4.12} \end{equation}\] where \((1+\epsilon_t)\) corresponds to the \(\varepsilon_t\) discussed in Section 3.1. In order to see the connection of this model with SES, we need to switch to the estimation of the model on the data again: \[\begin{equation} \begin{aligned} & y_{t} = \hat{l}_{t-1}(1 + e_t) \\ & \hat{l}_t = \hat{l}_{t-1}(1 + \hat{\alpha} e_t) \end{aligned} , \tag{4.13} \end{equation}\] where the one step ahead forecast is (Section 3.5) \(\hat{y}_t = \hat{l}_{t-1}\) and \(e_t=\frac{y_t - \hat{y}_t}{\hat{y}_t}\). Substituting these values in the second equation of (4.13), we obtain: \[\begin{equation} \hat{y}_{t+1} = \hat{y}_t \left(1 + \hat{\alpha} \frac{y_t - \hat{y}_t}{\hat{y}_t} \right) . \tag{4.14} \end{equation}\] Finally, opening the brackets, we get SES in the form similar to (4.5): \[\begin{equation} \hat{y}_{t+1} = \hat{y}_t + \hat{\alpha} (y_t - \hat{y}_t). \tag{4.15} \end{equation}\]
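To connect this derivation with something executable, here is a minimal sketch in R of the recursion (4.13) (the function mnnRecursion is illustrative and not part of the smooth package). Opening the brackets in the level update shows that it coincides with the SES update in (4.15):

# One step ahead in-sample forecasts produced by the recursion (4.13)
mnnRecursion <- function(y, alpha, initial){
  level <- initial
  yFitted <- numeric(length(y))
  for(t in 1:length(y)){
    yFitted[t] <- level                      # point forecast
    e <- (y[t] - yFitted[t]) / yFitted[t]    # relative error
    # level*(1 + alpha*e) = level + alpha*(y[t]-level), i.e. the SES update
    level <- level * (1 + alpha * e)
  }
  return(yFitted)
}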

This example demonstrates once again the difference between a forecasting method and a model. When we use SES, we ignore the distributional assumptions, which restricts the usage of the method. When we use a model, we assume a specific structure, which on the one hand makes it more restrictive, but on the other hand gives it additional features. The main ones in the case of ETS(M,N,N) in comparison with ETS(A,N,N) are:

  1. The variance of the actual values in ETS(M,N,N) increases with the increase of the level \(l_{t}\). This allows modelling heteroscedasticity in the data (see the equation after this list);
  2. If \((1+\epsilon_t)\) is always positive, then the ETS(M,N,N) model will always produce only positive forecasts (both point and interval), which makes this model applicable, in principle, to data with low levels.
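As a short side note to the first point, the measurement equation in (4.12) implies that: \[ \text{V}(y_t | l_{t-1}) = l_{t-1}^2 \text{V}(\epsilon_t) , \] so the standard deviation of the actual values is proportional to the level \(l_{t-1}\).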

An alternative to (4.12) would be the model (4.6) applied to the data in logarithms (assuming that the data we work with is always positive), implying that: \[\begin{equation} \begin{aligned} & \log y_{t} = l_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + \alpha \epsilon_t \end{aligned} . \tag{4.16} \end{equation}\] However, in order to produce forecasts from (4.16), exponentiation is needed, which might make the application of the model more difficult than necessary. ETS(M,N,N), on the other hand, does not rely on exponentiation, making it safe in cases when very high values are produced by the model (e.g. exp(1000) returns infinity in R).
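As a rough sketch of this alternative (assuming that the smooth package is loaded, that y$data is strictly positive and that the estimated model stores its point forecast in the forecast element of the returned object), one could fit ETS(A,N,N) to the logarithms of the data and then exponentiate the result:

# Fit ETS(A,N,N) to the data in logarithms
logModel <- es(log(y$data), "ANN", h=12, holdout=TRUE, silent=TRUE)
# Exponentiate to return to the original scale
exp(logModel$forecast)
# The exponentiation is the potentially unsafe part, e.g. exp(1000) gives Inf
exp(1000)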

Finally, the conditional h steps ahead mean of ETS(M,N,N) corresponds to the point forecast and is equal to the last observed level, but only if \(\text{E}(1+\epsilon_t)=1\): \[\begin{equation} \mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} . \tag{4.17} \end{equation}\]

And here is an example with the ETS(M,N,N) data (Figure 4.9):

y <- sim.es("MNN", 120, 1, 12, persistence=0.3, initial=1000)
ourModel <- es(y$data, "MNN", h=12, holdout=TRUE,
               interval=TRUE, silent=FALSE)

Figure 4.9: ETS(M,N,N) model applied to the data generated from the same model.

ourModel
## Time elapsed: 0.04 seconds
## Model estimated: ETS(MNN)
## Persistence vector g:
##  alpha 
## 0.2003 
## Initial values were optimised.
## 
## Loss function type: likelihood; Loss function value: 638.9122
## Error standard deviation: 0.092
## Sample size: 108
## Number of estimated parameters: 3
## Number of degrees of freedom: 105
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 1283.824 1284.055 1291.871 1292.411 
## 
## 95% parametric prediction interval was constructed
## 75% of values are in the prediction interval
## Forecast errors:
## MPE: -12.3%; sCE: -107.4%; Asymmetry: -86.6%; MAPE: 13.3%
## MASE: 1.146; sMAE: 9.9%; sMSE: 1.5%; rMAE: 1.16; rRMSE: 1.131

Conceptually, the data in Figure 4.9 looks very similar to the data from ETS(A,N,N) (Figure 4.8), but it demonstrates the changing variance of the error term with the change of the level.

References

• Hyndman, R.J., Koehler, A.B., Ord, J.K., Snyder, R.D., 2008. Forecasting with Exponential Smoothing. Springer Berlin Heidelberg.
• Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting. 18, 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8
• Muth, J.F., 1960. Optimal Properties of Exponentially Weighted Forecasts. Journal of the American Statistical Association. 55, 299–306. https://doi.org/10.2307/2281742
• Snyder, R.D., 1985. Recursive Estimation of Dynamic Linear Models. Journal of the Royal Statistical Society, Series B (Methodological). 47, 272–276. https://doi.org/10.1111/j.2517-6161.1985.tb01355.x
• Svetunkov, I., 2021c. Statistics for business analytics. https://openforecast.org/sba/ (version: [01.09.2021])