## 4.2 SES and ETS

### 4.2.1 ETS(A,N,N)

There have been several attempts to develop statistical models underlying SES, and we now know that the method has several underlying models: ARIMA(0,1,1), the local level MSOE (Multiple Source of Error; Muth, 1960) model and the SSOE (Single Source of Error; Snyder, 1985) model. According to Hyndman et al. (2002), the ETS(A,N,N) model also underlies the SES method. To see the connection and to get to it from SES, we need to recall two things: how, in general, the actual value relates to the forecast error and the fitted value, and the error correction form of SES from Subsection 4.1.3: \[\begin{equation} \begin{aligned} & y_t = \hat{y}_{t} + e_t \\ & \hat{y}_{t+1} = \hat{y}_{t} + \hat{\alpha} e_{t} \end{aligned} . \tag{4.6} \end{equation}\] In order to get to the SSOE state space model for SES, we need to substitute \(\hat{y}_t=\hat{l}_{t-1}\), implying that the fitted value is equal to the level of the series: \[\begin{equation} \begin{aligned} & y_t = \hat{l}_{t-1} + e_t \\ & \hat{l}_{t} = \hat{l}_{t-1} + \hat{\alpha} e_{t} \end{aligned} . \end{equation}\] If we now replace the sample estimates of the level, smoothing parameter and forecast error with their population values, we will get the ETS(A,N,N) model, which was discussed in Section 3.5: \[\begin{equation} \begin{aligned} & y_{t} = l_{t-1} + \epsilon_t \\ & l_t = l_{t-1} + \alpha \epsilon_t \end{aligned} , \tag{4.7} \end{equation}\] where, as we know from Section 3.1, \(l_t\) is the level of the data, \(\epsilon_t\) is the error term, and \(\alpha\) is the smoothing parameter. Note that we use \(\alpha\) without the “hat” symbol, which implies that there is a “true” value of the parameter (which could be obtained if we had all the data in the world or just knew it for some reason). 
The main benefit of having the model (4.7) instead of just the method (4.5) lies in having a flexible framework, which allows adding other components, selecting the most appropriate ones, estimating parameters consistently (see Section 4.3 of Svetunkov, 2021b), producing prediction intervals, etc.
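
To make the transition from the method (4.6) to the model (4.7) more tangible, here is a minimal base R sketch of the recursion. The function name `sesFilter` is ours, chosen for illustration, and is not part of the `smooth` package:

```r
# A minimal SES filter in the error correction form (4.6)/(4.7).
# sesFilter() is an illustrative name, not a function from smooth.
sesFilter <- function(y, alpha, initial){
    level <- initial
    fitted <- numeric(length(y))
    for(t in 1:length(y)){
        fitted[t] <- level
        # e_t = y_t - l_{t-1}; the level absorbs the share alpha of it
        level <- level + alpha * (y[t] - level)
    }
    # The point forecast for any future horizon is the last level
    list(fitted=fitted, forecast=level)
}

# Sanity check: on a constant series, the forecast reproduces the constant
sesFilter(rep(10, 5), alpha=0.3, initial=10)$forecast
# [1] 10
```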

In order to see the data that corresponds to the ETS(A,N,N), we can use the `sim.es()` function from the `smooth` package. Here are several examples with different smoothing parameter values:

```
# Load the package with the simulation functions
library(smooth)

# List for the generated data
y <- vector("list", 6)
# Parameters for DGP
initial <- 1000
meanValue <- 0
sdValue <- 20
alphas <- c(0.1, 0.3, 0.5, 0.75, 1, 1.5)
# Go through all alphas and generate respective data
for(i in 1:length(alphas)){
    y[[i]] <- sim.es("ANN", 120, 1, 12, persistence=alphas[i],
                     initial=initial, mean=meanValue, sd=sdValue)
}
```

The generated data can be plotted in the following way:

```
par(mfrow=c(3,2), mar=c(2,2,2,1))
for(i in 1:6){
    plot(y[[i]], main=paste0("alpha=", y[[i]]$persistence),
         ylim=initial+c(-500,500))
}
```

This simple simulation shows that the smoothing parameter in ETS(A,N,N) controls the variability in the data (Figure 4.7): the higher \(\alpha\) is, the higher the variability is and the less predictable the data becomes. With higher values of \(\alpha\), the level changes faster, leading to increased uncertainty about the future values of the level in the data.

When it comes to the application of this model to the data, the conditional h steps ahead mean corresponds to the point forecast and is equal to the last observed level: \[\begin{equation} \mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} , \tag{4.8} \end{equation}\] this holds because it is assumed (see Section 1.4.1) that \(\mathrm{E}(\epsilon_t)=0\), which implies that the conditional h steps ahead expectation of the level in the model is (from the second equation in (4.7)): \[\begin{equation} \mathrm{E}(l_{t+h}|t) = l_t + \mathrm{E}(\alpha\sum_{j=1}^{h-1}\epsilon_{t+j}|t) = l_t . \tag{4.9} \end{equation}\]

Here is an example of a forecast from ETS(A,N,N) with automatic parameter estimation, using the `es()` function from the `smooth` package:

```
# Generate the data
y <- sim.es("ANN", 120, 1, 12, persistence=0.3, initial=1000)
# Apply ETS(A,N,N) model
es(y$data, "ANN", h=12, interval=TRUE, holdout=TRUE, silent=FALSE)
```

```
## Time elapsed: 0.03 seconds
## Model estimated: ETS(ANN)
## Persistence vector g:
## alpha
## 0.2394
## Initial values were optimised.
##
## Loss function type: likelihood; Loss function value: 535.3814
## Error standard deviation: 34.8963
## Sample size: 108
## Number of estimated parameters: 3
## Number of degrees of freedom: 105
## Information criteria:
## AIC AICc BIC BICc
## 1076.763 1076.994 1084.809 1085.349
##
## 95% parametric prediction interval was constructed
## 92% of values are in the prediction interval
## Forecast errors:
## MPE: -1%; sCE: -11%; Asymmetry: -33.2%; MAPE: 2.4%
## MASE: 0.733; sMAE: 2.4%; sMSE: 0.1%; rMAE: 0.836; rRMSE: 0.956
```

As we see from Figure 4.8, the true smoothing parameter is 0.3, but the estimated one is not exactly 0.3, which is expected, because we deal with an in-sample estimation. Also, notice that with such a smoothing parameter, the prediction interval widens with the increase of the forecast horizon. If the smoothing parameter were lower, the bounds would not increase, but this might not reflect the uncertainty about the level correctly. Here is an example with \(\alpha=0.01\) on the same data (Figure @ref(fig:ETSANNExamplealpha0.1)):

```
ourModel <- es(y$data, "ANN", h=12, interval=TRUE,
               holdout=TRUE, silent=FALSE, persistence=0.01)
```

Figure @ref(fig:ETSANNExamplealpha0.1) shows that the prediction interval does not expand, but at the same time it is wider than needed, and the forecast is biased – the model does not keep up with the fast-changing time series. So, it is essential to estimate the smoothing parameters correctly, not only to approximate the data better, but also to produce a less biased point forecast and a more appropriate prediction interval.

### 4.2.2 ETS(M,N,N)

Hyndman et al. (2008) demonstrate that there is another ETS model underlying SES. It is the model with multiplicative error, which, as mentioned in Section 3.5, is formulated in the following way: \[\begin{equation} \begin{aligned} & y_{t} = l_{t-1}(1 + \epsilon_t) \\ & l_t = l_{t-1}(1 + \alpha \epsilon_t) \end{aligned} , \tag{4.10} \end{equation}\] where \((1+\epsilon_t)\) corresponds to the \(\varepsilon_t\) discussed in Section 3.1. In order to see the connection of this model with SES, we need to return to the estimation of the model on the data again: \[\begin{equation} \begin{aligned} & y_{t} = \hat{l}_{t-1}(1 + e_t) \\ & \hat{l}_t = \hat{l}_{t-1}(1 + \hat{\alpha} e_t) \end{aligned} , \tag{4.11} \end{equation}\] where the one step ahead forecast is (Section 3.5) \(\hat{y}_t = \hat{l}_{t-1}\) and \(e_t=\frac{y_t -\hat{y}_t}{\hat{y}_t}\). Substituting these values into the second equation of (4.11), we obtain: \[\begin{equation} \hat{y}_{t+1} = \hat{y}_t \left(1 + \hat{\alpha} \frac{y_t -\hat{y}_t}{\hat{y}_t} \right) . \tag{4.12} \end{equation}\] Finally, opening the brackets, we get SES in a form similar to (4.5): \[\begin{equation} \hat{y}_{t+1} = \hat{y}_t + \hat{\alpha} (y_t -\hat{y}_t). \tag{4.13} \end{equation}\]
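
The algebraic equivalence of (4.12) and (4.13) can also be checked numerically; the values below are arbitrary, chosen only for illustration:

```r
# Check that the multiplicative update (4.12) coincides with the
# additive SES update (4.13) for arbitrary illustrative values
yHat <- 100
yActual <- 110
alpha <- 0.3
eRelative <- (yActual - yHat) / yHat
updateMultiplicative <- yHat * (1 + alpha * eRelative)   # form (4.12)
updateAdditive <- yHat + alpha * (yActual - yHat)        # form (4.13)
# TRUE up to numerical precision
all.equal(updateMultiplicative, updateAdditive)
```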

This example again demonstrates the difference between a forecasting method and a model. When we use SES, we ignore the distributional assumptions, which restricts the usage of the method. When we use a model, we assume a specific structure, which on the one hand, makes it more restrictive, but on the other hand, gives it additional features. The main ones in the case of ETS(M,N,N) in comparison with ETS(A,N,N) are:

- The variance of the actual values in ETS(M,N,N) increases with the increase of the level \(l_{t}\). This allows modelling heteroscedastic situations in the data;
- If \((1+\epsilon_t)\) is always positive, then the ETS(M,N,N) model will always produce positive forecasts (both point and interval). This makes the model applicable, in principle, to data with low levels.
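
To illustrate the first of these points, the recursion (4.10) can be simulated directly in base R. This is a bare-bones sketch, not a replacement for `sim.es()`; the standard deviation of 0.05 is an arbitrary choice:

```r
# Direct simulation of the ETS(M,N,N) recursion (4.10)
set.seed(41)
obs <- 120
level <- 1000
alpha <- 0.3
y <- numeric(obs)
for(t in 1:obs){
    epsilon <- rnorm(1, 0, 0.05)
    y[t] <- level * (1 + epsilon)
    level <- level * (1 + alpha * epsilon)
}
# The scale of the fluctuations in y is proportional to the current
# level, and the values stay positive as long as (1 + epsilon) > 0
```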

An alternative to (4.10) would be the model (4.7) applied to the data in logarithms (assuming that the data we work with is always positive), implying that:
\[\begin{equation}
\begin{aligned}
& \log y_{t} = l_{t-1} + \epsilon_t \\
& l_t = l_{t-1} + \alpha \epsilon_t
\end{aligned} .
\tag{4.14}
\end{equation}\]
However, to produce forecasts from (4.14), exponentiation is needed, making the application of the model more difficult than needed. The ETS(M,N,N), on the other hand, does not rely on exponentiation, making it safe in cases when the model produces very high values (e.g. `exp(1000)` returns infinity in R).
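
For completeness, here is a sketch of how a forecast would be produced from (4.14): apply the SES recursion to the logarithms of the data and then exponentiate the result. The series below is made up purely for illustration; note also that, under normally distributed \(\epsilon_t\), the exponentiated value corresponds to the conditional median of \(y_{t+h}\) rather than the mean:

```r
# Forecasting with the log-level model (4.14): SES on log(y),
# then exponentiate back; the series here is purely illustrative
y <- c(102, 98, 105, 110, 107, 112)
alpha <- 0.3
level <- log(y[1])
for(t in 1:length(y)){
    level <- level + alpha * (log(y[t]) - level)
}
exp(level)  # point forecast on the original scale; always positive
```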

Finally, the conditional h steps ahead mean of ETS(M,N,N) corresponds to the point forecast and is equal to the last observed level, but only if \(\mathrm{E}(1+\epsilon_t)=1\): \[\begin{equation} \mu_{y,t+h|t} = \hat{y}_{t+h} = l_{t} . \tag{4.15} \end{equation}\]
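
To see why the condition \(\mathrm{E}(1+\epsilon_t)=1\) is needed, substitute the transition equation of (4.10) recursively and use the assumption that the error terms are mutually independent: \[\begin{equation*} \mathrm{E}(l_{t+h-1}|t) = l_t \prod_{j=1}^{h-1} \mathrm{E}(1+\alpha\epsilon_{t+j}) = l_t , \end{equation*}\] so that \(\mathrm{E}(y_{t+h}|t) = \mathrm{E}\left(l_{t+h-1}(1+\epsilon_{t+h})|t\right) = l_t\).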

And here is an example with the ETS(M,N,N) data (Figure 4.9):

```
# Generate the data and apply the ETS(M,N,N) model
y <- sim.es("MNN", 120, 1, 12, persistence=0.3, initial=1000)
ourModel <- es(y$data, "MNN", h=12, holdout=TRUE,
               interval=TRUE, silent=FALSE)
```

```
ourModel
```

```
## Time elapsed: 0.02 seconds
## Model estimated: ETS(MNN)
## Persistence vector g:
## alpha
## 0.1835
## Initial values were optimised.
##
## Loss function type: likelihood; Loss function value: 658.9335
## Error standard deviation: 0.1027
## Sample size: 108
## Number of estimated parameters: 3
## Number of degrees of freedom: 105
## Information criteria:
## AIC AICc BIC BICc
## 1323.867 1324.098 1331.913 1332.454
##
## 95% parametric prediction interval was constructed
## 50% of values are in the prediction interval
## Forecast errors:
## MPE: 20.5%; sCE: 385.2%; Asymmetry: 100%; MAPE: 20.5%
## MASE: 3.22; sMAE: 32.1%; sMSE: 14.2%; rMAE: 1.809; rRMSE: 1.447
```

Conceptually, the data in Figure 4.9 looks very similar to the one from ETS(A,N,N) (Figure 4.8), but demonstrates the changing variance of the error term with the change of the level.