## 4.1 Simple Exponential Smoothing

We start our discussion of exponential smoothing with the original Simple Exponential Smoothing (SES) forecasting method, which was formulated by Brown (1956):

$\begin{equation} \hat{y}_{t+1} = \hat{\alpha} {y}_{t} + (1 - \hat{\alpha}) \hat{y}_{t}, \tag{4.1} \end{equation}$

where $$\hat{\alpha}$$ is the smoothing parameter, which is set by the analyst and is typically restricted to the (0, 1) region (this region is actually arbitrary, and we will see in Section 4.6 what the correct one is). This is one of the simplest forecasting methods, and its smoothing parameter is typically interpreted as a weight between the actual value and the one-step-ahead predicted one. If the smoothing parameter is close to zero, then more weight is given to the previous fitted value $$\hat{y}_{t}$$ and the new information is neglected. If $$\hat{\alpha}=0$$, then the method becomes equivalent to the Global Mean method, discussed in Section 3.3.2. If the smoothing parameter is close to one, then mainly the actual value $${y}_{t}$$ is taken into account. If $$\hat{\alpha}=1$$, then the method transforms into Naïve, discussed in Section 3.3.1. By changing the smoothing parameter value, the forecaster can decide how to approximate the data and filter out the noise.
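The recursion (4.1) can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `ses_recursive()` is a hypothetical helper (it is not part of the smooth package), the data are made up, and the initial forecast is passed in by hand.

```r
# Hypothetical helper: applies the SES recursion (4.1) to a numeric vector.
# yhat[t] is the forecast of y[t]; yhat[n + 1] is the out-of-sample forecast.
ses_recursive <- function(y, alpha, init) {
  n <- length(y)
  yhat <- numeric(n + 1)
  yhat[1] <- init
  for (t in seq_len(n)) {
    yhat[t + 1] <- alpha * y[t] + (1 - alpha) * yhat[t]
  }
  yhat
}

# Made-up data for illustration only
y <- c(102, 98, 105, 110, 95)

# alpha = 0: the forecast never leaves the initial value (here the sample mean)
flat <- ses_recursive(y, alpha = 0, init = mean(y))

# alpha = 1: each forecast repeats the previous actual value (Naive)
naive_path <- ses_recursive(y, alpha = 1, init = y[1])

# An intermediate value mixes the latest actual with the previous forecast
smoothed <- ses_recursive(y, alpha = 0.3, init = 100)
```

The two boundary cases reproduce the behaviour described above: with a smoothing parameter of zero the new observations are ignored, and with a smoothing parameter of one only the latest observation matters.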

Also, notice that this is a recursive method, meaning that there needs to be some starting point $$\hat{y}_1$$ in order to apply (4.1) to the existing data. Different initialisation and estimation methods for SES have been discussed in the literature, but the state-of-the-art approach is to estimate $$\hat{\alpha}$$ and $$\hat{y}_{1}$$ together by minimising some loss function (Hyndman et al., 2002). Typically, MSE (see Section 2.1) is used as the loss, so that the squared one-step-ahead forecast errors are minimised.
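As a rough sketch of this joint estimation, the smoothing parameter and the initial value can be passed together to a general-purpose optimiser. The code below uses `optim()` from base R and is not how the smooth package does it internally; `ses_mse()` is a hypothetical helper, the simulated data and starting values are arbitrary, and the bounds of 0 and 1 on the smoothing parameter are an assumption that mirrors the conventional region.

```r
# Hypothetical loss: MSE of one-step-ahead errors for given (alpha, initial value)
ses_mse <- function(par, y) {
  alpha <- par[1]
  yhat <- par[2]                 # forecast of the first observation
  sse <- 0
  for (t in seq_along(y)) {
    sse <- sse + (y[t] - yhat)^2
    yhat <- alpha * y[t] + (1 - alpha) * yhat   # recursion (4.1)
  }
  sse / length(y)
}

set.seed(41)
y <- c(rnorm(50, 100, 10), rnorm(50, 130, 10))

# Estimate both parameters jointly; only alpha is bounded
fit <- optim(par = c(0.1, y[1]), fn = ses_mse, y = y,
             method = "L-BFGS-B",
             lower = c(0, -Inf), upper = c(1, Inf))

alpha_hat <- fit$par[1]
init_hat <- fit$par[2]
```

The optimiser only finds a local minimum of the in-sample MSE from the given starting point, so the result can depend on the starting values.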

### 4.1.1 Examples of application

Here is an example of how this method works on different time series. We use the es() function from the smooth package. Although it implements the ETS model, we will see later the connection between SES and ETS(A,N,N). We start by generating a stationary time series and setting $$\hat{\alpha}=0$$:

library(smooth)
y <- rnorm(100,100,10)
ourModel <- es(y, model="ANN", h=10, persistence=0)
plot(ourModel, 7, main="")

Figure 4.1: An example with a time series and SES forecast. $$\hat{\alpha}=0$$

As we see from Figure 4.1, SES works well in this case, capturing the deterministic level of the series and filtering out the noise. Here it works like a global average applied to the data. As mentioned before, the method is flexible, so if we have a level shift in the data and increase the smoothing parameter, it will adapt and get to the new level. Figure 4.2 shows an example with a level shift in the data.

y <- c(rnorm(50,100,10),rnorm(50,130,10))
ourModel <- es(y, model="ANN", h=10, persistence=0.1)
plot(ourModel, 7, main="")

Figure 4.2: An example with a time series and SES forecast. $$\hat{\alpha}=0.1$$

With $$\hat{\alpha}=0.1$$, SES manages to get to the new level, but now the method starts adapting to noise a little bit: it follows the peaks and troughs and repeats them with a lag, but with a much smaller magnitude (see Figure 4.2). If we increase the smoothing parameter, the method will react to the changes much faster, but it will also react more to noise. This is shown in Figure 4.3 with different smoothing parameter values.

Figure 4.3: SES with different smoothing parameters applied to the same data.
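The trade-off between reacting to a level shift and reacting to noise can also be put into numbers. The sketch below is a base R illustration and not the code behind the figures: `ses_fitted()` is a hypothetical helper, the three smoothing parameter values and the seed are arbitrary, and the initial forecast is fixed at 100 for simplicity.

```r
# Hypothetical helper: one-step-ahead SES forecasts for each observation
ses_fitted <- function(y, alpha, init) {
  yhat <- numeric(length(y))
  yhat[1] <- init
  for (t in seq_along(y)[-1]) {
    yhat[t] <- alpha * y[t - 1] + (1 - alpha) * yhat[t - 1]
  }
  yhat
}

set.seed(42)
y <- c(rnorm(50, 100, 10), rnorm(50, 130, 10))   # level shift after observation 50

alphas <- c(0.1, 0.3, 0.7)
fitted_values <- sapply(alphas, function(a) ses_fitted(y, a, init = 100))
colnames(fitted_values) <- paste0("alpha=", alphas)

summary_table <- data.frame(
  alpha = alphas,
  # variability of the forecasts while the level is constant (reaction to noise)
  sd_before_shift = apply(fitted_values[1:50, ], 2, sd),
  # forecast of observation 56, made after five observations at the new level
  fitted_5_after_shift = fitted_values[56, ]
)
print(summary_table)
```

On data of this kind, one would expect larger smoothing parameters to give both more variable forecasts before the shift and a forecast closer to the new level shortly after it, although the exact numbers depend on the simulated sample.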

If we set $$\hat{\alpha}=1$$, we will end up with the Naïve forecasting method (see Section 3.3.1), which is not appropriate for our example (see Figure 4.4).

Figure 4.4: SES with $$\hat{\alpha}=1$$.

So, when working with SES, we need to make sure that a reasonable smoothing parameter is selected. This can be done automatically by minimising the MSE (see Figure 4.5):

ourModel <- es(y, model="ANN", h=10, loss="MSE")
plot(ourModel, 7, main=paste0("SES with alpha=",
round(ourModel$persistence,3)))

Figure 4.5: SES with optimal smoothing parameter.

This approach won’t guarantee that we will get the most appropriate $$\hat{\alpha}$$, but it has been shown in the literature that optimising the smoothing parameter leads, on average, to improvements in forecasting accuracy (see, for example, Fildes, 1992).

### 4.1.2 Why “exponential”?

Now, why is it called “exponential”? Because the same method can be represented in a different form, if we substitute $$\hat{y}_{t}$$ on the right-hand side of (4.1) by the formula for the previous step:

$\begin{equation} \begin{aligned} \hat{y}_{t} = &\hat{\alpha} {y}_{t-1} + (1 - \hat{\alpha}) \hat{y}_{t-1}, \\ \hat{y}_{t+1} = &\hat{\alpha} {y}_{t} + (1 - \hat{\alpha}) \hat{y}_{t} = \\ & \hat{\alpha} {y}_{t} + (1 -\hat{\alpha}) \left( \hat{\alpha} {y}_{t-1} + (1 -\hat{\alpha}) \hat{y}_{t-1} \right). \end{aligned} \tag{4.2} \end{equation}$

By repeating this procedure for each $$\hat{y}_{t-1}$$, $$\hat{y}_{t-2}$$, etc., we will obtain a different form of the method:

$\begin{equation} \hat{y}_{t+1} = \hat{\alpha} {y}_{t} + \hat{\alpha} (1 -\hat{\alpha}) {y}_{t-1} + \hat{\alpha} (1 -\hat{\alpha})^2 {y}_{t-2} + \dots + \hat{\alpha} (1 -\hat{\alpha})^{t-1} {y}_{1} + (1 -\hat{\alpha})^t \hat{y}_1 , \tag{4.3} \end{equation}$

or equivalently:

$\begin{equation} \hat{y}_{t+1} = \hat{\alpha} \sum_{j=0}^{t-1} (1 -\hat{\alpha})^j {y}_{t-j} + (1 -\hat{\alpha})^t \hat{y}_1 . \tag{4.4} \end{equation}$

In the form (4.4), each actual observation has a weight in front of it. For the most recent observation it is equal to $$\hat{\alpha}$$, for the previous one it is $$\hat{\alpha} (1 -\hat{\alpha})$$, then $$\hat{\alpha} (1 -\hat{\alpha})^2$$, etc. These weights form a geometric series, which traces an exponential curve. Figure 4.6 shows an example with $$\hat{\alpha} =0.25$$ for a sample of 30 observations:

plot(0:29, 0.25*(1-0.25)^(0:29), type="b",
xlab="Time lags", ylab="Weights")

Figure 4.6: Example of weights distribution for $$\hat{\alpha}=0.25$$

This explains the name “exponential.” The term “smoothing” comes from the idea that the parameter $$\hat{\alpha}$$ should be selected so that the method smooths the original time series and does not react to noise.
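The equivalence between the recursive form (4.1) and the weighted-sum form (4.4) can be checked numerically. The following base R sketch uses made-up data and an arbitrary initial value of 100; the variable names are chosen only for this illustration.

```r
alpha <- 0.25
y <- c(102, 98, 105, 110, 95, 101)   # made-up data
init <- 100                          # assumed initial forecast for y[1]
n <- length(y)

# Recursive form (4.1), run through the whole sample
yhat_recursive <- init
for (t in seq_len(n)) {
  yhat_recursive <- alpha * y[t] + (1 - alpha) * yhat_recursive
}

# Weighted-sum form (4.4) with t = n
j <- 0:(n - 1)
weights <- alpha * (1 - alpha)^j               # weight of y[n - j]
yhat_weighted <- sum(weights * y[n - j]) + (1 - alpha)^n * init

# Both forms give the same forecast (up to floating point error)
all.equal(yhat_recursive, yhat_weighted)

# Weights on the data plus the weight on the initial value add up to one
sum(weights) + (1 - alpha)^n
```

The last line illustrates that the forecast is a weighted average: the weight not assigned to the observations, $$(1 -\hat{\alpha})^t$$, stays with the initial value and becomes small as the sample grows.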

### 4.1.3 Error correction form of SES

Finally, an alternative form of SES is known as the error correction form. It is obtained by regrouping the terms in (4.1): $$\hat{\alpha} {y}_{t} + (1 - \hat{\alpha}) \hat{y}_{t} = \hat{y}_{t} + \hat{\alpha} ({y}_{t} - \hat{y}_{t})$$. Taking $$e_t=y_t-\hat{y}_t$$ as the one-step-ahead forecast error, formula (4.1) can be written as:

$\begin{equation} \hat{y}_{t+1} = \hat{y}_{t} + \hat{\alpha} e_{t}. \tag{4.5} \end{equation}$

In this form, the smoothing parameter $$\hat{\alpha}$$ has a different meaning: it regulates how much the method reacts to the forecast error. In this interpretation, it no longer needs to be restricted to the (0, 1) region, but we would still typically want it to be closer to zero, in order to filter out the noise rather than adapt to it.
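A short base R check of this equivalence is given below. It is only an illustration with made-up data and an arbitrary initial value: the same series is run through the weighted form (4.1) and the error correction form (4.5), and the forecasts are compared.

```r
alpha <- 0.3
y <- c(102, 98, 105, 110, 95)   # made-up data
n <- length(y)

yhat_weighted <- numeric(n + 1)   # forecasts from (4.1)
yhat_ec <- numeric(n + 1)         # forecasts from (4.5)
yhat_weighted[1] <- yhat_ec[1] <- 100   # assumed initial forecast

for (t in seq_len(n)) {
  # Weighted form: mix of the actual value and the previous forecast
  yhat_weighted[t + 1] <- alpha * y[t] + (1 - alpha) * yhat_weighted[t]

  # Error correction form: previous forecast adjusted by a share of the error
  e <- y[t] - yhat_ec[t]
  yhat_ec[t + 1] <- yhat_ec[t] + alpha * e
}

all.equal(yhat_weighted, yhat_ec)
```

Written this way, the recursion (4.5) makes sense for any real value of the smoothing parameter, which is why the (0, 1) restriction is not required in this interpretation.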

As you see, SES is a very simple method. It is easy to explain to practitioners and it is very easy to implement in practice. However, this is just a forecasting method (see Section 1.4), so it just provides a way of generating point forecasts, but does not explain where the error comes from and how to generate prediction intervals.

### References

• Brown, R.G., 1956. Exponential Smoothing for predicting demand.
• Fildes, R., 1992. The evaluation of extrapolative forecasting methods. International Journal of Forecasting. 8, 81–98. https://doi.org/10.1016/0169-2070(92)90009-X
• Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting. 18, 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8