## 3.3 Simple forecasting methods

Now that we understand that time series might contain different components and that there are approaches for their decomposition, we can introduce several simple forecasting methods that can be used in practice, at least as benchmarks. Their usage aligns with the idea of forecasting principles discussed in Section 1.2.

### 3.3.1 Naïve

Naïve is one of the simplest forecasting methods. According to it, the one-step-ahead forecast is equal to the most recent actual value: $$$\hat{y}_t = y_{t-1} . \tag{3.6}$$$ Using this approach might sound naïve indeed, but there are cases where it is very hard to outperform. Consider an example with temperature forecasting. If we want to know what the temperature outside will be in 5 minutes, then Naïve would typically be very accurate: the temperature in 5 minutes will be the same as it is right now. The statistical model underlying Naïve is called “Random Walk” and is written as: $$$y_t = y_{t-1} + \epsilon_t. \tag{3.7}$$$ The variability of $$\epsilon_t$$ impacts the speed of change of the data: the higher it is, the more rapidly the values change. Random Walk and Naïve are shown in Figure 3.5. In the example below, we use a simple moving average (discussed later in Section 3.3.3) of order 1 to generate the data from Random Walk and then produce forecasts using Naïve.

y <- sim.sma(1, 120)
testModel <- sma(y$data, 1, h=10, holdout=TRUE)
plot(testModel, 7, main="")

As shown from the plot in Figure 3.5, Naïve lags behind the actual time series by one observation because of how it is constructed via equation (3.6). The point forecast corresponds to the straight line parallel to the x-axis. Given that the data was generated from Random Walk, the point forecast shown in Figure 3.5 is the best possible forecast for the time series, even though it exhibits rapid changes in the level. Note that if the time series exhibits level shifts or other types of unexpected changes in dynamics, Naïve will update rapidly and reach the new level instantaneously. However, because it only has a memory of one (the last) observation, it will not filter out the noise in the data but rather copy it into the future. So, it has limited usefulness in practice. However, being the simplest possible forecasting method, it is considered one of the basic forecasting benchmarks. If your model cannot beat it, it is not worth using.

### 3.3.2 Global Mean

While Naïve considers only one observation (the most recent one), the global mean (aka “global average”) relies on all the observations in the data: $$$\hat{y}_t = \bar{y} = \frac{1}{T} \sum_{t=1}^T y_{t} , \tag{3.8}$$$ where $$T$$ is the sample size. The model underlying this forecasting method is called “global level” and is written as: $$$y_t = \mu + \epsilon_t, \tag{3.9}$$$ so that $$\bar{y}$$ is an estimate of the fixed expectation $$\mu$$. Graphically, this is represented with a straight line going through the series, as shown in Figure 3.6.

y <- rnorm(120, 100, 10)
testModel <- es(y, "ANN", persistence=0, h=10, holdout=TRUE)
plot(testModel, 7, main="")

The series shown in Figure 3.6 is generated from the global level model, and the point forecast corresponds to the forecast from the global mean method. Note that the method assumes equal weights across the in-sample observations, i.e. the first observation has exactly the same weight of $$\frac{1}{T}$$ as the last one. Suppose the series exhibits some changes in level over time. In that case, the global mean will not be suitable, because it will produce an averaged-out forecast based on the values before and after the change.

### 3.3.3 Simple Moving Average

Naïve and Global Mean can be considered as opposite points in the spectrum of possible level time series (although there are series beyond Naïve, see for example ARIMA(0,1,1) with $$\theta_1>0$$, discussed in Section 8). The series between them exhibit slow changes in level and can be modelled using different forecasting approaches. One of those is the Simple Moving Average (SMA), which applies the mechanism of the mean to a small part of the time series. It relies on the formula: $$$\hat{y}_t = \frac{1}{m}\sum_{j=1}^{m} y_{t-j}, \tag{3.10}$$$ which implies going through the time series with a “window” of $$m$$ observations and using their average for forecasting. The order $$m$$ determines the length of the memory of the method: if it is equal to 1, then (3.10) turns into Naïve, while in the case of $$m=T$$ it transforms into Global Mean. The order $$m$$ is typically decided by a forecaster, keeping in mind that a lower $$m$$ corresponds to a shorter-memory method, while a higher one corresponds to a longer one. Svetunkov and Petropoulos (2018) have shown that SMA has an underlying non-stationary AR(m) model with $$\phi_j=\frac{1}{m}$$ for all $$j=1, \dots, m$$. While the conventional approach to forecasting from SMA is to produce a straight line equal to the last obtained observation, they demonstrate that, in general, the point forecast does not correspond to a straight line.

y <- sim.sma(10, 120)
par(mfcol=c(2,2), mar=c(2,2,2,1))
# SMA(1)
testModel <- sma(y, order=1, h=10, holdout=TRUE)
plot(testModel, 7, main=testModel$model)
# SMA(10)
testModel <- sma(y, order=10,
h=10, holdout=TRUE)
plot(testModel, 7, main=testModel$model)
# SMA(20)
testModel <- sma(y, order=20, h=10, holdout=TRUE)
plot(testModel, 7, main=testModel$model)
# SMA(110)
testModel <- sma(y, order=110,
h=10, holdout=TRUE)
plot(testModel, 7, main=testModel$model)

Figure 3.7 demonstrates the time series generated from SMA(10) and several SMA models applied to it. We can see that the higher orders of SMA lead to smoother fitted lines and calmer point forecasts. On the other hand, an SMA of a very high order, such as SMA(110), does not follow the changes in the time series efficiently, ignoring the potential changes in level. Given the difficulty with selecting the order $$m$$, Svetunkov and Petropoulos (2018) proposed using information criteria for the order selection of SMA in practice.
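To make the mechanics of equation (3.10) concrete, here is a minimal base-R sketch (a hypothetical helper, not the implementation from the `smooth` package) that computes a one-step-ahead SMA forecast. As discussed above, $$m=1$$ reduces it to Naïve and $$m=T$$ to the Global Mean:

```r
# One-step-ahead SMA forecast of order m, following equation (3.10):
# the forecast is the average of the last m in-sample observations.
smaForecast <- function(y, m) {
  T <- length(y)
  mean(y[(T - m + 1):T])
}

y <- c(10, 12, 11, 13, 14)
smaForecast(y, m=1)  # Naive: the last value, 14
smaForecast(y, m=3)  # average of 11, 13, 14
smaForecast(y, m=5)  # Global Mean: average of all observations, 12
```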

Finally, an attentive reader has already spotted that the formula for SMA corresponds to the procedure of CMA of an odd order from equation (3.4). The two are similar, but they serve different purposes: CMA is used to smooth out the series, with the calculated values inserted in the middle of the averaging window, while SMA is used for forecasting, with the point forecasts inserted at the end of the window.
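The difference in placement can be seen in a small numeric sketch (plain base R, hypothetical variable names): both procedures average the same number of consecutive values, but the result is attached to different points in time.

```r
# CMA vs SMA of order 3 on a short series: same averaging mechanism,
# different placement of the result.
y <- c(2, 4, 6, 8, 10)

cmaAt3  <- mean(y[2:4])  # CMA(3): the average is centred on observation 3
smaNext <- mean(y[3:5])  # SMA(3): the average is the forecast for observation 6
```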

### 3.3.4 Random Walk with drift

So far, we have discussed the methods used for level time series. But as mentioned in Section 3.1, there are other components in the time series. In the case of a series with a trend, Naïve, Global Mean and SMA will be inappropriate because they would be missing the essential component. The simplest model that can be used in this case is called “Random Walk with drift,” which is formulated as: $$$y_t = y_{t-1} + a_0 + \epsilon_t, \tag{3.11}$$$ where $$a_0$$ is a constant term, the introduction of which leads to increasing or decreasing trajectories, depending on the sign of $$a_0$$. The point forecast from this model is calculated as: $$$\hat{y}_{t+h} = y_{t} + a_0 h, \tag{3.12}$$$ implying that the forecast from the model is a straight line with the slope parameter $$a_0$$. Figure 3.8 shows what the data generated from Random Walk with drift with $$a_0=10$$ looks like. This model is discussed in Section 8.
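As a minimal base-R sketch of equation (3.12) (a hypothetical helper, not the `msarima()` implementation used below), the drift can be estimated as the average first difference of the series, after which the point forecast is a straight line from the last observation:

```r
# Sketch of a Random Walk with drift forecast: estimate the drift a0 as
# the mean of the first differences, then apply equation (3.12).
rwDriftForecast <- function(y, h) {
  a0 <- mean(diff(y))        # estimate of the constant (drift) term
  y[length(y)] + a0 * (1:h)  # straight line with slope a0
}

set.seed(41)
y <- cumsum(10 + rnorm(120, 0, 5))  # data from Random Walk with drift, a0 = 10
rwDriftForecast(y, h=3)             # three values climbing by roughly 10 per step
```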

y <- sim.ssarima(orders=list(i=1), lags=1, obs=120,
constant=10)
testModel <- msarima(y, orders=list(i=1), constant=TRUE,
h=10, holdout=TRUE)
plot(testModel, 7, main="")

The data in Figure 3.8 demonstrates a positive trend (because $$a_0>0$$) and randomness from one observation to another. The model is helpful as a benchmark and a special case for several other models because it is simple and requires the estimation of only one parameter.

### 3.3.5 Seasonal Naïve

Finally, in the case of seasonal data, there is a simple forecasting method that can be considered a good benchmark in many situations. Similar to Naïve, Seasonal Naïve relies on only one observation, but instead of taking the most recent value, it uses the value from the same period a season ago. For example, to produce a forecast for January 1984, we would use the value for January 1983. Mathematically this is written as: $$$\hat{y}_t = y_{t-m} , \tag{3.13}$$$ where $$m$$ is the seasonal frequency. This method has an underlying model, the Seasonal Random Walk: $$$y_t = y_{t-m} + \epsilon_t. \tag{3.14}$$$ Similar to Naïve, the higher the variability of the error term $$\epsilon_t$$ in (3.14), the faster the data exhibit changes. Seasonal Naïve does not require the estimation of any parameters and thus is considered one of the popular benchmarks for seasonal data. Figure 3.9 demonstrates how the data generated from the Seasonal Random Walk looks and how the point forecast from Seasonal Naïve applied to this data performs.
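Equation (3.13) can be sketched in a few lines of base R (a hypothetical helper, not the `msarima()` implementation used below), repeating the last observed season over the forecast horizon:

```r
# Sketch of Seasonal Naive, equation (3.13): the forecast for step i is
# the value observed m periods before, wrapping over seasons when the
# horizon exceeds m.
seasonalNaive <- function(y, m, h) {
  T <- length(y)
  sapply(1:h, function(i) y[T - m + 1 + (i - 1) %% m])
}

y <- rep(c(100, 150, 120, 180), 5)  # quarterly pattern, m = 4
seasonalNaive(y, m=4, h=6)          # repeats the last observed season
```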

y <- sim.ssarima(orders=list(i=1), lags=4,
obs=120, sd=50)
testModel <- msarima(y, orders=list(i=1), lags=4,
h=10, holdout=TRUE)
plot(testModel, 7, main="")

As with the previous methods, if other approaches cannot outperform Seasonal Naïve, it is not worth spending time on them.

### References

• Svetunkov, I., Petropoulos, F., 2018. Old dog, new tricks: a modelling view of simple moving averages. International Journal of Production Research. 56, 6034–6047. https://doi.org/10.1080/00207543.2017.1380326