
3.1 Measuring accuracy of point forecasts

We start with the situation when point forecasts are of the main interest. In this case we typically split the available data into train and test sets, apply the models under consideration to the former and produce forecasts for the latter, without showing that part of the data to the models. This is called the "fixed origin" approach: we fix the point in time from which to produce forecasts, produce them, calculate some error measures and compare the models.
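To make the setup concrete, here is a minimal sketch in Python (not part of the original text); the series `y`, the horizon `h` and the naive "model" are purely illustrative placeholders for whatever data and model you actually use:

```python
import numpy as np

# Hypothetical univariate series and forecast horizon chosen by the analyst.
y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140])
h = 6

# Fix the origin: everything before it is the train set, the last h points
# form the holdout (test set), which the model never sees during fitting.
train, test = y[:-h], y[-h:]

# A placeholder "model": the naive forecast repeats the last in-sample value.
# Any real forecasting model would be fitted on `train` instead.
forecast = np.repeat(train[-1], h)
```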

There are different error measures that can be used in this case, and the selection of a specific one depends on the needs of the analyst. The topic has already been discussed extensively in various sources (Davydenko and Fildes 2013; Svetunkov 2019; Svetunkov 2017), so here we only cover the main aspects of the error measures.

The most basic error measures are the Root Mean Squared Error (RMSE): \[\begin{equation} \text{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h \left( y_{t+j} - \hat{y}_{t+j} \right)^2 }, \tag{3.1} \end{equation}\] and the Mean Absolute Error (MAE): \[\begin{equation} \text{MAE} = \frac{1}{h} \sum_{j=1}^h \left| y_{t+j} - \hat{y}_{t+j} \right| , \tag{3.2} \end{equation}\]

where \(y_{t+j}\) is the actual value \(j\) steps ahead in the holdout, \(\hat{y}_{t+j}\) is the \(j\) steps ahead point forecast (the conditional expectation of the model) and \(h\) is the forecast horizon. As you can see, these error measures aggregate the performance of competing forecasting methods across the forecasting horizon, averaging out the specific performance on each \(j\). If this information needs to be retained, the summation can be dropped, giving just the Squared Error (SE) and Absolute Error (AE).
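As an illustration, equations (3.1) and (3.2) can be computed in a few lines of Python; the function and variable names below are our own and assume that the actuals and forecasts are NumPy arrays of length \(h\):

```python
import numpy as np

def rmse(actual, forecast):
    """Root Mean Squared Error, equation (3.1)."""
    return np.sqrt(np.mean((actual - forecast) ** 2))

def mae(actual, forecast):
    """Mean Absolute Error, equation (3.2)."""
    return np.mean(np.abs(actual - forecast))

# Continuing the hypothetical example above:
# rmse(test, forecast), mae(test, forecast)
```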

It is well known (see, for example, Kolassa 2016) that RMSE is minimised by the mean of a distribution, while MAE is minimised by its median. So, when selecting between the two, you should keep this property in mind.
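This property is easy to verify numerically. The sketch below (a hypothetical illustration, not the book's code) searches over a grid of constant forecasts for a skewed sample and shows that the squared error loss is minimised near the sample mean, while the absolute error loss is minimised near the sample median:

```python
import numpy as np

rng = np.random.default_rng(42)
# A skewed sample, so that the mean and the median differ noticeably.
sample = rng.lognormal(mean=0, sigma=1, size=10000)

# Evaluate both losses for a grid of constant "forecasts".
candidates = np.linspace(sample.min(), sample.max(), 2001)
mse_loss = [np.mean((sample - c) ** 2) for c in candidates]
mae_loss = [np.mean(np.abs(sample - c)) for c in candidates]

print(candidates[np.argmin(mse_loss)], sample.mean())      # close to the mean
print(candidates[np.argmin(mae_loss)], np.median(sample))  # close to the median
```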

The main advantage of these error measures is that they are simple and have a clear interpretation: they show the average distance from the point forecasts to the actual values. They are perfectly suitable if you work with only one time series. However, they are not suitable when you have several time series and want to assess the performance of methods across them. This is mainly because they are scale dependent and carry the units of the data: if you measure sales of bananas in pounds, then MAE and RMSE will show the error in pounds. And, as we know, you should not add up pounds of bananas and pounds of apples - the result might not make sense.

In order to tackle this issue, different error scaling techniques have been proposed over the years, resulting in a zoo of error measures:

  1. MAPE - Mean Absolute Percentage Error;
  2. MASE - Mean Absolute Scaled Error (Hyndman and Koehler 2006);
  3. rMAE - Relative Mean Absolute Error (Davydenko and Fildes 2013);
  4. sMAE - scaled Mean Absolute Error (Petropoulos and Kourentzes 2015);
  5. and others.

Each of them has its own advantages and disadvantages, which we will not discuss here. It suffices to say that the choice of error measure should be dictated by the needs of the forecaster. If you want a robust measure that works consistently but do not care about interpretation, go with MASE. If you want interpretability, go with either rMAE or sMAE. You should typically avoid MAPE and other percentage error measures, because they are highly influenced by the actual values in the holdout (in particular, they become unstable when the actuals are close to zero). Furthermore, similarly to the measures above, one can construct RMSE-based scaled and relative error measures.
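For illustration, here is a hedged sketch of how MASE, rMAE and sMAE could be computed in Python, assuming the `train`, `test` and `forecast` arrays from the earlier sketch plus a benchmark forecast; the scaling conventions follow the cited papers, but the function names are ours:

```python
import numpy as np

def mase(test, forecast, train):
    # Hyndman and Koehler (2006): MAE scaled by the in-sample MAE
    # of the one-step-ahead naive forecast.
    naive_insample_mae = np.mean(np.abs(np.diff(train)))
    return np.mean(np.abs(test - forecast)) / naive_insample_mae

def rmae(test, forecast, benchmark_forecast):
    # Davydenko and Fildes (2013): MAE of the method relative to the MAE
    # of a benchmark (typically the naive method) on the same holdout.
    return np.mean(np.abs(test - forecast)) / np.mean(np.abs(test - benchmark_forecast))

def smae(test, forecast, train):
    # Petropoulos and Kourentzes (2015): MAE scaled by the in-sample mean.
    return np.mean(np.abs(test - forecast)) / np.mean(train)
```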

Finally, when aggregating the performance of forecasting methods across several time series, it sometimes makes sense to look at the distribution of errors: this way you will know which of the methods fails seriously and which does a consistently good job.
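As a sketch of this idea (again, an illustration with hypothetical names rather than the book's code), one could collect one value of a scaled measure per series and then inspect its quantiles or a boxplot instead of a single average:

```python
import numpy as np

# `series_collection` is a hypothetical list of (train, test, forecast) tuples,
# one per time series; `measure` is any of the scaled measures defined above.
def error_distribution(series_collection, measure):
    return np.array([measure(test, forecast, train)
                     for train, test, forecast in series_collection])

# e.g. errors = error_distribution(series_collection, mase)
# np.quantile(errors, [0.25, 0.5, 0.75]) then shows whether a method is
# consistently good or occasionally fails badly.
```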

References

Davydenko, Andrey, and Robert Fildes. 2013. “Measuring Forecasting Accuracy: The Case of Judgmental Adjustments to SKU-Level Demand Forecasts.” International Journal of Forecasting 29 (3): 510–22. doi:10.1016/j.ijforecast.2012.09.002.

Hyndman, Rob J, and Anne B Koehler. 2006. “Another look at measures of forecast accuracy.” International Journal of Forecasting 22 (4): 679–88. doi:10.1016/j.ijforecast.2006.03.001.

Kolassa, Stephan. 2016. “Evaluating predictive count data distributions in retail sales forecasting.” International Journal of Forecasting 32 (3): 788–803. doi:10.1016/j.ijforecast.2015.12.004.

Petropoulos, Fotios, and Nikolaos Kourentzes. 2015. “Forecast combinations for intermittent demand.” Journal of the Operational Research Society 66 (6): 914–24. doi:10.1057/jors.2014.62.

Svetunkov, Ivan. 2017. “Naughty Apes and the Quest for the Holy Grail.” Modern Forecasting. https://forecasting.svetunkov.ru/en/2017/07/29/naughty-apes-and-the-quest-for-the-holy-grail/.

Svetunkov, Ivan. 2019. “Are You Sure You’re Precise? Measuring Accuracy of Point Forecasts.” Modern Forecasting. https://forecasting.svetunkov.ru/en/2019/08/25/are-you-sure-youre-precise-measuring-accuracy-of-point-forecasts/.