I have summarised several posts on point forecast evaluation in an article for the Foresight journal. Mike Gilliland, the Editor-in-Chief of the journal, contributed a lot to the paper, making it read much more smoothly, but preferred not to be listed as a co-author. The article was recently published in Issue 74 (Q3 2024). I attach the author copy to this post just because I can. Here is the direct link.
Here are the Key Points from the article:
- Evaluation is important for tracking forecast process performance and understanding whether changes (to forecasts, models, or the overall process) are needed.
- Understand what kind of point forecast your model produces, and measure it properly. Most likely, your approach produces the mean (rather than the median) as a point forecast, so the root mean squared error (RMSE) should be used to evaluate it (see the first sketch after this list).
- To aggregate the error measure across several products, you need to scale it. A reliable way of scaling is to divide the selected error measure by the mean absolute differences of the training data. This removes the scale and units of the original measure and ensures that its value does not change substantially when the data contain a trend (the first sketch after this list also shows this scaling).
- Avoid MAPE! (A short demonstration of why follows this list.)
- To make decisions based on your error measure, consider using the forecast value added (FVA) framework, directly comparing the performance of your forecasting approach with that of a simple benchmark method (see the final sketch after this list).
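To make the first two points concrete, here is a minimal Python sketch (my own illustration, not code from the article): `rmse()` evaluates a mean point forecast, and `scaled_rmse()` divides it by the mean absolute first differences of the training data, as recommended above. The function names and the series are invented for the example.

```python
import numpy as np

def rmse(actuals, forecasts):
    """Root mean squared error: the appropriate measure when
    the point forecast is the mean of the distribution."""
    return np.sqrt(np.mean((np.asarray(actuals) - np.asarray(forecasts)) ** 2))

def scaled_rmse(actuals, forecasts, train):
    """RMSE divided by the mean absolute first differences of the
    training data. This removes units, so values can be aggregated
    across products, and stays stable when the data contain a trend."""
    scale = np.mean(np.abs(np.diff(np.asarray(train))))
    return rmse(actuals, forecasts) / scale

# Invented example: a short training history and a two-step holdout
train = np.array([112, 118, 132, 129, 121, 135])
holdout = np.array([148, 148])
forecasts = np.array([140, 145])
print(rmse(holdout, forecasts))                # in the units of the data
print(scaled_rmse(holdout, forecasts, train))  # unitless, comparable across series
```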
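As a quick demonstration of why MAPE is problematic (again my example, not from the article): the same absolute miss produces wildly different MAPE values depending on the level of the actuals, and near-zero actuals make the measure explode.

```python
import numpy as np

def mape(actuals, forecasts):
    """Mean absolute percentage error; undefined when actuals are zero
    and explosive when they are close to zero."""
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    return np.mean(np.abs((actuals - forecasts) / actuals)) * 100

# The same absolute error of 99 units, in opposite directions:
print(mape([100.0], [1.0]))    # 99.0  -- under-forecast, capped below 100%
print(mape([1.0], [100.0]))    # 9900.0 -- the same miss at a low level explodes
```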
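Finally, a sketch of an FVA-style comparison. Here the simple benchmark is the naive forecast (repeating the last training observation), and value added is measured as the reduction in RMSE relative to that benchmark; the error measure, data, and numbers are my assumptions for illustration, not results from the article.

```python
import numpy as np

def rmse(actuals, forecasts):
    return np.sqrt(np.mean((np.asarray(actuals) - np.asarray(forecasts)) ** 2))

# Invented example: training history and a four-step holdout
train = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119])
holdout = np.array([104, 118, 115, 126])

# Simple benchmark: the naive forecast repeats the last training observation
naive_forecast = np.repeat(train[-1], len(holdout))
# Stand-in for your forecasting approach's output (hypothetical numbers)
model_forecast = np.array([110, 115, 118, 122])

# FVA: how much error the approach removes relative to the benchmark
fva = rmse(holdout, naive_forecast) - rmse(holdout, model_forecast)
print(f"FVA: {fva:.2f}")  # positive -> the approach adds value over naive
```

If the FVA is negative, the extra effort of the forecasting approach is not paying off, and the simple benchmark should be preferred.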
Disclaimer: This article originally appeared in Foresight, Issue 74 (forecasters.org/foresight) and is made available with permission of Foresight and the International Institute of Forecasters.