One question I received from my LinkedIn followers was how to evaluate forecast accuracy in practice. MAPE is wrong, but it is easy to use. In practice, we want something simple, informative and straightforward, but not all error measures are easy to calculate and interpret. What should we do? Here is my subjective view.
Step 1. Choose an error measure.
If you are interested in measuring the performance of your approaches at the individual level (e.g., SKU), you can use RMSE without needing to scale it. Why RMSE? As discussed in a previous post, it is minimized by the mean, which is what most forecasting methods produce. If your method produces medians, then you should use MAE instead of RMSE.
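To illustrate, here is a minimal sketch of the two measures in Python (the function names are mine, not from any specific package):

```python
import numpy as np

def rmse(actuals, forecasts):
    """Root Mean Squared Error: minimized by the mean of the distribution."""
    actuals, forecasts = np.asarray(actuals), np.asarray(forecasts)
    return np.sqrt(np.mean((actuals - forecasts) ** 2))

def mae(actuals, forecasts):
    """Mean Absolute Error: minimized by the median of the distribution."""
    actuals, forecasts = np.asarray(actuals), np.asarray(forecasts)
    return np.mean(np.abs(actuals - forecasts))
```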
If you want to measure the performance of different approaches across several time series, you can still calculate individual RMSEs. However, before aggregating them, you need to scale them to avoid adding apples to oranges and beer bottles: the volume of sales can differ substantially from one product to another. The simplest scaling is to divide RMSE by the in-sample mean. This has issues if the data exhibit trends: if sales are increasing, the in-sample mean will grow as well, deflating the scaled error. A better approach is to divide RMSE by the root of the mean squared first differences of the in-sample data, which are typically more stable. This measure, called RMSSE (Root Mean Squared Scaled Error), was used in the M5 competition and was motivated by Athanasopoulos & Kourentzes (2022). The measure itself is hard to interpret, but we will address this in the next steps.
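As a sketch, assuming the M5-style definition in which the holdout MSE is scaled by the mean squared one-step in-sample differences:

```python
import numpy as np

def rmsse(insample, actuals, forecasts):
    """Root Mean Squared Scaled Error: RMSE of the forecast divided by the
    root mean squared first differences of the in-sample data."""
    insample = np.asarray(insample)
    actuals, forecasts = np.asarray(actuals), np.asarray(forecasts)
    mse = np.mean((actuals - forecasts) ** 2)
    scale = np.mean(np.diff(insample) ** 2)  # mean squared first differences
    return np.sqrt(mse / scale)
```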
Step 2. Benchmarks
Calculate the same error measure for some benchmark approaches, such as Naive, ETS, and ARIMA. These will serve as baselines for the next step.
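The Naive benchmark is easy to produce by hand (ETS and ARIMA forecasts would typically come from a forecasting library); here is a sketch that reuses the hypothetical rmsse() from above:

```python
import numpy as np

def naive_forecast(insample, horizon):
    """Naive (random walk) benchmark: repeat the last observation over the horizon."""
    return np.full(horizon, np.asarray(insample)[-1])

# Example (assuming insample and holdout arrays and the rmsse() sketch above):
# benchmark_rmsse = rmsse(insample, holdout, naive_forecast(insample, len(holdout)))
```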
Step 3. “Forecast Value Added” (FVA)
Calculate the so-called Forecast Value Added (FVA), a concept popularized by Mike Gilliland. The idea is to calculate the ratio between the error measure of the method of interest and that of the benchmark. You end up with a value showing by how many percent your method is better than the benchmark. For example, if the FVA of your ML approach relative to ETS is 0.85, you can say that it improves accuracy by 15% (1 - 0.85 = 0.15).
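A sketch of that calculation (the error values below are made up for illustration):

```python
def fva_ratio(method_error, benchmark_error):
    """Ratio of the method's error measure to the benchmark's:
    values below 1 mean the method adds value over the benchmark."""
    return method_error / benchmark_error

# A hypothetical ML method with RMSSE of 0.68 against ETS with RMSSE of 0.80
# gives a ratio of 0.85, i.e. a 15% improvement:
print(round(1 - fva_ratio(0.68, 0.80), 2))  # 0.15
```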
And that’s it!
So, instead of focusing on the values of some APE, we move to discussing how to improve the forecasting process directly. FVA gives you meaningful information about the performance of your approaches and can help you make specific changes to the process if needed.
There are also many other questions related to the evaluation of forecasting performance, such as what decisions you plan to make, on what level you should produce forecasts, how you plan to use the forecasts, etc. I might return to some of them in future posts.