I’m currently doing a literature review for one of my papers on intermittent demand forecasting with machine learning, and I’ve noticed a recurring fundamental mistake in several recently published papers, even in respectable peer-reviewed journals.
The mistake? Using error measures based on the Mean Absolute Error (MAE). This is a crime against humanity when working with intermittent demand. I’ve explained this issue multiple times before (here, here, and here), but it appears that this idea needs to be repeated over and over again. Let me explain.
MAE is minimised by the median. In the case of intermittent demand, the median can often be zero. If you use MAE (or scaled measures like MASE or sMAE) to evaluate forecasts and compare, for example, Croston, TSB, ETS, and an Artificial Neural Network (ANN), you may find the ANN outperforming the others. However, this could simply mean that the ANN produces forecasts closer to zero than the alternatives. This is not what you want for intermittent demand! The goal is to capture the structure correctly and produce conditional mean forecasts (typically). Instead, by relying on MAE, you might conclude: “We won’t sell anything in the next two weeks”, implying that there’s no need to stock products. This is obviously wrong and unhelpful.
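To see why the median matters, here is a minimal sketch that searches over flat (constant) forecasts and picks the one with the lowest MAE and the one with the lowest MSE. The holdout values below are made up for illustration (they are NOT the data behind the figure), chosen so that 12 out of 20 values are zero, as in the example discussed in this post.

```python
import statistics

# Hypothetical intermittent demand holdout (synthetic, NOT the figure's data):
# 12 zeros out of 20 observations.
holdout = [0, 3, 0, 0, 5, 0, 2, 0, 0, 4, 0, 0, 1, 0, 6, 0, 0, 2, 0, 3]


def mae(actuals, forecast):
    """Mean Absolute Error of a flat forecast."""
    return sum(abs(y - forecast) for y in actuals) / len(actuals)


def mse(actuals, forecast):
    """Mean Squared Error of a flat forecast."""
    return sum((y - forecast) ** 2 for y in actuals) / len(actuals)


# Try flat forecasts on a grid from 0 to max(holdout) in steps of 0.01
grid = [i / 100 for i in range(0, 100 * max(holdout) + 1)]
best_mae = min(grid, key=lambda f: mae(holdout, f))
best_mse = min(grid, key=lambda f: mse(holdout, f))

print("median:", statistics.median(holdout), "| MAE-optimal flat forecast:", best_mae)
print("mean:  ", statistics.mean(holdout), "| MSE-optimal flat forecast:", best_mse)
```

On this series the MAE-optimal flat forecast is 0 (the median), while the MSE-optimal one is 1.3 (the mean). So any method whose forecasts drift towards zero gets rewarded by MAE, regardless of whether it captures the demand structure.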
Attached to this post is a figure showing three forecasts for an intermittent demand series:
- The blue line represents the mean of the data;
- The green line is a forecast from an Artificial Neural Network;
- The red line is the zero forecast.
In the figure’s legend, you’ll see error measures indicating that the zero forecast performs best in terms of MAE, followed by the ANN, and lastly, the mean forecast. Based on MAE, the conclusion would be: “We won’t sell anything, so don’t bother stocking the product”. But this outcome occurs solely because 12 out of 20 values in the holdout are zeros, making the median zero as well.
On the other hand, RMSE provides a more reasonable evaluation, showing that the mean of the data is more informative and preferable to the other methods.
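The same effect can be reproduced numerically. The sketch below compares a zero forecast with a flat forecast equal to the mean on a hypothetical holdout (again synthetic, with 12 zeros out of 20; not the figure's data). Using the holdout mean as the "mean forecast" is an assumption made purely for illustration, standing in for a conditional mean forecast.

```python
import math

# Hypothetical holdout (synthetic, NOT the figure's data): 12 of 20 values are zero.
holdout = [0, 3, 0, 0, 5, 0, 2, 0, 0, 4, 0, 0, 1, 0, 6, 0, 0, 2, 0, 3]


def mae(actuals, forecasts):
    """Mean Absolute Error over the holdout."""
    return sum(abs(y - f) for y, f in zip(actuals, forecasts)) / len(actuals)


def rmse(actuals, forecasts):
    """Root Mean Squared Error over the holdout."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actuals, forecasts)) / len(actuals))


h = len(holdout)
candidates = {
    "zero": [0.0] * h,
    # Flat forecast at the mean, a stand-in for a conditional mean forecast
    "mean": [sum(holdout) / h] * h,
}

for name, fc in candidates.items():
    print(f"{name:>4}: MAE={mae(holdout, fc):.3f}  RMSE={rmse(holdout, fc):.3f}")
```

Here the zero forecast wins on MAE (1.3 vs 1.59), while the mean forecast wins on RMSE (about 1.87 vs 2.28). Dividing MAE by a positive series-specific constant, as MASE or sMAE do, does not change this ranking, which is why the scaled versions inherit the same problem.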
The short summary of this post: *Don’t use MAE-based error measures for intermittent demand!* (Insert as many exclamation marks as you’d like!)
P.S. Actually, as a general rule, avoid using MAE for evaluating methods that produce mean forecasts. For more details, check out this post.
P.P.S. What frustrates me a lot is that the reviewers of those papers did nothing to fix this issue, which means that they are clueless about it as well.