For the last year, Anna Sroginis and I have been working on a paper, trying to modernise demand classification schemes and make them useful in the brave new era of machine learning. We have finally wrapped it up and submitted it to a peer-reviewed journal. But the temptation to share was too strong, so we have also uploaded it to arXiv, and it is now available here.
What is this paper about?
Intermittent demand is a common challenge in supply chain and retail. The key issue is that zeroes in sales can happen for two fundamentally different reasons (see one of my previous posts):
- Nobody wanted to buy the product (naturally occurring zeroes),
- Nobody could buy the product (artificially occurring zeroes, caused by stockouts, etc.).
However, forecasting methods are typically unaware of this distinction and treat both types of zeroes equally. This can lead to inaccurate forecasts and poor decisions. On top of that, existing classification schemes for intermittent demand (such as SBC, the Syntetos-Boylan-Croston scheme) use arbitrary thresholds and boil down to choosing between forecasting methods like Croston and SBA (the Syntetos-Boylan Approximation). There’s a clear need for smarter, more flexible tools that can distinguish between types of demand and make classification useful in practice.
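To show what threshold-based classification looks like in practice, here is a minimal Python sketch of an SBC-style rule. It uses the commonly cited cut-offs of 1.32 for the average demand interval (ADI) and 0.49 for the squared coefficient of variation (CV²) of non-zero demand sizes. The function name, the way ADI is computed (series length divided by the number of non-zero periods) and the use of the population variance are my assumptions for illustration, not something taken from the paper.

```python
from statistics import mean, pvariance

ADI_CUTOFF = 1.32  # commonly cited SBC threshold for the average demand interval
CV2_CUTOFF = 0.49  # commonly cited SBC threshold for CV^2 of demand sizes


def sbc_category(demand):
    """Classify a demand series into one of the four SBC-style categories."""
    sizes = [y for y in demand if y > 0]
    if not sizes:
        raise ValueError("series has no non-zero demand")
    # Assumption: ADI approximated as number of periods per non-zero period.
    adi = len(demand) / len(sizes)
    # Assumption: population variance of non-zero sizes over squared mean.
    cv2 = pvariance(sizes) / mean(sizes) ** 2
    if adi < ADI_CUTOFF:
        return "smooth" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"
```

A series that sits just on either side of a cut-off ends up in a different category, which is exactly the arbitrariness mentioned above.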
In this paper, we introduce a two-stage, model-based framework called “Automatic Identification of Demand” (AID), designed to bring more clarity and accuracy to demand classification. The first stage uses a data-driven approach to detect artificially occurring zeroes. Once those are accounted for, the second stage classifies the demand into one of six categories based on key characteristics: whether the demand is regular or intermittent, whether it consists of count or fractional values, and whether intermittent demand is smooth or lumpy in nature. AID detects stockouts by analysing demand intervals using the Geometric distribution, then flags the demand as one of those six types based on several simple statistical models.
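To give a feel for the first stage, here is a rough Python sketch of how the Geometric distribution can be used to flag suspiciously long runs of zeroes. This is a simplified illustration, not the exact procedure from the paper: the function names, the 0.99 confidence level, and estimating the probability of a sale as the share of non-zero periods in the whole series are all assumptions made for the example.

```python
def zero_runs(demand):
    """Return (start, end) index pairs of consecutive runs of zeroes."""
    runs = []
    start = None
    for i, y in enumerate(demand):
        if y == 0:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(demand) - 1))
    return runs


def flag_suspicious_zero_runs(demand, level=0.99):
    """Flag zero runs that are unusually long under a Geometric model."""
    n = len(demand)
    nonzero = sum(1 for y in demand if y != 0)
    if nonzero == 0 or nonzero == n:
        return []
    # Assumption: probability of a sale estimated from the whole series.
    p = nonzero / n
    flagged = []
    for start, end in zero_runs(demand):
        length = end - start + 1
        # P(at least `length` zeroes in a row) under Geometric(p).
        tail_probability = (1 - p) ** length
        if tail_probability < 1 - level:
            flagged.append((start, end))
    return flagged


sales = [1, 0, 2, 1, 3, 0, 1, 2, 1, 1] + [0] * 12 + [2, 1, 1, 0, 3, 1, 2, 1]
print(flag_suspicious_zero_runs(sales))  # [(10, 21)]
```

Note that in this sketch the stockout periods themselves pull the estimated probability down, so a more careful implementation would need to account for that.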
We applied AID to a retailer dataset covering over 31,000 products with weekly sales across three stores. Using the AID output, we generated several features and tested whether they improved the accuracy of multiple forecasting approaches (local level, pooled regression, and LightGBM). We found that:
- Correcting for stockouts significantly improved the accuracy of all approaches;
- Using a mixture approach (separating demand into sizes and occurrences) yielded large gains in accuracy, regardless of the forecasting method used;
- Further splitting the data by demand categories (e.g., regular vs. intermittent, smooth vs. lumpy) provided additional, though more modest, benefits.
We argue that these three principles are universally valuable for forecasting, no matter what approach you use. If you face intermittent demand, at a minimum, consider detecting stockouts and then using the mixture approach.
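If the mixture approach sounds abstract, the following Python sketch shows the basic idea: split the series into an occurrence part (was there a sale or not) and a size part (how much was sold when there was a sale), forecast each separately, and multiply the two. Here both parts are forecast with simple exponential smoothing, which makes the sketch close in spirit to the TSB method. The function names, the smoothing parameters and the initialisation with the first observation are assumptions for illustration and are not the setup used in the paper.

```python
def ses(values, alpha, initial):
    """Simple exponential smoothing; returns the final level."""
    level = initial
    for v in values:
        level += alpha * (v - level)
    return level


def mixture_forecast(demand, alpha_occurrence=0.1, alpha_size=0.1):
    """Return (probability, size, forecast) for the next period."""
    # Occurrence part: 1 if there was a sale in the period, 0 otherwise.
    occurrence = [1.0 if y > 0 else 0.0 for y in demand]
    # Size part: only the non-zero demand values.
    sizes = [float(y) for y in demand if y > 0]
    if not sizes:
        return 0.0, 0.0, 0.0
    # Assumption: both levels are initialised with the first observation.
    probability = ses(occurrence[1:], alpha_occurrence, initial=occurrence[0])
    size = ses(sizes[1:], alpha_size, initial=sizes[0])
    # Expected demand = probability of a sale times expected size.
    return probability, size, probability * size


probability, size, forecast = mixture_forecast(
    [0, 4, 0, 0, 4, 4, 0, 4], alpha_occurrence=0.5, alpha_size=0.5
)
print(probability, size, forecast)
```

Any model can be plugged in for either part (for example, a regression or LightGBM for the sizes and a classifier for the occurrence); the point is only that the two parts are modelled separately.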
Hope you find this paper useful. Let me know what you think in the comments.