When I read posts written by some ML experts, I sometimes notice that they either overlook or do not clearly explain a few crucial steps in demand forecasting. In this post, I want to highlight the three most important ones based on my personal experience.
First and foremost, stationarity (see the proper definition of the term here). Time series often exhibit a changing level, clear trends, or both. While some models (such as ETS) handle this directly via specific components, others need some preliminary steps before being applied. The simplest thing one can do is to take differences of the original data if it shows any form of non-stationarity and model the changes (or, if you difference the logarithms, the rates of change) instead of the raw demand. For ML, this is extremely important because typical approaches (such as decision trees, k-NN, neural networks) cannot extrapolate. So, getting rid of the trend and/or ensuring that the level does not change over time will help ML approaches do the job they are supposed to do. And don’t fit a single global trend (see image in the post), as time series rarely exhibit a constant increase or decrease. In real life, the trend is usually stochastic, implying that the average sales change at a varying rate, not a fixed one.
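To make this concrete, here is a minimal sketch in R of differencing before modelling. The simulated series, the number of lags, and the use of lm() as a stand-in for whatever ML model you prefer are my illustrative choices, not something from the original post:

# Simulate a demand series with a stochastic trend (illustrative data only)
set.seed(42)
y <- 100 + cumsum(rnorm(120, mean = 0.5, sd = 2))

# First differences: model the changes instead of the raw demand
dy <- diff(y)

# Build a small lag matrix so the model predicts the next change from past changes
lags <- 3
X <- as.data.frame(embed(dy, lags + 1))
colnames(X) <- c("target", paste0("lag", 1:lags))
fit <- lm(target ~ ., data = X)   # placeholder for any ML regressor

# Forecast the next change and undo the differencing to return to the level
newX <- as.data.frame(t(rev(tail(dy, lags))))
names(newX) <- paste0("lag", 1:lags)
nextChange <- predict(fit, newdata = newX)
nextLevel <- tail(y, 1) + nextChange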
Second, real time series often exhibit heteroscedasticity (i.e. the variance of the data increases with the level). The simplest way to stabilise the variance is to take logarithms of the data. This is not a universal solution, but it works in many cases. This way, the error term in the model should have a roughly constant variance. A more advanced approach is to use the Box-Cox transformation, but this requires estimating the parameter lambda, which is not always straightforward. The main issue with all of this arises when working with intermittent demand, where some unpredictable zeroes occur (the logarithm of zero is minus infinity). In that case, you might want to take logarithms of the demand sizes (the non-zero values) instead of the demand itself and switch to a mixture model. Another simple but inelegant trick is to add one to every observation and then take logarithms. This works but breaks my heart.
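A rough sketch of these variance-stabilising options in R, using the forecast package for Box-Cox and the built-in AirPassengers data purely as an example (the intermittent series below is made up):

library(forecast)                    # for BoxCox() and BoxCox.lambda()

y <- AirPassengers                   # built-in series whose variance grows with the level

yLog <- log(y)                       # simplest fix: take logarithms

lambda <- BoxCox.lambda(y)           # estimate lambda from the data
yBoxCox <- BoxCox(y, lambda)         # Box-Cox transformed series

# Intermittent demand with zeroes: log() alone would give -Inf,
# so either log only the non-zero sizes or use the log(y + 1) trick.
yIntermittent <- c(0, 3, 0, 0, 5, 2, 0, 4)
logSizes <- log(yIntermittent[yIntermittent > 0])
yShiftedLog <- log(yIntermittent + 1)     # same as log1p(yIntermittent)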
Third, seasonality is extremely important in time series. There are many ways one can capture it. The simplest is to introduce dummy variables, but this might cause issues because, in reality, seasonality often changes over time (see this recent post). So, a better way of capturing it is either by extracting the seasonal component from STL or ETS and using it as a feature, or by using lagged values of your data. Depending on the specific situation and approach, there can be many other ways of capturing seasonality, and frankly, I struggle to come up with a universal one.
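As a sketch, one way of doing the STL-based variant in R; the log transform, the s.window value, the lag choice, and lm() as a placeholder for an ML model are assumptions on my part:

y <- log(AirPassengers)                       # log first to tame the variance

# A finite s.window lets the seasonal pattern evolve over time
# instead of forcing it to repeat identically every year
fitSTL <- stl(y, s.window = 13)
seasonalComponent <- fitSTL$time.series[, "seasonal"]

# Use the extracted seasonal component plus a one-step lag as features
n <- length(y)
df <- data.frame(
  target   = as.numeric(y)[-1],
  seasonal = as.numeric(seasonalComponent)[-1],
  lag1     = as.numeric(y)[-n]
)
fitML <- lm(target ~ seasonal + lag1, data = df)   # stand-in for any ML method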
Bonus: if you work with intermittent demand, splitting it into demand sizes and demand occurrence might boost the accuracy of your approach if the underlying levels are captured correctly. Anna Sroginis and I show this improvement for LightGBM, regression, and simple local level approaches in our paper (currently under revision).
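A toy illustration of that split in R; the plain averages below are just placeholders for whatever models you fit to each part, not the LightGBM or regression setup from our paper:

# Made-up intermittent series; replace with your own data
y <- c(0, 3, 0, 0, 5, 2, 0, 0, 0, 4, 0, 1)

occurrence <- as.numeric(y > 0)      # demand occurrence: did anything sell this period?
sizes <- y[y > 0]                    # demand sizes: the non-zero values only

# Model each part separately and recombine for a point forecast, Croston-style
pOccurrence <- mean(occurrence)      # estimated probability of a demand occurrence
meanSize <- mean(sizes)              # estimated mean demand size given occurrence
pointForecast <- pOccurrence * meanSize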
P.S. Kandrika and I will deliver a course on “Demand Forecasting Principles with examples in R” in November, where we will discuss these and other important aspects of demand forecasting. You can still sign up for it here.