<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archives extrapolation methods - Open Forecasting</title>
	<atom:link href="https://openforecast.org/tag/extrapolation/feed/" rel="self" type="application/rss+xml" />
	<link>https://openforecast.org/tag/extrapolation/</link>
	<description>How to look into the future</description>
	<lastBuildDate>Mon, 02 Mar 2026 22:48:08 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2015/08/cropped-usd-05-32x32.png&amp;nocache=1</url>
	<title>Archives extrapolation methods - Open Forecasting</title>
	<link>https://openforecast.org/tag/extrapolation/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>There&#8217;s no such thing as &#8220;deterministic forecast&#8221;</title>
		<link>https://openforecast.org/2026/03/02/there-s-no-such-thing-as-deterministic-forecast/</link>
					<comments>https://openforecast.org/2026/03/02/there-s-no-such-thing-as-deterministic-forecast/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 02 Mar 2026 22:45:31 +0000</pubDate>
				<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=4081</guid>

					<description><![CDATA[<p>Sometimes I see people referring to a &#8220;deterministic&#8221; forecast, and I have some personal issues with this. Because if you apply a model to data then there is nothing deterministic about your forecasts! In many contexts, &#8220;deterministic&#8221; has a precise meaning: no randomness, no uncertainty. A deterministic solution to an optimisation problem (e.g. linear programming) [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/03/02/there-s-no-such-thing-as-deterministic-forecast/">There&#8217;s no such thing as &#8220;deterministic forecast&#8221;</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Sometimes I see people referring to a &#8220;deterministic&#8221; forecast, and I have some personal issues with this. Because if you apply a model to data then there is nothing deterministic about your forecasts!</p>
<p>In many contexts, &#8220;deterministic&#8221; has a precise meaning: no randomness, no uncertainty. A deterministic solution to an optimisation problem (e.g. linear programming) implies that there are no random inputs or outputs once the model and its parameters are fixed. Forecasting is different. As <a href="https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1099-131X(199612)15:7%3C495::AID-FOR640%3E3.0.CO;2-O">Chatfield</a> and many others have pointed out, forecasting has multiple sources of uncertainty, and there is essentially zero chance that the future will unfold exactly as any single number suggests.</p>
<p>Yes, some people use &#8220;deterministic&#8221; as a synonym for &#8220;point forecast&#8221;. But that label is still misleading, because a point forecast is not uncertainty-free &#8211; it is just one summary of a predictive distribution (often the conditional mean, sometimes the median or another functional).</p>
<p>Here’s a quick reality check you can do yourself. Take a dataset, apply your model, and write down the point forecast for the next few observations. Now add one new observation, re-estimate, and forecast again (the image in this post depicts exactly that, but with 50 forecasts produced on different subsamples of data). The point forecast will change unless you are dealing with an exotic situation with non-random data (e.g. every day, you sell exactly 100 units). So, which of the two was the &#8220;deterministic&#8221; forecast? If forecasts were truly deterministic in the strict sense, you would not get multiple plausible values from small, reasonable changes in the sample.</p>
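<p>If you want to try this yourself, here is a minimal sketch in Python (using statsmodels; the simulated series and the choice of simple exponential smoothing are just placeholders for your own data and model):</p>
<pre><code># A minimal sketch of the "reality check": re-estimate a model after
# adding one observation and watch the point forecast change.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(41)
y = 100 + np.cumsum(rng.normal(0, 5, 120))  # replace with your own data

# Fit on the first 119 observations and forecast one step ahead
fit1 = ExponentialSmoothing(y[:119]).fit()
# Add one more observation, re-estimate, and forecast again
fit2 = ExponentialSmoothing(y[:120]).fit()

print(fit1.forecast(1), fit2.forecast(1))
# The two "deterministic" forecasts will, in general, differ:
# the estimated parameters depend on the sample.
</code></pre>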
<p>This happens because any forecasting method (statistical or ML) depends on data and on modelling choices: parameter estimation, feature selection, splitting rules, tuning, even decisions like &#8220;use α=0.1&#8221;. Those choices can be fixed across samples of data, but fixing them does not remove uncertainty &#8211; it only hides it. The randomness is still there in the data and in the fact that we only observe a sample of it.</p>
<p>So when you see someone mentioning &#8220;deterministic forecast&#8221;, it&#8217;s worth translating it mentally to: &#8220;a point forecast, probably a conditional mean&#8221;. If you care about decisions and risk, you should know that there is uncertainty associated with this so-called &#8220;deterministic forecast&#8221;, and that it should not be ignored. But this is a topic for another discussion in another post.</p>
<p>Message <a href="https://openforecast.org/2026/03/02/there-s-no-such-thing-as-deterministic-forecast/">There&#8217;s no such thing as &#8220;deterministic forecast&#8221;</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/03/02/there-s-no-such-thing-as-deterministic-forecast/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Risky business: how to select your model based on risk preferences</title>
		<link>https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/</link>
					<comments>https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 19 Jan 2026 11:28:04 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[error measures]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[Information criteria]]></category>
		<category><![CDATA[model combination]]></category>
		<category><![CDATA[model selection]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3950</guid>

					<description><![CDATA[<p>What do you use for model selection? Do you select the best model based on its cross-validated performance, or do you use in-sample measures like AIC? If so, there is a way to improve your selection process further. JORS recently published a paper by Nikos Kourentzes and me, based on a simple but powerful idea: [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/">Risky business: how to select your model based on risk preferences</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>What do you use for model selection? Do you select the best model based on its cross-validated performance, or do you use in-sample measures like AIC? If so, there is a way to improve your selection process further.</p>
<p>JORS recently published a paper by Nikos Kourentzes and me, based on a simple but powerful idea: instead of using summary statistics (like the mean RMSE of cross-validated errors), you should consider the entire distribution and choose a specific quantile. This aligns with <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">my previous post on error measures</a>, but here is the core intuition:</p>
<p>The distribution of error measures is almost always asymmetric. If you only look at the average, you end up with a &#8220;mean temperature in the hospital&#8221; statistic, which doesn&#8217;t reflect how models actually behave. Some models perform great on most series but fail miserably on a few.</p>
<p>What can we do in this case? We can look at the quantiles of the distribution.</p>
<p>For example, if we use the 84th quantile, we compare models based on their &#8220;bad&#8221; performance: the situations where they fail and produce less accurate forecasts. If you choose the best-performing model there, you will end up with something that does not fail as badly. Your preferences in this situation are risk-averse.</p>
<p>If you focus on a lower quantile (e.g. the 16th), you are looking at how models do on the well-behaved series and ignoring how they do on the difficult ones. Your model selection preferences can then be described as risk-tolerant, because you accept that the best-performing model might fail on a difficult time series.</p>
<p>Furthermore, the median (the 50th quantile, the middle of the sample) corresponds to the risk-neutral situation, because it ignores the tails of the distribution.</p>
<p>What about the mean? This is a risk-agnostic strategy: it says nothing about the performance on the difficult or easy time series &#8211; it averages over everything and thus tells you nothing in particular, hiding the true risk profile.</p>
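<p>To make this concrete, here is a toy sketch in Python with made-up numbers: the two hypothetical models and their error distributions are pure assumptions, but they show how the mean, the median, and the tail quantiles can each point to a different model:</p>
<pre><code># Toy illustration of risk-based model selection: given cross-validated
# errors (series x models), compare models at the mean and at quantiles.
import numpy as np

rng = np.random.default_rng(42)
n_series = 200
# Hypothetical RMSEs: model A does better on most series but fails badly
# on some; model B is less accurate in the middle but more stable.
errors = np.column_stack([
    rng.lognormal(mean=0.0, sigma=0.8, size=n_series),  # model A
    rng.lognormal(mean=0.1, sigma=0.4, size=n_series),  # model B
])
models = ["A", "B"]

criteria = [
    ("mean (risk-agnostic)", errors.mean(axis=0)),
    ("median (risk-neutral)", np.quantile(errors, 0.50, axis=0)),
    ("16th quantile (risk-tolerant)", np.quantile(errors, 0.16, axis=0)),
    ("84th quantile (risk-averse)", np.quantile(errors, 0.84, axis=0)),
]
for label, stat in criteria:
    print(f"{label}: choose model {models[int(np.argmin(stat))]}")
</code></pre>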
<p>So what?</p>
<p>In the paper, we show that using a risk-averse strategy tends to improve overall forecasting accuracy in day-to-day situations. Conversely, a risk-tolerant strategy can be beneficial when disruptions are anticipated, as standard models are likely to fail anyway.</p>
<p>So, next time you select a model, think about the measure you are using. If it’s just the mean RMSE, keep in mind that you might be ignoring the inherent risks of that selection.</p>
<p>P.S. While the discussion above applies to the distribution of error measures, our paper specifically focused on point AIC (in-sample performance). But it is a distance measure as well, so the logic explained above holds.</p>
<p>P.P.S. Nikos wrote a <a href="https://www.linkedin.com/posts/nikos-kourentzes-3660515_forecasting-datascience-analytics-activity-7414687127269007360-pLAh">post about this paper here</a>.</p>
<p>P.P.P.S. And here is <a href="https://github.com/trnnick/working_papers/blob/fd1973624e97fc755a9c2401f05c78b056780e34/Kourentzes_2026_Incorporating%20risk%20preferences%20in%20forecast%20selectionk.pdf">the link to the paper</a>.</p>
<p>Message <a href="https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/">Risky business: how to select your model based on risk preferences</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Evolving seasonality</title>
		<link>https://openforecast.org/2025/09/22/evolving-seasonality/</link>
					<comments>https://openforecast.org/2025/09/22/evolving-seasonality/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 22 Sep 2025 19:21:09 +0000</pubDate>
				<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[Seasonality]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3928</guid>

					<description><![CDATA[<p>Here is another fascinating aspect of the seasonal profile in your data: it can evolve over time due to changing consumer preferences. How so? Let me explain. I&#8217;ve worked with a couple of companies where there were some examples of data with drastically changing seasonal patterns over just a few years. For example, Before Covid [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/09/22/evolving-seasonality/">Evolving seasonality</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Here is another fascinating aspect of the seasonal profile in your data: it can evolve over time due to changing consumer preferences. How so? Let me explain.</p>
<p>I&#8217;ve worked with a couple of companies where there were examples of data with drastically changing seasonal patterns over just a few years. For example, Before Covid (BC), the consumer behaviour could be different from After the Disruption (AD): people started ordering online some products that they used to buy in shops. Some practitioners told me that the seasonal patterns in their data had changed so dramatically that the historical data BC had become practically useless.</p>
<p>Why is this important?</p>
<p>If you don&#8217;t recognise that the seasonal patterns can change and simply use seasonal dummy variables (for example, in regression or decision trees), you&#8217;ll run into problems, as these approaches won&#8217;t capture the evolving profile. The same applies to classical decomposition (see <a href="/adam/ClassicalDecomposition.html">Section 3.2 of ADAM</a>), since it assumes a fixed seasonal structure. In fact, any model that assumes fixed seasonality would fail in this situation, and you may not even notice (see image in the post, where the model fails to capture the profile correctly because it assumes that it is fixed).</p>
<p>What can we do with that?</p>
<p>The solution is to use approaches that allow seasonality to evolve over time. ARIMA and ETS handle this via their parameters, while STL decomposition produces a dynamic seasonal profile. In regression or decision trees, you could incorporate lagged sales to partially account for this, or bring in the seasonal component from STL/ETS as an additional feature.</p>
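<p>To illustrate the last idea, here is a rough sketch in Python (statsmodels; the simulated series with growing seasonal amplitude and the exact feature set are just assumptions for the demonstration):</p>
<pre><code># A sketch of using an STL seasonal component as a regression feature,
# so that an (estimated) evolving profile enters the model instead of
# fixed seasonal dummies.
import numpy as np
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(7)
n = 120  # ten years of monthly data
t = np.arange(n)
# Simulated series whose seasonal amplitude grows over time
season = (1 + t / n) * np.sin(2 * np.pi * t / 12)
y = 10 + 0.05 * t + season + rng.normal(0, 0.3, n)

# STL allows the extracted seasonal component to evolve over time
stl = STL(y, period=12).fit()

# Use the extracted component as an extra feature for a regression
# or a decision tree, alongside whatever else you have
X = np.column_stack([t, stl.seasonal])
print(X[-3:])
</code></pre>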
<p>All good? Not quite, because there is a nasty small grey elephant hidden in this room.</p>
<p>Even if your chosen method allows seasonality to evolve, you must ensure that forecasting uncertainty reflects this properly. If the seasonal pattern in your training set changed drastically, what prevents it from shifting again in the test set? This is particularly critical if you need predictive distributions, such as for setting safety stock levels or generating prediction intervals. Ignoring the fact that the seasonality might change further could make your predictive distribution narrower than expected, leading to potential lost sales.</p>
<p>The good news: ARIMA and ETS handle this naturally, as the components&#8217; uncertainty translates directly to the holdout variance. In ML, it is more complicated, because you would need to invest time in proper feature engineering to explicitly capture the potential seasonality changes. Unfortunately, I haven&#8217;t done much in the latter direction, so I cannot give you a good recipe. Any thoughts on what to do here?</p>
<p>And what do you do in a situation like that?</p>
<p>Message <a href="https://openforecast.org/2025/09/22/evolving-seasonality/">Evolving seasonality</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/09/22/evolving-seasonality/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Review of a paper on comparison of modern machine learning techniques in retail</title>
		<link>https://openforecast.org/2025/06/22/review-of-a-paper-comparative-analysis-of-modern-machine-learning-models-for-retail-sales-forecasting/</link>
					<comments>https://openforecast.org/2025/06/22/review-of-a-paper-comparative-analysis-of-modern-machine-learning-models-for-retail-sales-forecasting/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Sun, 22 Jun 2025 21:59:19 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[AI and ML]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[papers]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3874</guid>

					<description><![CDATA[<p>A couple of days ago, I noticed a link to the following paper in a post by Jack Rodenberg: https://arxiv.org/abs/2506.05941v1. The topic seemed interesting and relevant to my work, so I read it, only to find that the paper contains several serious flaws that compromise its findings. Let me explain. Introduction But first, why am [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/06/22/review-of-a-paper-comparative-analysis-of-modern-machine-learning-models-for-retail-sales-forecasting/">Review of a paper on comparison of modern machine learning techniques in retail</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>A couple of days ago, I noticed a link to the following paper in a post by Jack Rodenberg: <a href="https://arxiv.org/abs/2506.05941v1" target="_blank">https://arxiv.org/abs/2506.05941v1</a>. The topic seemed interesting and relevant to my work, so I read it, only to find that the paper contains several serious flaws that compromise its findings. Let me explain.</p>
<h2>Introduction</h2>
<p>But first, why am I writing this post?</p>
<p>There’s growing interest in forecasting among data scientists, data engineers, ML experts etc. Many of them assume that they can apply their existing knowledge directly to this new area without reading the domain-specific literature. As a result, we get a lot of &#8220;hit-or-miss&#8221; work: sometimes with promising ideas, but executed in ways that violate basic forecasting principles. The main problem is that if your experiment is not done correctly, your results are compromised, i.e. your claims might simply be wrong.</p>
<p>If you&#8217;re a researcher writing forecasting-related papers, then hopefully reading this post (and the posts and papers I refer to) will help you improve your papers. This might lead to a smoother peer-review process. Also, while I can’t speak for other reviewers, if I come across a paper with similar issues, I typically give it a hard time.</p>
<p>I should also say that I am not a reviewer of this paper (if I were, I would not publish my review); I merely decided to demonstrate what issues I can see when I read papers like this. The authors are just unlucky that I picked their paper&#8230;</p>
<p>Let&#8217;s start.</p>
<p>The authors apply several ML methods to retail data, compare their forecasting accuracy, and conclude that XGBoost and LightGBM outperform N-BEATS, NHITS, and Temporal Fusion Transformer. While the finding isn’t groundbreaking, additional evidence on a new dataset is always welcome.</p>
<h2>Major issues</h2>
<p>So, what&#8217;s wrong? Here is a list of the major comments:</p>
<ol>
<li><strong>Forecast horizon vs. data frequency</strong>:</li>
<p>Daily data with a 365-day forecast horizon makes no practical sense (page 2, paragraph 3). I haven&#8217;t seen any company making daily-level decisions a year in advance. Stock decisions are typically made on much shorter horizons, and if you need a year-ahead forecast, you definitely do not need it at the daily level. After all, there is no point in knowing that on 22nd December 2025 the expected demand will be 35.457 units &#8211; it is too far into the future to make any difference. Some references:</p>
<ul>
<li><a href="https://doi.org/10.1016/j.ijforecast.2022.08.003">Athanasopoulos and Kourentzes (2023)</a> paper discusses data frequency and some decisions related to them;</li>
<li>and there is <a href="/2024/09/24/how-to-choose-forecast-horizon/">a post on my website</a> on a related topic</li>
</ul>
<li><strong>Misuse of SBC classification</strong>:</li>
<p>Claiming that 70% of products are &#8220;intermittent&#8221; (page 2, last paragraph) based on SBC is incorrect. Furthermore, SBC classification does not make sense in this setting, and is not used in the paper anyway, so the authors should just drop it.</p>
<ul>
<li>Read more about it <a href="/2025/06/04/sbc-is-not-for-you/">here</a>.</li>
<li>And there is <a href="https://www.linkedin.com/posts/stephankolassa_on-the-categorization-of-demand-patterns-activity-7340669762894462978-Wjrb">a post of Stephan Kolassa</a> on exactly this point</li>
</ul>
<li><strong>Product elimination and introduction is unclear (page 3):</strong></li>
<p>The authors say &#8220;Around 30% of products were eliminated during training and 10% are newly introduced in validation&#8221;. It&#8217;s not clear why this was done and how specifically. This needs to be explained in more detail.</p>
<li><strong>&#8220;Missing values&#8221; undefined</strong>:</li>
<p>It is not clear what the authors mean by &#8220;missing values&#8221; (page 3, &#8220;Handling Missing Values&#8221;). How do they appear and why? Are they the same as stockouts, or were there some other issues in the data? This needs to be explained in more detail.</p>
<li><strong>Figure 1 is vague</strong>:</li>
<p>Figure 1 is supposed to explain how the missing values were treated. But the whole imputation process is questionable, because it is not clear how well it worked in comparison with alternatives, and how reasonable it is to have an imputed series that looks more erratic than the original one. The discussion of this needs to be expanded with some insights from the business problem.</p>
<li><strong>No stockout handling discussion</strong>:</li>
<p>The authors do not discuss whether the data has stockouts or not. This becomes especially important in retail, because if stockouts are not treated correctly, you end up forecasting sales instead of demand.</p>
<ul>
<li>For example, <a href="/2025/04/11/svetunkov-sroginis-2025-model-based-demand-classification/">see this post</a>.</li>
</ul>
<li><strong>Feature engineering is opaque</strong>:</li>
<p>&#8220;Lag and rolling-window statistics for sales and promotional indicators were created&#8221; (page 3, &#8220;Feature Engineering&#8221;) &#8211; it is not clear what specific lags, what lengths of rolling windows, and what statistics (anything besides the mean?) were created. These need to be explained for transparency, so that a reader can better understand what specifically was done. Without this explanation, it is not clear whether the features are sensible at all.</p>
<li><strong>Training/validation setup not explained</strong>:</li>
<p>It is not clear how specifically the split into training and validation sets was done (page 3, last paragraph), and whether the authors used a rolling origin (aka time series cross-validation; a sketch of such a loop is given after this list). If they did random splits, that could cause some issues, because the first law of time series is not to break its structure!</p>
<li><strong>Variables transformation is unclear</strong>:</li>
<p>It is not clear whether any transformations of the response variable were done. For example, if the data is not stationary, taking differences might be necessary to capture the trend and to do extrapolation correctly. Normalisation of variables is also important for neural networks, unless this is built-in in the functions the authors used. This is not discussed in the paper.</p>
<li><strong>Forecast strategy not explained</strong>:</li>
<p>It is not clear whether the direct or recursive strategy was used for forecasting. If lags were not used in the model, this would not matter, but they are, so it becomes a potential issue. Also, if the authors used the lag of the actual value at observation 235 steps ahead to produce a forecast for 236 steps ahead, then this is another fundamental issue, because it implies that the forecast horizon is just 1 step ahead, and not 365, as the authors claim. This needs to be explained in more detail.</p>
<ul>
<li>I&#8217;ve written <a href="/2024/05/25/recursive-vs-direct-forecasting-strategy/">a post</a> about the strategies.</li>
</ul>
<li><strong>No statistical benchmarks</strong>:</li>
<p>At the very least, the authors should use the simple moving average and probably exponential smoothing. Even if they do not perform well, this gives additional information about the performance of the other approaches. Without them, the claims about the good performance of the ML approaches used are not supported by evidence. The authors claim that they used the mean as a benchmark, but its performance is not discussed in the paper.</p>
<ul>
<li><a href="/2024/10/28/why-is-it-hard-to-beat-simple-moving-average/">A post on the Simple Moving Average</a></li>
<li><a href="/2024/01/10/why-you-should-care-about-exponential-smoothing/">Why you should care about the exponential smoothing</a></li>
</ul>
<li><strong>Issues with forecast evaluation</strong>:</li>
<p>The whole Table 3 with error measures is an example of what not to do. Here are some of major issues:</p>
<ol>
<li><a href="/2024/04/03/stop-reporting-several-error-measures-just-for-the-sake-of-them/">There is no point in reporting several error measures</a> &#8211; each one of them is minimised by their own statistics. The error measure should align with what approaches produce.</li>
<li>MSE, RMSE, MAE and ME should be dropped, because they are not scaled, so the authors are adding up error measures for bricks and nails. The result is meaningless.</li>
<li>MASE is not needed &#8211; it is minimised by the median, which could be a serious issue on intermittent demand (<a href="/2025/01/21/don-t-use-mae-based-error-measures-for-intermittent-demand/">see this post</a>). wMAPE has similar issues because it is also based on MAE.</li>
<li>If the point forecasts are produced in terms of medians (like in case of NBEATS), then RMSSE should be dropped, and MASE should be used instead.</li>
<li>But also, comparing means with medians is not a good idea. If you assume a symmetric distribution, the two should coincide, but in general this might not hold.</li>
<li>R2 is not a good measure of forecast accuracy. It makes some sense in regression context for linear models, but in this one, it is pointless, and only shows that the authors don&#8217;t fully understand what they are doing. Plus, it&#8217;s not clear how specifically it was calculated.</li>
<li>I don&#8217;t fully understand &#8220;demand error&#8221;, &#8220;demand bias&#8221; and other measures, and the authors do not explain them in necessary detail. This needs to be added to the paper.</li>
<li>The split into &#8220;Individual Groups&#8221; and &#8220;Whole Category&#8221; is not well explained either: it is not clear what this means, why, and how this was done.</li>
<li>And in general, I don&#8217;t understand what the authors want to do with Cases A &#8211; D in Table 3. It is not clear why they are needed, and what they want to show with them. This is not explained in the paper.</li>
</ol>
<ul>
<li>I have a series of posts on forecast evaluation <a href="/category/forecasting-theory/forecast-evaluation/">here</a>.</li>
</ul>
<li><strong>Invalid analysis of bias measures</strong>:</li>
<p>Analysis of bias measures is meaningless because they were not scaled.</p>
<li><strong>Disturbing bias of NBEATS in Figure 2</strong>:</li>
<p>The bias shown in Figure 2 is disturbing and should be dealt with prior to evaluation. It could have appeared due to the loss function used for training or because the data was not pre-processed correctly. Leaving it as is and blaming NBEATS for this does not sound reasonable to me.</p>
<li><strong>No inventory implications</strong>:</li>
<p>The authors mention inventory management, but stop on forecasting, not showing how the specific forecasts translate to inventory decisions. If this paper was to be submitted to any operations-related journal, the inventory implications would need to be added in the discussion.</p>
<li><strong>Underexplained performance gaps</strong>:</li>
<p>The paper also does not explain well why neural networks performed worse than gradient boosting methods. The authors mention that this could be due to the effect of missing values, but this is a speculation rather than an explanation, and I personally do not believe it (I might be wrong). While the overall results make sense to me, if you want to publish a good paper, you need to provide a more detailed answer to the question &#8220;why?&#8221;.</p>
</ol>
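<p>Since the rolling origin comes up in the comment on the training/validation setup above, here is a minimal sketch of such an evaluation loop in Python (the &#8220;model&#8221; and the horizon here are placeholders; the point is that the origin moves forward and the temporal structure is never broken):</p>
<pre><code># A minimal rolling-origin (time series cross-validation) skeleton:
# the origin moves forward, the model is re-fitted on each expanding
# training set, and errors are collected per horizon.
import numpy as np

def rolling_origin(y, fit_forecast, h=14, n_origins=10):
    """fit_forecast(train, h) -> array of h point forecasts."""
    errors = np.full((n_origins, h), np.nan)
    for i in range(n_origins):
        split = len(y) - h - (n_origins - 1) + i
        train, test = y[:split], y[split:split + h]
        errors[i, :] = test - fit_forecast(train, h)
    return errors

# Example with a deliberately naive "model": the in-sample mean
rng = np.random.default_rng(0)
y = rng.poisson(5, size=365).astype(float)
e = rolling_origin(y, lambda train, h: np.full(h, train.mean()))
print(np.sqrt(np.nanmean(e**2, axis=0)))  # RMSE per horizon
</code></pre>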
<h2>Minor issues</h2>
<p>I also have three minor comments:</p>
<ol>
<li>&#8220;many product series are censored&#8221; (page 2, last paragraph) is not what it sounds like. The authors imply that the histories are short, while the usual interpretation is that the sales are lower than the demand, so the values are censored. I would rewrite this.</li>
<li>Figure 2 has the legend saying &#8220;Poisson&#8221; three times, not providing any useful information. This is probably just a mistake, which can easily be fixed.</li>
<li>There are no references to Table 2 and Figure 3 in the paper. It is not clear why they are needed. Every table and figure should be referred to and explained.</li>
</ol>
<h2>Conclusions</h2>
<p>Overall, the paper has a sensible idea, but I feel that the authors need to learn more about forecasting principles and have not read the forecasting literature carefully enough to understand how specifically the experiments should be designed &#8211; what to do and what not to do (stop using SBC!). Because they made several serious mistakes, I feel that the results of the paper are compromised and might not be correct.</p>
<p>P.S. If I were a reviewer of this paper, I would recommend either &#8220;reject and resubmit&#8221; or a &#8220;major revision&#8221; (if the former option was not available).</p>
<p>P.P.S. If the authors of the paper are reading this, I hope you find these comments useful. If you have not submitted the paper yet, I&#8217;d suggest taking some of them (if not all) into account. Hopefully, this will smooth the submission process for you.</p>
<p>Message <a href="https://openforecast.org/2025/06/22/review-of-a-paper-comparative-analysis-of-modern-machine-learning-models-for-retail-sales-forecasting/">Review of a paper on comparison of modern machine learning techniques in retail</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/06/22/review-of-a-paper-comparative-analysis-of-modern-machine-learning-models-for-retail-sales-forecasting/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>NATCOR course on Forecasting and Predictive Analytics, September 2025</title>
		<link>https://openforecast.org/2025/06/22/natcor-course-on-forecasting-and-predictive-analytics-september-2025/</link>
					<comments>https://openforecast.org/2025/06/22/natcor-course-on-forecasting-and-predictive-analytics-september-2025/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Sun, 22 Jun 2025 13:24:00 +0000</pubDate>
				<category><![CDATA[Social media]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3868</guid>

					<description><![CDATA[<p>Are there any PhD students in the crowd who want to learn more about forecasting? What about academic supervisors who have such students? Show me your hands! This post is for you! This September, we (Lancaster Centre for Marketing Analytics and Forecasting members) will deliver the Natcor course on Forecasting and Predictive Analytics at Lancaster [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/06/22/natcor-course-on-forecasting-and-predictive-analytics-september-2025/">NATCOR course on Forecasting and Predictive Analytics, September 2025</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Are there any PhD students in the crowd who want to learn more about forecasting? What about academic supervisors who have such students? Show me your hands! This post is for you!</p>
<p>This September, we (Lancaster Centre for Marketing Analytics and Forecasting members) will deliver the <a href="https://www.natcor.ac.uk/">Natcor course</a> on Forecasting and Predictive Analytics at Lancaster University, UK. I&#8217;ll be acting as the course leader and I&#8217;m currently working on the schedule, aiming to make it both engaging and useful for participants.</p>
<p>The current plan is for the course to be delivered by our team members: Dr. Sven F. Crone, Anna Sroginis, Kandrika Pritularga, Nicos Pavlidis, Anna-Lena Sachs, and myself. The course will run over 5 days.</p>
<p>As for the content, we&#8217;ll start by covering the fundamentals of forecasting and gradually move on to more advanced statistical and machine learning approaches for forecasting and predictive analytics. As this course is designed specifically for PhD students, we&#8217;ll dive deeper into the material than we typically do in our Master&#8217;s-level and executive training courses. The course will also be supported by real-life cases, of which the centre has accumulated many over the years, and we will discuss how forecasts translate into specific decisions and connect to other OR-related topics.</p>
<p>You can find more details about <a href="https://www.natcor.ac.uk/courses/">the course here</a>.</p>
<p>And <a href="https://www.natcor.ac.uk/register/">register here</a> (Note that NATCOR offers other excellent courses as well!).</p>
<p>Please share this post with your network if you know any PhD students or supervisors who might be interested.</p>
<p>P.S. Image in the post is partially related to what we will discuss, but is mainly here just to attract your attention 😝</p>
<p>Message <a href="https://openforecast.org/2025/06/22/natcor-course-on-forecasting-and-predictive-analytics-september-2025/">NATCOR course on Forecasting and Predictive Analytics, September 2025</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/06/22/natcor-course-on-forecasting-and-predictive-analytics-september-2025/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Fundamental Flaw of the Box-Jenkins Methodology</title>
		<link>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/</link>
					<comments>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 13 May 2025 11:57:07 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[Information criteria]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3838</guid>

					<description><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, this methodology has a fundamental flaw and should not be used in practice. Here&#8217;s why. When Box and Jenkins wrote their book back in the 1960s, it was a very different [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, <strong>this methodology has a fundamental flaw</strong> and should not be used in practice. Here&#8217;s why.</p>
<p>When Box and Jenkins wrote their book back in the 1960s, it was a very different era: computers were massive, and people worked with punch cards. To make their approach viable, Box and Jenkins developed a methodology for selecting the appropriate orders of AR and MA based on the values of the autocorrelation and partial autocorrelation functions (ACF and PACF, respectively). Their idea was that if an ARMA process generates a specific ACF/PACF pattern, then it could be identified by analysing those functions in the data. At the time, it wasn’t feasible to do cross-validation or rolling origin evaluation, and even using information criteria for model selection was a challenge. So, the Box–Jenkins approach was a sensible option, producing adequate results with limited computational resources, and was considered state of the art.</p>
<p>Unfortunately, as the M1 competition later showed (see my <a href="/2024/03/14/the-role-of-m-competitions-in-forecasting/">earlier post</a>), the methodology didn’t work well in practice. Simpler methods that didn’t rely on rigorous model selection actually performed better. But in fact, the winning model in the competition was <a href="https://doi.org/10.1002/for.3980010108">ARARMA by Emanuel Parzen</a>. His idea was to make the series stationary by applying a low-order, non-stationary AR to the data, then extract residuals and select appropriate ARMA orders using AIC. Parzen ignored the Box–Jenkins methodology entirely &#8211; he didn’t analyse ACF or PACF and instead relied fully on automated selection. And it worked!</p>
<p>So why didn’t the Box–Jenkins methodology perform as expected? In my monograph <a href="/adam">Forecasting and Analytics with ADAM</a>, I use the following example to explain the main issue: “All birds have wings. Sarah has wings. Thus, Sarah is a bird.” But Sarah, as shown in the image attached to this post, is a butterfly.</p>
<p>The fundamental issue with the Box–Jenkins methodology lies in its logic: if a process generates a specific ACF/PACF, that doesn’t mean that an observed ACF/PACF must come from that process. Many ARMA and even non-ARMA processes can generate exactly the same autocorrelation structure.</p>
<p>Further developments in ARIMA modelling have shown that ACF and PACF can only be used as general guidelines for order selection. To assess model performance properly, we need other tools. All modern approaches rely on information criteria for ARIMA order selection, and they consistently perform well in forecasting competitions. For example, <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> use AIC for ARMA order selection, while <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan (2020)</a> apply AIC after reformulating ARIMA in a state space form. The former is implemented in the forecast package in R and the StatsForecast library in Python (thanks to Nixtla and Azul Garza); the latter is available in the smooth package in R. I also discuss another ARIMA order selection approach in <a href="/adam/ARIMASelection.html">Section 15.2 of my book</a>.</p>
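<p>To see what the automated selection looks like without any ACF/PACF inspection, here is a deliberately stripped-down sketch in Python (statsmodels; a brute-force grid instead of the clever stepwise search that auto.arima and StatsForecast actually implement):</p>
<pre><code># IC-based ARIMA order selection in its crudest form: fit a grid of
# ARIMA(p, d, q) models and keep the one with the lowest AIC.
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
# Simulated ARMA(1, 1) series; replace with your own data
y, e = np.zeros(200), rng.normal(0, 1, 200)
for t in range(1, 200):
    y[t] = 0.6 * y[t - 1] + e[t] + 0.3 * e[t - 1]

best_aic, best_order = np.inf, None
for p, d, q in itertools.product(range(4), range(2), range(4)):
    try:
        aic = ARIMA(y, order=(p, d, q)).fit().aic
    except Exception:
        continue  # some orders may fail to estimate
    if aic < best_aic:
        best_aic, best_order = aic, (p, d, q)

print(best_order, round(best_aic, 2))
</code></pre>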
<p>Long story short: don’t use the Box–Jenkins methodology for order selection. Use more modern tools, such as information criteria.</p>
<p>P.S. <a href="/2024/03/21/what-s-wrong-with-arima/">See also my early post on ARIMA</a>, discussing what is wrong with it.</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Svetunkov &#038; Sroginis (2025) &#8211; Model Based Demand Classification</title>
		<link>https://openforecast.org/2025/04/11/svetunkov-sroginis-2025-model-based-demand-classification/</link>
					<comments>https://openforecast.org/2025/04/11/svetunkov-sroginis-2025-model-based-demand-classification/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Fri, 11 Apr 2025 10:39:30 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[intermittent demand]]></category>
		<category><![CDATA[papers]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3821</guid>

					<description><![CDATA[<p>For the last year, Anna Sroginis and I have been working on a paper, trying to modernise demand classification schemes and make them useful in the brave new era of machine learning. We have finally wrapped it up and submitted it to a peer-reviewed journal. But the temptation to share was too strong, so we [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/04/11/svetunkov-sroginis-2025-model-based-demand-classification/">Svetunkov &#038; Sroginis (2025) &#8211; Model Based Demand Classification</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>For the last year, Anna Sroginis and I have been working on a paper, trying to modernise demand classification schemes and make them useful in the brave new era of machine learning. We have finally wrapped it up and submitted it to a peer-reviewed journal. But the temptation to share was too strong, so we have also uploaded it to arXiv, and it is <a href="https://doi.org/10.48550/arXiv.2504.05894">now available here</a>.</p>
<p>What is this paper about?</p>
<p>Intermittent demand is a common challenge in sectors like supply chain and retail. But the key issue is that zeroes in sales can happen for two fundamentally different reasons (<a href="/2024/11/18/why-zeroes-happen/">see one of my previous posts</a>):</p>
<ul>
<li>Nobody wanted to buy the product (naturally occurring zeroes),</li>
<li>Nobody could buy the product (artificially occurring due to stockouts, etc).</li>
</ul>
<p>However, forecasting methods are typically unaware of this distinction and treat both types equally. This can lead to inaccurate forecasts and poor decisions. On top of that, existing classification schemes for intermittent demand (<a href="/2024/07/16/intermittent-demand-classifications-is-that-what-you-need/">such as SBC</a>) use arbitrary thresholds and rely on choosing between forecasting methods like Croston and SBA. There’s a clear need for smarter, more flexible tools that can distinguish between types of demand and make classifications practical.</p>
<p>In this paper, we introduce a two-stage, model-based framework called &#8220;Automatic Identification of Demand&#8221; (AID), designed to bring more clarity and accuracy to demand classification. The first stage uses a data-driven approach to detect artificially occurring zeroes. Once those are accounted for, the second stage classifies the demand into one of six categories based on key characteristics: whether the demand is regular or intermittent, whether it consists of count or fractional values, and whether intermittent demand is smooth or lumpy in nature. AID detects stockouts by analysing demand intervals using the Geometric distribution, then flags the demand as one of those six types based on several simple statistical models.</p>
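<p>To give a flavour of the first stage, here is a rough sketch in Python of the interval-based intuition. I should stress that this is a simplified illustration, not the actual AID procedure from the paper: if demand occurs with probability p per period, a zero run of length k has probability (1 - p)^k, so extremely unlikely runs are candidates for artificial zeroes:</p>
<pre><code># A simplified illustration of interval-based stockout detection (NOT
# the exact AID procedure): under independent occurrence with
# probability p, a zero run of length k has probability (1 - p)^k,
# so very unlikely runs are flagged as potential artificial zeroes.
import numpy as np

def suspicious_zero_runs(y, alpha=0.01):
    p = np.mean(y > 0)               # estimated occurrence probability
    runs, run = [], 0
    for value in np.append(y, 1):    # sentinel closes a trailing run
        if value == 0:
            run += 1
        else:
            if run > 0:
                runs.append(run)
            run = 0
    return [(k, (1 - p) ** k) for k in runs if (1 - p) ** k < alpha]

rng = np.random.default_rng(3)
# Intermittent demand with an injected "stockout" period
y = rng.binomial(1, 0.4, 120) * rng.poisson(3, 120)
y[60:80] = 0
print(suspicious_zero_runs(y))       # flags the long artificial run
</code></pre>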
<p>We applied AID to a retailer dataset covering over 31,000 products with weekly sales across three stores. Based on that, we generated several features and tested multiple approaches (local level, pooled regression, and LightGBM) to see whether their accuracy improved. We found that:</p>
<ol>
<li>Correcting for stockouts significantly improved the accuracy of all approaches;</li>
<li>Using a mixture approach (separating demand into sizes and occurrences) yielded large gains in accuracy, regardless of the forecasting method used;</li>
<li>Further splitting the data by demand categories (e.g., regular vs. intermittent, smooth vs. lumpy) provided additional, though more modest, benefits.</li>
</ol>
<p>We argue that these three principles are universally valuable for forecasting, no matter what approach you use. If you face intermittent demand, at a minimum, consider detecting stockouts and then using the mixture approach.</p>
<p>Hope you find this paper useful. Let me know what you think in the comments.</p>
<p>Message <a href="https://openforecast.org/2025/04/11/svetunkov-sroginis-2025-model-based-demand-classification/">Svetunkov &#038; Sroginis (2025) &#8211; Model Based Demand Classification</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/04/11/svetunkov-sroginis-2025-model-based-demand-classification/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>A paper to read over the Xmas holiday: Wang et al. (2023) &#8211; Forecast combinations: An over 50-year review</title>
		<link>https://openforecast.org/2024/12/23/a-paper-to-read-over-the-xmas-holiday-2024/</link>
					<comments>https://openforecast.org/2024/12/23/a-paper-to-read-over-the-xmas-holiday-2024/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 23 Dec 2024 14:38:51 +0000</pubDate>
				<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[combinations]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3755</guid>

					<description><![CDATA[<p>Christmas and the New Year are upon us, and I wanted to publish a celebratory post before taking a break. Instead of writing something educational, I decided to simply recommend a paper for you to read over the holidays &#8211; something you might have overlooked in the past couple of years. Here it is or [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/12/23/a-paper-to-read-over-the-xmas-holiday-2024/">A paper to read over the Xmas holiday: Wang et al. (2023) &#8211; Forecast combinations: An over 50-year review</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Christmas and the New Year are upon us, and I wanted to publish a celebratory post before taking a break. Instead of writing something educational, I decided to simply recommend a paper for you to read over the holidays &#8211; something you might have overlooked in the past couple of years.</p>
<p><a href="https://doi.org/10.1016/j.ijforecast.2022.11.005">Here it is</a> or <a href="https://doi.org/10.48550/arXiv.2205.04216">here</a>.</p>
<p>This paper, written by Xiaoqian Wang, Rob Hyndman, Feng Li, and Yanfei Kang, is a 50-year literature review on the topic of forecast combinations. The authors conduct a thorough review of the literature in this area. They begin with combinations of point forecasts, covering different combination methods such as linear and non-linear approaches, learning-based combinations, pooling, and more. Then, they shift their focus to probabilistic forecast combinations, exploring what it means to combine quantiles and how to make them better calibrated. As expected, the paper ends with conclusions, but the authors go further, summarising some of the gaps in the literature &#8211; a helpful starting point for those interested in forecasting research.</p>
<p>I admit that this paper has nothing to do with Christmas, but I feel it’s a fitting way to say &#8220;Good bye&#8221; to the year 2024. While we’ve seen remarkable developments in machine learning over the past year, I feel that some people are starting to lose sight of basic forecasting principles. This paper discusses one of the important ones: combinations often produce more robust forecasts than individual models, explained here in great detail.</p>
<p>Merry Christmas, Happy New Year, and see you in 2025!</p>
<p>Message <a href="https://openforecast.org/2024/12/23/a-paper-to-read-over-the-xmas-holiday-2024/">A paper to read over the Xmas holiday: Wang et al. (2023) &#8211; Forecast combinations: An over 50-year review</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/12/23/a-paper-to-read-over-the-xmas-holiday-2024/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Intermittent demand: don&#8217;t try to predict WHEN it will happen</title>
		<link>https://openforecast.org/2024/12/11/intermittent-demand-don-t-try-to-predict-when-it-will-happen/</link>
					<comments>https://openforecast.org/2024/12/11/intermittent-demand-don-t-try-to-predict-when-it-will-happen/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 11 Dec 2024 10:50:26 +0000</pubDate>
				<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[intermittent demand]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3748</guid>

					<description><![CDATA[<p>I&#8217;ve seen ML experts apply the principles of classification to intermittent demand forecasting several times. For example, they try to predict WHEN the demand will happen. This is not a very sensible thing to do. The featured image in this post shows two forecasting approaches: one that tries to predict when demand happens (the yellow line), and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/12/11/intermittent-demand-don-t-try-to-predict-when-it-will-happen/">Intermittent demand: don&#8217;t try to predict WHEN it will happen</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I&#8217;ve seen several times ML experts applying principles of classification for intermittent demand forecasting. For example, they try predicting, WHEN the demand will happen. This is not a very sensible thing to do.</p>
<p>The featured image in this post shows two forecasting approaches: one that tries to predict when demand happens (the yellow line), and another that tries to capture the structure of the demand and extrapolate it (the blue line). The green line shows the values in the holdout, and the RMSE indicates the error of the two approaches. As you can see, the straight line does better in this example. Let&#8217;s discuss why.</p>
<p>Just a reminder: intermittent demand is demand that happens at irregular intervals. By definition, we cannot know when a person will come to our store and buy the product. We operate with probabilities in this case, and can sometimes say that the probability of purchase goes up or down due to some factors (seasonality, holidays, promotions etc). When a spherical ML expert in a vacuum hears about probability, the first thing that pops into their mind is the &#8220;decision boundary&#8221; for a classification task. Why not set some threshold and say that if the probability is higher than that, the product will be bought, and otherwise it won&#8217;t?</p>
<p>Well, while this works in classification, it typically doesn&#8217;t make sense in demand forecasting.</p>
<p>First, there&#8217;s not much structure to capture in intermittent demand besides the basic level, external factors such as promotions and calendar effects, and the occasional trend. Yes, some of them might change the probability of occurrence and, for example, show that a product will be bought on Monday with 90% probability. This still does not mean that the product will indeed be bought. Saying that it will is just informed guessing, not forecasting.</p>
<p>Second, a point forecast is supposed to capture the structure and filter out the noise (see <a href="/2024/08/13/structure-vs-noise-a-fundamental-concept-in-forecasting/">this post</a>). In the case of intermittent demand, the structure consists of two parts: the expected occurrence (probability) and the demand sizes. If we substitute the probability with zeroes and ones based on some threshold, we&#8217;ll end up overfitting the noise, but on a different level than usual: the future is uncertain and we can never say for sure what will happen and when, yet we would be playing a guessing game, hoping to be correct. It is like tossing a coin and trying to guess how it will land next time. If you want an expectation in that experiment, you should use the probability, not a sequence of zeroes and ones.</p>
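<p>The coin analogy is easy to check numerically. Here is a toy example in Python (pure simulation, nothing to do with any specific dataset), comparing the probability forecast with a thresholded 0/1 &#8220;prediction&#8221;:</p>
<pre><code># Coin-toss illustration: for Bernoulli demand occurrence, forecasting
# the probability beats forecasting thresholded zeroes and ones,
# because the expectation minimises the mean squared error.
import numpy as np

rng = np.random.default_rng(5)
p = 0.7                              # true probability of occurrence
demand = rng.binomial(1, p, 10_000)  # what actually happens

forecast_probability = np.full(demand.shape, p)
forecast_threshold = np.full(demand.shape, 1.0)  # "p > 0.5, so it WILL happen"

print("MSE of probability forecast:", np.mean((demand - forecast_probability) ** 2))
print("MSE of the 0/1 guess:       ", np.mean((demand - forecast_threshold) ** 2))
# ~0.21 vs ~0.30: guessing WHEN demand happens overfits the noise.
</code></pre>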
<p>Third and most important, working with intermittent demand, we typically want to solve a specific problem. The classical example is inventory management, in which case we don&#8217;t care whether customers will come and buy our product on Monday, instead of Tuesday. We care about having enough product on shelves to satisfy customers throughout a period of time, while our product is being delivered (lead time). So, the goal in this case is to identify the appropriate safety stock level based on the current stock and thus get an estimate of the demand over lead time, not to predict when people come and how much they will buy. Focusing on the point forecast in this setting is a futile task.</p>
<p>So, when working with intermittent demand, don&#8217;t waste your time on trying to forecast when the demand will happen. Focus instead on getting the structure correctly and then understanding what is needed by decision makers and how it will be used.</p>
<p>Message <a href="https://openforecast.org/2024/12/11/intermittent-demand-don-t-try-to-predict-when-it-will-happen/">Intermittent demand: don&#8217;t try to predict WHEN it will happen</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/12/11/intermittent-demand-don-t-try-to-predict-when-it-will-happen/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Why Naive is not a good benchmark for intermittent demand</title>
		<link>https://openforecast.org/2024/12/02/why-naive-is-not-a-good-benchmark-for-intermittent-demand/</link>
					<comments>https://openforecast.org/2024/12/02/why-naive-is-not-a-good-benchmark-for-intermittent-demand/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 02 Dec 2024 14:05:53 +0000</pubDate>
				<category><![CDATA[Forecast evaluation]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[intermittent demand]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3739</guid>

					<description><![CDATA[<p>While Naive is considered a standard benchmark in forecasting, there is a case where it might not be a good one: intermittent demand. And here is why I think so. Naive is a forecasting method that uses the last available observation as a forecast for the next ones. It does not have any parameters to [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/12/02/why-naive-is-not-a-good-benchmark-for-intermittent-demand/">Why Naive is not a good benchmark for intermittent demand</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>While Naive is considered a standard benchmark in forecasting, there is a case where it might not be a good one: intermittent demand. And here is why I think so.</p>
<p>Naive is a forecasting method that uses the last available observation as the forecast for the next ones. It does not have any parameters to estimate, it does not require training, and it can be applied to a sample of any size (even a single observation). When you deal with regular demand, it makes perfect sense to use Naive as a benchmark, because it costs nothing in terms of computational time, you get a forecast of demand, and if you cannot beat it, you should rethink your forecasting process.</p>
<p>However, in the case of intermittent demand, the demand does not happen on every observation. As a result, when Naive copies the last available value, it reproduces either a proper non-zero demand or the absence of demand. The latter implies that nobody bought our product today, and nobody will in the next week or over whatever forecast horizon we use. In the following image, Naive will be the most accurate forecasting method, because the final observation in the training set was zero, and in the test set we did not have any sales:</p>
<div id="attachment_3741" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-01.png&amp;nocache=1"><img fetchpriority="high" decoding="async" aria-describedby="caption-attachment-3741" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-01-300x180.png&amp;nocache=1" alt="Naive forecast on intermittent demand" width="300" height="180" class="size-medium wp-image-3741" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-01-300x180.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-01-768x461.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-01.png&amp;nocache=1 1000w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3741" class="wp-caption-text">Naive forecast on intermittent demand</p></div>
<p>But is this useful? To answer this question, we need to understand what specifically we are forecasting when we deal with demand with zeroes.</p>
<p>As discussed in a <a href="/2024/11/18/why-zeroes-happen/">previous post</a>, zeroes can occur for different reasons: some of them happen because nobody came to buy the product (naturally occurring zeroes), while others appear because there was some sort of disruption (e.g. a stockout) or the product was discontinued (artificially occurring zeroes). The two situations are fundamentally different, but if we work with the sales data exclusively (no stock information), it can be hard to tell them apart. Naive might work perfectly in both cases, forecasting no sales for the next few observations, and it can be 100% right. But the problem is that this is not useful. If we indeed cannot beat Naive on data with zeroes, it does not mean that we should use it, because there is a chance that we have stockouts in the holdout period. If that&#8217;s the case, we might be doing something fundamentally wrong. After all, &#8220;we will not sell anything&#8221; is an easy statement to make, but not ordering products based on it could be a mistake, because &#8220;no sales&#8221; is not the same as &#8220;no demand&#8221;. In fact, if Naive performs very well on your series with zeroes, this might indicate that your evaluation is wrong and you need to clean the data, removing the discontinued and out-of-stock items from the evaluation.</p>
<p>There are three lessons here:</p>
<ol>
<li>we should forecast demand, not sales;</li>
<li>we should measure accuracy on the data with naturally occurring zeroes &#8211; do data cleaning before setting up your evaluation;</li>
<li>it&#8217;s better to use a benchmark that tries to capture demand, not one that merely reproduces sales.</li>
</ol>
<p>Arguably, a more helpful benchmark forecast would be the one in the following image:</p>
<div id="attachment_3742" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-02.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3742" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-02-300x180.png&amp;nocache=1" alt="Forecast for intermittent demand from the SMA" width="300" height="180" class="size-medium wp-image-3742" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-02-300x180.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-02-768x461.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/12/2024-11-01-Naive-Intermittent-02.png&amp;nocache=1 1000w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3742" class="wp-caption-text">Forecast for intermittent demand from the SMA</p></div>
<p>The forecast above was generated using the <a href="/2024/10/28/why-is-it-hard-to-beat-simple-moving-average/">Simple Moving Average</a>, and it tells us that there is a demand for the product over the next 13 days. Yes, it is less accurate than Naive, but it gives an estimate of the expected demand, not the expected sales.</p>
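<p>For completeness, here is a small sketch in Python of the comparison above on simulated intermittent demand (the occurrence probability, the demand sizes, and the 30-day SMA window are arbitrary choices for the illustration):</p>
<pre><code># Naive vs Simple Moving Average on simulated intermittent demand:
# Naive copies the last value (often zero), while SMA estimates the
# average demand per period.
import numpy as np

rng = np.random.default_rng(8)
# Occurrence with probability 0.3, demand sizes Poisson(4)
y = rng.binomial(1, 0.3, 200) * rng.poisson(4, 200)
train, test = y[:187], y[187:]       # 13-day holdout

naive = np.full(len(test), train[-1], dtype=float)  # last observation
sma = np.full(len(test), train[-30:].mean())        # 30-day average

for name, f in [("Naive", naive), ("SMA", sma)]:
    rmse = float(np.sqrt(np.mean((test - f) ** 2)))
    print(name, "forecast:", f[0], "RMSE:", round(rmse, 3))
# If the last training value was zero, Naive says "no demand at all",
# while SMA gives an estimate of the expected demand per period.
</code></pre>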
<p>Message <a href="https://openforecast.org/2024/12/02/why-naive-is-not-a-good-benchmark-for-intermittent-demand/">Why Naive is not a good benchmark for intermittent demand</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/12/02/why-naive-is-not-a-good-benchmark-for-intermittent-demand/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
	</channel>
</rss>
