Archives ETS - Open Forecasting

smooth in python: multiple seasonal ETS

Ivan Svetunkov — Mon, 11 May 2026 08:08:38 +0000

Another interesting case in demand forecasting is high frequency data. For example, if you work with demand at a daily level, you might notice that demand increases every Monday but also exhibits proper seasonal fluctuations (e.g. a decline every winter). What do you do in this case?

One of the solutions (old but gold) is the multiple seasonal ETS model, which was originally developed by James Taylor (2003) for the pure additive exponential smoothing. The idea is quite simple: to model multiple seasonal cycles, one can add multiple seasonal components, e.g. to capture the day-of-week (frequency 7) and the day-of-year (frequency 365) effects. While it worked fine for some examples, the main issue with it has been its computational speed (or rather slowness): the original ETS needs to estimate all the smoothing parameters plus all the initial values for the seasonal indices and other components. Both ADAM and ES in the smooth package support multiple seasonalities and avoid the whole issue by using a different model initialisation called “backcasting”.
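
To make the idea of several seasonal components concrete, here is a minimal sketch of the additive double seasonal recursion in error-correction form. It is not the code used in smooth: the function name double_seasonal_filter, the naive initialisation from the first long cycle, and the parameter values are all assumptions made for illustration, and the sketch expects the series to cover at least one full long cycle.

```python
import numpy as np


def double_seasonal_filter(y, lags=(7, 28), alpha=0.1, gammas=(0.05, 0.05)):
    """One-step-ahead fitted values of an additive model with two seasonal components."""
    y = np.asarray(y, dtype=float)
    m1, m2 = lags
    if len(y) < m2:
        raise ValueError("need at least one full long seasonal cycle")

    # Naive initial states taken from the first long cycle (an assumption of
    # this sketch; smooth uses backcasting instead).
    level = y[:m2].mean()
    s1 = np.array([(y[i:m2:m1] - level).mean() for i in range(m1)])
    s2 = y[:m2] - level - s1[np.arange(m2) % m1]

    fitted = np.empty_like(y)
    for t in range(len(y)):
        i1, i2 = t % m1, t % m2  # positions within the short and long cycles
        fitted[t] = level + s1[i1] + s2[i2]
        error = y[t] - fitted[t]
        # Every component is corrected by its own share of the same error
        level += alpha * error
        s1[i1] += gammas[0] * error
        s2[i2] += gammas[1] * error
    return fitted


# Synthetic series with day-of-week (7) and four-week (28) patterns
t = np.arange(112)
y = 100 + 10 * np.sin(2 * np.pi * t / 7) + 5 * np.sin(2 * np.pi * t / 28)
fitted = double_seasonal_filter(y)
print(np.abs(y - fitted).max())
```

With two seasonal components, the number of initial seasonal indices is the sum of the lags (7 + 28 in this toy case), which is why estimating all of them becomes slow for longer cycles.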

Here is a classical example from James’ paper on the half-hourly electricity demand (see the image in the post). It is clear that there are half-hour-of-day and day-of-week effects. In ES, this means that we need to provide a vector for the lags parameter (48 half-hours in a day and 336 in a week):

from smooth import ES
from fcompdata import taylor

# Fit ES with automatic ETS model selection
model = ES(lags=[48, 336], h=336, holdout=True)
model.fit(taylor.y)
model.predict(h=336)
print(model)

This is the output I get from the function:

Time elapsed: 2.03 seconds
Model estimated using ES() function: ETS(MNM)
With backcasting initialisation
Distribution assumed in the model: Normal
Loss function type: likelihood; Loss function value: 25391.1773
Persistence vector g:
 alpha gamma1 gamma2
0.2899 0.1283 0.5270
Sample size: 3696
Number of estimated parameters: 4
Number of degrees of freedom: 3692
Information criteria:
      AIC      AICc       BIC      BICc
50790.3546 50790.3654 50815.2146 50815.2591

Forecast errors:
ME: 829.1195; MAE: 942.1447; RMSE: 1065.1127
sCE: 941.5012%; Asymmetry: 9.2%; sMAE: 3.1841%; sMSE: 0.1296%
MASE: 1.4491; RMSSE: 1.1286; rMAE: 0.1408; rRMSE: 0.1300

The computational time on this data was only 2.03 seconds. In this time, the function tried several possible ETS models and selected the best one based on the AICc value. The resulting best model is ETS(M,N,M), which makes perfect sense for this data.

Is there a way to improve this model? Yes! Taylor mentions that adding AR(1) to the cocktail tends to improve the accuracy in case of multiple seasonal series. We can try that if we switch to ADAM:

from smooth import ADAM

# Fit ADAM ETS(MNM)+AR(1) model
model = ADAM(model="MNM", ar_orders=1, lags=[48, 336], h=336, holdout=True)
model.fit(taylor.y)
print(model)
model.plot(7)

Here is the output:

Time elapsed: 1.04 seconds
Model estimated using ADAM() function: ETS(MNM)+ARIMA(1,0,0)
With backcasting initialisation
Distribution assumed in the model: Gamma
Loss function type: likelihood; Loss function value: 24157.2473
Persistence vector g:
 alpha gamma1 gamma2
0.1097 0.2225 0.3481
ARMA parameters of the model:
             Lag 1
AR(1)       0.6852
Sample size: 3696
Number of estimated parameters: 5
Number of degrees of freedom: 3691
Information criteria:
      AIC      AICc       BIC      BICc
48324.4947 48324.5109 48355.5697 48355.6365

Forecast errors:
ME: 276.4061; MAE: 462.5092; RMSE: 588.5957
sCE: 313.8711%; Asymmetry: 2.1%; sMAE: 1.5631%; sMSE: 0.0396%
MASE: 0.7114; RMSSE: 0.6237; rMAE: 0.0691; rRMSE: 0.0719

The resulting model not only has a lower AICc (48324.5109 versus 50790.3654), but also produces more accurate point forecasts for the holdout set (compare the RMSSE values: 0.6237 versus 1.1286). The following image shows the data and the point forecasts for it:

Double seasonal ETS(M,N,M) applied to the half-hourly electricity demand data

What else can we do here? Actually, quite a lot: multistep losses, seasonal ARIMA, explanatory variables – things can get only more complicated from here. Have a look at this.

Do I hear someone shouting “TBATS”? TBATS is exponential smoothing with additional bells and whistles (ETS + adapted Fourier terms + ARMA errors). I don’t have it as a separate function in smooth just yet, but you can reproduce it, for example, like this.

So, what are you waiting for? Dive in and see how it works for yourself!

Install smooth: pip install smooth

Message smooth in python: multiple seasonal ETS first appeared on Open Forecasting.

smooth in python: ETS with explanatory variables

Ivan Svetunkov — Tue, 05 May 2026 08:03:37 +0000

We continue our series of posts on the functions from the smooth package for Python/R. Today we will see how to enhance your exponential smoothing with explanatory variables. What? Yes, you heard me! Let’s dive in!

We all know that in real life sales don’t just evolve over time on their own. Any univariate model, such as ARIMA or ETS, is just a way to approximate a complex reality. In practice, there are many factors affecting the demand for your product. What would happen if the price of your product increases? What if you run a promotion (e.g. “Buy One, Get One Free”)? Your competitor’s strategy impacts the demand for your product as well… There are lots of different factors, and some of them can be quite useful in demand forecasting. But can we join the dynamic univariate models with regression?

Yes, we can! Although ETS is thought of as a pure univariate model, it is easy to extend it to include explanatory variables. There are several great papers showing how it works (e.g. Kourentzes & Petropoulos, 2016), and in fact the es() function from the smooth package for R was used as a benchmark in the M5 competition.

So, consider a situation where you have weekly sales of a product with some recorded promotions (encoded as dummy variables). We will use a time series from the fcompdata package for Python. The first image shows the series, with vertical lines marking when promotions happen. The series itself seems to be seasonal, roughly repeating peaks and troughs every 52 observations (every year). Also, we see that there are two types of promotions, and when they happen sales tend to increase. So, including them should improve the model fit, and if the company decides to run promotions again, the model will forecast demand better. I will start by fitting the ETS(M,N,M) to the data:

from smooth import ES
from fcompdata import PromoData

y = PromoData.y

model = ES(model="MNM", lags=52, holdout=True, h=13)
model.fit(y)
model.predict(h=13)
model.plot(7)

NOTE: PromoData has a specific structure with several attributes. PromoData.x contains the in-sample data, PromoData.xx has the holdout – this is consistent with the Mcomp package for R. The new features in python are:

PromoData.y – concatenated training and test sets,
PromoData.xregx – matrix of explanatory variables for the training set,
PromoData.xregxx – matrix of explanatory variables for the test set,
PromoData.xreg – the full (concatenated) matrix of explanatory variables.

The following image shows the model fit and the point forecasts from the ETS(M,N,M):

ETS(M,N,M) fit and forecast for the promotional data example

As expected, because the model does not take promotions into account, it fits the data as best as it can and produces forecasts that are oblivious of the potential external effects on sales. We can improve it by including the promotional dummies:

X_train = PromoData.xreg
X_test = PromoData.xregxx

model = ES(model="MNM", lags=52, holdout=True, h=13)
model.fit(y, X_train)
model.predict(h=13, X=X_test)
model.plot(7)

ETS(M,N,M) with explanatory variables

The image above shows the fit and the point forecasts from the ETSX(M,N,M) model that now takes the promotions into account. This is quite an improvement in comparison with the previous one. Furthermore, if we can control when to have promotions and what types of promotions to run, we can change the values in the `X_test` matrix and see what demand to expect in that situation. So, this gives an analyst a tool for a more advanced sensitivity analysis.
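
A what-if scenario of this kind boils down to editing the matrix of future dummies. The sketch below builds one with NumPy, assuming (hypothetically) that the two columns correspond to the two promotion types and that the rows are the 13 holdout weeks; the chosen weeks are made up. The resulting array could then be passed as the X argument of predict in place of `X_test`.

```python
import numpy as np

h = 13  # forecast horizon used in the promotional example

# Start from a scenario with no promotions at all
scenario = np.zeros((h, 2))

# Hypothetical plan: promotion of type 1 in weeks 3 and 4, type 2 in week 9
scenario[[2, 3], 0] = 1
scenario[8, 1] = 1

print(scenario.sum(axis=0))  # number of promotional weeks per type
```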

Read more about the ETSX here.
Install smooth: pip install smooth
ETSX wiki on github.


smooth in python: ETS forecast combination

Ivan Svetunkov — Mon, 27 Apr 2026 08:01:30 +0000

Last time we saw how to do automated model selection using the ES function from the smooth package. Now I want to show how to produce combined forecasts from ETS.

Why bother?

There is a vast body of literature on forecast combinations (read this great review). The main idea is that you should not put all your eggs in one basket — the safer strategy is to combine forecasts from different models instead of selecting just one. Yes, it is more computationally expensive, but the trade-off is higher accuracy on average.

For ETS, a great solution was proposed by Stephan Kolassa in his 2011 paper: extract AIC values, calculate AIC weights (giving the highest weight to the best-performing model and lower ones to the rest), then combine the forecasts. The resulting forecasts tend to be more robust, because in practice it might be hard to tell the difference between, for example, ETS(M,A,M) and ETS(M,Md,M). So why choose one when you can have all? I implemented this mechanism in the smooth package for R years ago, and now it is also available in Python.
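
The weighting step can be sketched in a few lines. The helper name aic_weights and the three information criterion values are made up for illustration; the formula is the standard Akaike weights one, and this is not the code used inside smooth.

```python
import numpy as np


def aic_weights(ic_values):
    """Turn information criterion values into weights that sum to one."""
    ic = np.asarray(ic_values, dtype=float)
    # Differences from the best (lowest) value keep the exponent numerically safe
    delta = ic - ic.min()
    raw = np.exp(-0.5 * delta)
    return raw / raw.sum()


# Hypothetical AIC values for three competing models
weights = aic_weights([100.0, 102.0, 110.0])
print(weights.round(4))
```

The model with the lowest criterion value receives the largest weight, and models with identical values share the weight equally.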

Here is how it works on an example using an M3 time series. I picked this specific one because it is seasonal, but the trend is not very well pronounced. The series is shown in the first image.

from smooth import ES
from fcompdata import M3

series = M3[1687]
y = series.y
freq = series.period

# Fit ETS models, combine forecasts
model = ES(model="CXC", lags=freq, h=18, holdout=True)
model.fit(y)
model.predict(h=18)

The code above tells ES to fit all ETS models with either an additive trend or no trend (“X” in the middle), calculate AIC weights, produce forecasts from each one of them, and then combine them. The resulting point forecast is the weighted combination of the individual forecasts. If a prediction interval is required, the specific quantiles are combined directly (see the paper by Lichtendahl et al., 2013). This is inevitably slower than the default model selection mechanism, but is a safer approach. The point forecast and the prediction interval (grey lines) are shown in the attached image.
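
Given a set of weights, the combination itself is a weighted average applied to the point forecasts and, separately, to each quantile. The numbers below are invented purely for illustration, and the arrays are laid out as models by horizon, which is an assumption of this sketch rather than how smooth stores them.

```python
import numpy as np

weights = np.array([0.5, 0.3, 0.2])  # hypothetical AIC weights, summing to one

# Rows are models, columns are forecast horizons (made-up values)
point = np.array([[100.0, 110.0],
                  [90.0, 100.0],
                  [120.0, 130.0]])
lower = np.array([[80.0, 85.0],
                  [70.0, 75.0],
                  [95.0, 100.0]])

combined_point = weights @ point  # weighted average of the point forecasts
combined_lower = weights @ lower  # the same weights applied to a lower quantile

print(combined_point, combined_lower)
```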

Note that the user can regulate the pool of combined models via the “model” parameter of the function. This wiki explains all the accepted options.

So why not go ahead and try it yourself, and see how it works for your data?

🔗 Install smooth: pip install smooth
📖 More on forecasts combination in ADAM.


smooth in python: ETS with model selection

Ivan Svetunkov — Wed, 22 Apr 2026 00:06:43 +0000

As some of you have heard, the smooth package is now on PyPI. So, I’ve decided to write a series of posts showcasing how some of its functions work. We start with the basics, ETS.

ETS stands for the “Error-Trend-Seasonal” model or ExponenTial Smoothing. It is a statistical model that relies on time series decomposition and updates the unobserved states (level/trend/seasonal) based on the mistakes it makes. In a way, you can call it an adaptive model that changes its forecast based on the most recent available information. It is relatively simple to explain and work with, and it has performed well in a variety of competitions (M3, M4, M5, for example).
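
The error-correction idea is easiest to see in the simplest member of the family, the local level model ETS(A,N,N). Here is a minimal sketch with a hypothetical helper ses_filter; the smoothing parameter and the initial level are supplied by hand rather than estimated.

```python
def ses_filter(y, alpha, initial_level):
    """Return one-step-ahead forecasts and the final level of simple exponential smoothing."""
    level = initial_level
    fitted = []
    for obs in y:
        fitted.append(level)           # forecast made before seeing the observation
        error = obs - level            # the mistake the model has just made
        level = level + alpha * error  # the state moves towards the observation
    return fitted, level


fitted, final_level = ses_filter([10.0, 12.0, 11.0], alpha=0.5, initial_level=10.0)
print(fitted, final_level)
```

The larger alpha is, the faster the level reacts to the most recent error; with alpha equal to zero the level never changes.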

The smooth package implements an advanced form of ETS in the ADAM class and a more basic one in the ES class. ES is just a wrapper of ADAM: it is the conventional model with some tuning. Both support all 30 ETS models, have automated model selection and forecast combination, and allow producing point forecasts and a variety of prediction interval types. If you want a straightforward robust implementation of ETS, give ES a try.

Here’s how to use it in Python:

from smooth import ES
from fcompdata import M3

# Pick a series from the M3 competition for demonstration
series = M3[2568]
y = series.x
freq = series.period

# Fit ES with the automatic model selection
model = ES(lags=freq, h=18, holdout=True)
model.fit(y)
print(model)

Running this produces output similar to this:

Time elapsed: 0.4 seconds
Model estimated using ES() function: ETS(MAM)
With backcasting initialisation
Distribution assumed in the model: Normal
Loss function type: likelihood; Loss function value: 724.8524
Persistence vector g:
 alpha   beta  gamma
0.0065 0.0000 0.0000
Sample size: 98
Number of estimated parameters: 4
Number of degrees of freedom: 94
Information criteria:
      AIC      AICc       BIC      BICc
1457.7047 1458.1348 1468.0446 1469.0306

Forecast errors:
ME: -580.9985; MAE: 604.0204; RMSE: 710.5457
sCE: -149.9347%; Asymmetry: -2.5%; sMAE: 8.6598%; sMSE: 1.0378%
MASE: 0.2653; RMSSE: 0.2452; rMAE: 0.2555; rRMSE: 0.2163

A few things worth noting from the output:

Based on the AICc value, ES automatically selected ETS(MAM) (a model with multiplicative error, additive trend, and multiplicative seasonality) as the best fit;
It used backcasting for the model initialisation (the default), which speeds up the process and requires fewer parameters to estimate;
It kept the last 18 observations for the holdout, produced forecasts for it, and calculated several forecast errors. This is handy if you want to directly compare different smooth models on a time series.
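
The information criteria in the printout can be reproduced by hand from the loss value, the number of estimated parameters, and the sample size. The sketch below uses the textbook AIC, AICc, and BIC formulae and assumes that the reported loss is the negative log-likelihood, which is consistent with the numbers shown above; the helper name is hypothetical.

```python
import math


def information_criteria(neg_loglik, n_params, n_obs):
    """Textbook AIC, AICc and BIC from a negative log-likelihood value."""
    aic = 2 * n_params + 2 * neg_loglik
    # Small-sample correction added on top of AIC
    aicc = aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = n_params * math.log(n_obs) + 2 * neg_loglik
    return aic, aicc, bic


# Values taken from the printed output: loss 724.8524, 4 parameters, 98 observations
aic, aicc, bic = information_criteria(724.8524, n_params=4, n_obs=98)
print(round(aic, 4), round(aicc, 4), round(bic, 4))
```

The results agree with the printed AIC, AICc, and BIC up to the rounding of the loss value.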

But why are we here? We want to forecast! So, here it is:

model.predict(h=18, interval="prediction")
model.plot(7)

This should produce an image similar to the one attached to the post. As simple as that.

Now it’s your turn! :)

🔗 Install smooth: pip install smooth
📖 smooth wiki


smooth forecasting with the smooth package in Python

Ivan Svetunkov — Thu, 09 Apr 2026 09:15:02 +0000

Here is another piece of news I have been hoping to deliver for quite some time now (since January 2026 actually). We have finally created the first release of the smooth package for Python and it is available on PyPI! Anyone interested? Read more!

On this page:

Why does “smooth” exist?
A bit of history
How to install
What works
An example
Evaluation

Setup
Jupyter notebook
sktime non-collaborative stance
Results

What’s next?
Summary

Why does “smooth” exist?

There are lots of implementations of ETS and ARIMA (dynamic models) out there, both in Python and in R (and also in Julia now, see Durbyn). So, why bother creating yet another one?

The main philosophy of the smooth package in R is flexibility. It is not just an implementation of ETS or ARIMA – it is there to give you more control over what you can do with these models in different situations. The main function in the package is called “ADAM” – the Augmented Dynamic Adaptive Model. It is the single source of error state space model that unites ETS, ARIMA, and regression, and supports the following list of features (taken from the Introduction in the book):

1. ETS;
2. ARIMA;
3. Regression;
4. TVP regression;
5. Combination of (1), (2), and either (3) or (4), i.e. ARIMAX/ETSX;
6. Automatic selection/combination of states for ETS;
7. Automatic orders selection for ARIMA;
8. Variables selection for regression;
9. Normal and non-normal distributions;
10. Automatic selection of most suitable distributions;
11. Multiple seasonality;
12. Occurrence part of the model to handle zeroes in data (intermittent demand);
13. Modelling scale of distribution (GARCH and beyond);
14. A variety of ways to produce forecasts for different situations;
15. Advanced loss functions for model estimation;
…

All of these features come with the ability to fine tune the optimiser (e.g. how the parameters are estimated) and to manually adjust any model parameters you want. This allows, for example, fitting iETSX(M,N,M) with multiple frequencies with Gamma distribution to better model hourly emergency department arrivals (intermittent demand) and thus producing more accurate forecasts of it. There is no other package either in R or Python that could give such flexibility with the dynamic models to users.

At the same time, over the years, I have managed to iron out the R functions so much that they handle almost any real life situation without breaking. They also work quite fast and produce accurate forecasts, sometimes outperforming the other existing R implementations.

So, here we are. We want to bring this flexibility, robustness, and speed to Python.

A bit of history

The smooth package for R was first released on CRAN in 2016, when I finished my PhD. It went from v1.4.3 to v4.4.0 over the last 10 years. It saw a rise in popularity, but then an inevitable decline due to the decreasing number of R users in business. So, back in 2021, Rebecca Killick and I applied for an EPSRC grant to develop Python packages for forecasting and time series analysis. The idea was to translate what we have in R (including greybox, smooth, forecast, and changepoint detection packages) to Python with the help of professional programmers. Unfortunately, we did not receive the funding (it went to sktime for a good reason – they already had an existing codebase in Python).

In the beginning of 2023, Leonidas Tsaprounis got in touch with me, suggesting some help with the development and translation of the smooth package to Python. The idea was to use the existing C++ core and simply create a wrapper in Python. “Simply” is actually an oversimplification here, because I am not a programmer, so my functions are messy and hard to read. Nonetheless, we started cooking. Leo helped in setting up pybind11 and carma, and creating the necessary files for the compilation of the C++ code. Just to test whether that worked, we managed to create a basic function for the simple moving average based on the sma() from the R version. Our progress was a bit slow, because we were both busy with other projects. All of that changed when in July 2023 Filotas Theodosiou joined our small team and started working on the translation. We decided to implement the hardest thing first – ADAM.

What Fil did was use LLMs to translate the code from R to Python. Fil will write about this work at some point; the only thing I can say here is that it was not an easy process, because thousands of lines of R code needed to be translated to Python and then refactored. I only helped with suggestions and explanations of what is happening inside, and Leo provided guidance regarding the coding philosophy. It was Fil and his AI tools that did the main heavy lifting. By Summer 2025, we had a basic working ADAM() function, but it worked slightly differently from the one in R due to differences in initialisation and optimisation. He presented his work at the IIF Open Source Forecasting Software Workshop, explaining his experience with LLMs and code translation. Because everyone was pretty busy, it took a bit more time to reach the first proper release of smooth in Python.

In December 2025, I bought a Claude AI subscription and started vibe coding my way through the existing Python code. Between the three of us, we managed to progress the project, and finally in January 2026 we reached v1.0.0 of smooth in Python. Now ADAM works in exactly the same way in both R and Python: if you give it the same time series, it will select the same model and produce the same parameter estimates across languages. It took us time and effort to reach this, but we feel it is a critically important step – ensuring that users working in different languages have the same experience.

How to install

We are very grateful to Gustavo Niemeyer, who gave us the name smooth on PyPI. It had belonged to him since 2020, but the project was abandoned, and he agreed to transfer it to us. So, now you can install smooth simply by running:

pip install smooth

There is also a development version of the package, which you can install by following the instructions in our Installation wiki on GitHub.

What works

The package does not have the full functionality yet, but the following already works:

ADAM() – the main function that supports ETS with:
- components selection via model="ZXZ" or other pools (see wiki on GitHub for details);
- forecast combination using AIC weights via model="CCC" or other pools (again, explained in the wiki);
- multiple seasonalities via lags=[24,168];
- ability to provide some smoothing parameters or initial values (e.g. only alpha), letting the function estimate the rest;
- different distributions;
- advanced loss functions;
- several options for model initialisation;
- fine-tuning of the optimiser via nlopt_kargs (read more in the wiki);
ES() – the wrapper of ADAM() with the normal distribution. Supports the same functionality, but is a simplification of ADAM.

We also have the standard methods for fitting and forecasting, and many attributes that allow extracting information from the model, all of which are explained in the wiki of the project.

An example

Here is an example of how to work with ADAM in Python. For this example to work, you will need to install the fcompdata package from pip:
pip install fcompdata

The example:

from smooth import ADAM
from fcompdata import M3

model = ADAM(model="ZXZ", lags=12)
model.fit(M3[2568].x)

print(model)

This is what you should see as the result of print:

Time elapsed: 0.21 seconds
Model estimated using ADAM() function: ETS(MAM)
With backcasting initialisation
Distribution assumed in the model: Gamma
Loss function type: likelihood; Loss function value: 868.7085
Persistence vector g:
 alpha   beta  gamma
0.0205 0.0203 0.1568
Damping parameter: 1.0000
Sample size: 116
Number of estimated parameters: 4
Number of degrees of freedom: 112
Information criteria:
      AIC      AICc       BIC      BICc
1745.4170 1745.7774 1756.4314 1757.2879

which is exactly the same output as in R (see, for example, some explanations here). We can then produce a forecast from this model:

model.predict(h=18, interval="prediction", level=[0.9, 0.95])

The predict method currently supports analytical (aka “parametric”/“approximate”) and simulated prediction intervals. Setting interval="prediction" tells the function to choose between the two depending on the type of model (multiplicative ETS models do not have analytical formulae for the multistep conditional variance and, as a result, do not have proper analytical prediction intervals). The level parameter accepts either a vector (which will produce several quantiles) or a scalar. What I get after running this is:

             mean    lower_0.05   lower_0.025    upper_0.95   upper_0.975
116  11234.592643  10061.150811   9853.119370  12443.904763  12686.143870
117   8050.810544   7228.709864   7064.356429   8896.078713   9080.867866
118   7658.608163   6886.165498   6746.881596   8475.545469   8633.149796
119  10552.382933   9452.306261   9236.493206  11679.816814  11892.381046
120  10889.816551   9768.665327   9559.580233  12066.628218  12313.872205
121   7409.545388   6643.378080   6495.237349   8232.558913   8363.550598
122   7591.183726   6800.878319   6650.514904   8425.167556   8576.257297
123  14648.452226  13089.346824  12793.226771  16263.206997  16582.786939
124   6953.045603   6206.829418   6079.126523   7730.260301   7892.917098
125  11938.941882  10650.759513  10427.172989  13307.563498  13579.662925
126   8299.626845   7379.550080   7200.075336   9280.095498   9468.327654
127   8508.558884   7530.987557   7367.541698   9534.117611   9734.218257
128  11552.654541  10162.615284   9907.770755  13039.710057  13313.534412
129   8286.727505   7273.279560   7100.450038   9401.374461   9627.922116
130   7889.999721   6879.771040   6710.494741   8948.052817   9186.279622
131  10860.671447   9438.536335   9174.365147  12353.444279  12664.983138
132  11218.330395   9690.780430   9453.114931  12849.545829  13201.679610
133   7620.922782   6564.497975   6391.080974   8748.561119   8977.120673

The separate wiki on Fitted Values and Forecasts explains all the parameters accepted by the predict method and what is returned by it.
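
The column names in the table follow from the way a two-sided interval of a given level maps to a pair of quantiles: a level of 0.9 leaves 0.05 in each tail. Here is a small sketch of that mapping, using a hypothetical helper that is not part of smooth:

```python
def interval_columns(levels):
    """Names of the lower and upper quantile columns for two-sided intervals."""
    # Half of the probability outside the interval goes to each tail
    lower = [round((1 - level) / 2, 4) for level in levels]
    upper = [round(1 - q, 4) for q in lower]
    return [f"lower_{q:g}" for q in lower] + [f"upper_{q:g}" for q in upper]


columns = interval_columns([0.9, 0.95])
print(columns)
```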

Evaluation

To see how the developed function works, I decided to conduct exactly the same evaluation that I did for the recent R release of the smooth package, running the functions on the M1, M3, and Tourism competition data (5,315 time series) using the fcompdata package.

Setup

I have selected the same set of models for Python as I did in R. Here are several options for the ADAM model parameter to see how the specific pools impact accuracy (this is discussed in detail in Section 15.1 of ADAM):

1. XXX – select between pure additive ETS models only;
2. ZZZ – select from the pool of all 30 models, but use branch-and-bound to remove the less suitable models;
3. ZXZ – same as (2), but without the multiplicative trend models. This is used in the smooth functions by default;
4. FFF – select from the pool of all 30 models (exhaustive search);
5. SXS – the pool of models used by default in ets() from the forecast package in R.
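
The size of the full pool is easy to verify by enumerating the ETS taxonomy: two error types, five trend types, and three seasonal types give the 30 models. The sketch below does this with itertools and also counts two subsets of the taxonomy. Treating the damped trend "Ad" as additive when filtering is an assumption of this sketch, and the code does not reproduce the branch-and-bound search or the exact pools used inside smooth.

```python
from itertools import product

errors = ["A", "M"]
trends = ["N", "A", "Ad", "M", "Md"]
seasonals = ["N", "A", "M"]

# Full taxonomy: every combination of error, trend and seasonal component
all_models = [e + t + s for e, t, s in product(errors, trends, seasonals)]

# Pure additive models: no multiplicative component anywhere
pure_additive = [m for m in all_models if "M" not in m]

# Models without a multiplicative trend (damped additive counted as additive)
no_mult_trend = [e + t + s for e, t, s in product(errors, trends, seasonals)
                 if not t.startswith("M")]

print(len(all_models), len(pure_additive), len(no_mult_trend))
```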

I also tested three types of ETS initialisation (read more about them here):

Back – initial="backcasting" – this is the default initialisation method;
Opt – initial="optimal";
Two – initial="two-stage".

I have also found the following implementations of ETS in Python and included them in my evaluation:

AutoETS from statsforecast (Nixtla);
AutoETS from skforecast;
AutoETS from sktime.

There is also a darts implementation of AutoETS, which is actually a wrapper of the statsforecast one. So I ran it just to check how it works, and found that it failed in 1,518 cases. I filed the issue, and it turned out that their implementation does not deal with short time series (10 observations or fewer), which is their design decision. They are now considering what to do about that, if anything.

I used the RMSSE (used in the M5 competition and motivated by Athanasopoulos & Kourentzes, 2023) and SAME error measures, together with the computational time for each time series:
\begin{equation*}
\mathrm{RMSSE} = \frac{1}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} \Delta_t^2}} \mathrm{RMSE},
\end{equation*}
where \(\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h e^2_{T+j}}\) is the Root Mean Squared Error of the point forecasts, \(e_{T+j}\) is the forecast error \(j\) steps ahead of the last in-sample observation \(T\), and \(\Delta_t = y_{t+1} - y_t\) is the first difference of the in-sample actual values.

\begin{equation*}
\mathrm{SAME} = \frac{1}{\frac{1}{T-1} \sum_{t=1}^{T-1} |\Delta_t|} \mathrm{AME},
\end{equation*}
where \(\mathrm{AME}= \left| \frac{1}{h} \sum_{j=1}^h e_{T+j} \right|\) is the Absolute Mean Error.
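
Both measures are straightforward to compute. The following NumPy sketch follows the two formulae above; the function names and the toy numbers are illustrative and are not taken from the notebook.

```python
import numpy as np


def rmsse(insample, holdout, forecast):
    """RMSE of the forecast scaled by the root mean squared in-sample first difference."""
    errors = np.asarray(holdout, dtype=float) - np.asarray(forecast, dtype=float)
    scale = np.sqrt(np.mean(np.diff(np.asarray(insample, dtype=float)) ** 2))
    return np.sqrt(np.mean(errors ** 2)) / scale


def same(insample, holdout, forecast):
    """Absolute mean error scaled by the mean absolute in-sample first difference."""
    errors = np.asarray(holdout, dtype=float) - np.asarray(forecast, dtype=float)
    scale = np.mean(np.abs(np.diff(np.asarray(insample, dtype=float))))
    return np.abs(np.mean(errors)) / scale


insample = [1.0, 2.0, 4.0, 7.0]  # first differences: 1, 2, 3
holdout = [8.0, 10.0]
forecast = [7.0, 12.0]           # errors: 1, -2

rmsse_value = rmsse(insample, holdout, forecast)
same_value = same(insample, holdout, forecast)
print(rmsse_value, same_value)
```

RMSSE penalises the size of the errors, while SAME measures the bias: errors of opposite signs partly cancel out in the latter.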

All of this was implemented in a Jupyter notebook, which is available here in case you want to reproduce the results.

sktime non-collaborative stance

In the first run of this (on 28th January 2026), I encountered several errors in AutoETS in sktime: it took an extremely long time to compute (see the table below – on average around 30 seconds per time series) and produced ridiculous forecasts (mean RMSSE was 106,951). I filed an issue in their repo and sent a courtesy message to Franz Kiraly on LinkedIn the same day, saying that I would be happy to rerun the results if this was fixed. I then received an insulting email from him, blaming me for not collaborating and trying to diminish sktime. They then closed the issue, claiming that it was fixed. I reran the experiment with their development version from GitHub on 21st March (two months later!), only to get exactly the same results. I do not think their fix is working, but given Franz’s toxic behaviour, I am not going to rerun this any further or help him in any way. The Jupyter notebook with the experiment is here, so he can investigate on his own.

Results

So, here are the summary results for the tested models:

====================================================================================
EVALUATION RESULTS for RMSSE (All Series)
====================================================================================
               Method       Min        Q1       Med        Q3       Max      Mean
        ADAM ETS Back  0.018252  0.663358  1.161473  2.301861  50.25854  1.928086
         ADAM ETS Opt  0.024155  0.670682  1.185932  2.365498  51.61599  1.943729
         ADAM ETS Two  0.024599  0.669522  1.182516  2.342385  51.61599  1.947715
              ES Back  0.018252  0.667225  1.160971  2.313932  50.25854  1.927436
               ES Opt  0.024155  0.673575  1.185756  2.364915  51.61599  1.947180
               ES Two  0.024467  0.671771  1.187368  2.346343  51.61599  1.955076
               ES XXX  0.018252  0.677717  1.170823  2.306197  50.25854  1.961318
               ES ZZZ  0.011386  0.670211  1.179916  2.353334  115.5442  2.053459
               ES FFF  0.011386  0.680956  1.211736  2.449626  115.5442  2.100899
               ES SXS  0.018252  0.674537  1.169187  2.353334  50.25854  1.939847
statsforecast AutoETS  0.024468  0.673157  1.189209  2.326650  51.61597  1.923925
   skforecast AutoETS  0.074744  0.747200  1.344916  2.721083  50.54339  2.273724
       sktime AutoETS  0.024467  0.676191  1.190093  2.456184 565753200  106951.7

Things to note:

The best performing ETS on average is from Nixtla’s statsforecast package. The second best is our implementation (ADAM/ES) with backcasting;
Given that I ran exactly the same experiment for the R packages, we can conclude that Nixtla’s implementation is even better than the one in the forecast package in R;
In terms of median RMSSE, ES with backcasting outperforms all other implementations;
ADAM ETS is the best in terms of the first and third quartiles of RMSSE;
ADAM ETS and ES perform quite similarly. This is expected, because ES is a wrapper of ADAM ETS, which assumes normality for the error term. ADAM ETS switches between Normal and Gamma distributions based on the type of error term;
Backcasting leads to the most accurate forecasts on these datasets. This does not mean it is a universal rule, and I am sure the situation will change for other datasets;
ES XXX gives exactly the same results as the one implemented in the R version of the package. This is important because we were aiming to reproduce results between R and Python with 100% precision, and we did. The reason why other ETS flavours differ between R and Python is that since smooth 4.4.0 for R, we changed how point forecasts are calculated for multiplicative component models: previously, they relied on simulations; now we simply use point forecasts. While this is not entirely statistically accurate, it is pragmatic because it avoids explosive trajectories.

The results for SAME are qualitatively similar to those for RMSSE:

===================================================================================
EVALUATION RESULTS for SAME (All Series)
===================================================================================
               Method       Min        Q1       Med        Q3       Max      Mean
        ADAM ETS Back  0.001142  0.374466  0.995272  2.402342  52.34177  1.951070
         ADAM ETS Opt  0.000106  0.373551  1.021661  2.485596  55.10179  1.962040
         ADAM ETS Two  0.000782  0.380398  1.029629  2.451008  55.10186  1.970422
              ES Back  0.001142  0.372777  0.994547  2.412503  53.45041  1.946517
               ES Opt  0.000217  0.372725  1.024666  2.478435  54.68603  1.967217
               ES Two  0.000095  0.384795  1.028561  2.454543  54.68558  1.982261
               ES XXX  0.000094  0.373315  1.005006  2.425682  53.16973  1.992656
               ES ZZZ  0.000760  0.386673  1.017732  2.467912  145.7604  2.107522
               ES FFF  0.000597  0.401426  1.048395  2.566151  145.7604  2.173559
               ES SXS  0.000597  0.375438  1.005603  2.450490  53.45041  1.964322
statsforecast AutoETS  0.000228  0.374821  1.015205  2.434952  53.61359  1.938682
   skforecast AutoETS  0.000993  0.457066  1.217751  2.954003  59.84443  2.409900
       sktime AutoETS  0.000433  0.392802  1.029433  2.571555 385286900  72931.90

Finally, I also measured the computational time and got the following summary. Note that I moved ES flavours below because they are not directly comparable with the others (they have special pools of models):

================================================================================
EVALUATION RESULTS for Computational Time in seconds (All Series)
================================================================================
               Method       Min        Q1       Med        Q3        Max      Mean
        ADAM ETS Back  0.008679  0.086219  0.140929  0.241768   0.923601  0.181546
         ADAM ETS Opt  0.051302  0.229400  0.315379  0.792827   2.638193  0.550378
         ADAM ETS Two  0.051314  0.287382  0.455149  1.080276   3.535120  0.715090
              ES Back  0.009274  0.085299  0.139511  0.247114   0.958868  0.182547
               ES Opt  0.053176  0.224293  0.312200  0.772161   2.888662  0.541243
               ES Two  0.048539  0.279598  0.446404  1.058847   3.553156  0.703183
statsforecast AutoETS  0.001770  0.007702  0.081189  0.167040   1.271202  0.102575
   skforecast AutoETS  0.021553  0.243102  0.302078  1.478667   7.482101  0.820576
       sktime AutoETS  0.128021  6.227344 19.170921 41.494513 229.793067 30.712191

================================================================================
               ES XXX  0.008793  0.054139  0.093792  0.144012   0.480976  0.104800
               ES ZZZ  0.010502  0.127297  0.184561  0.447649   1.794031  0.313157
               ES FFF  0.046921  0.215119  1.110657  1.557724   3.489375  1.071004
               ES SXS  0.024909  0.116427  0.434594  0.564973   1.251257  0.403988

Things to note:

Nixtla’s implementation is actually the fastest and very hard to beat. As far as I understand, they did great work implementing some code in C++ and then using numba (I do not know what that means yet);
Our implementation does not use numba, but our ETS with backcasting is still faster than the skforecast and sktime implementations;
ADAM ETS with backcasting also has the lowest maximum time among the directly comparable methods (0.92 seconds), implying that in difficult situations it finds a solution relatively quickly compared with the others.

So, overall, I would argue that the smooth implementation of ETS is competitive with other implementations. But it has one important benefit: it supports more features. And we plan to expand it further to make it even more useful across a wider variety of cases.

What’s next

There are still a lot of features that we have not managed to implement yet. Here is a non-exhaustive list:

Explanatory variables to have ETSX/ARIMAX;
ARIMA;
Occurrence model for intermittent demand forecasting;
Scale model for ADAM;
Simulation functions;
Model diagnostics;
CES, MSARIMA, SSARIMA, GUM, and SMA – functions that are available in R and not yet ported to Python.

So, lots of work to do. I am sure we will be quite busy well into 2026.

Summary

It has been a long and winding road, but Filotas and Leo did an amazing job to make this happen. The existing ETS implementation in smooth already works quite well and quite fast. It does not fail as some other implementations do, and it is quite reliable. I have actually spent many years testing the R version on different time series to make sure that it produces something sensible no matter what. The code was translated to Python one-to-one, so I am fairly confident that the function will work as expected in 99.9% of cases (there is always a non-zero probability that something will go wrong). Both ADAM and ES already support a variety of features that you might find useful.

One thing I will kindly ask of you is that if you find a bug or an issue when running experiments on your datasets, please file it in our GitHub repo here – we will try to find the time to fix it. Also, if you would like to contribute by translating some features from R to Python or implementing something additional, please get in touch with me.

Finally, I am always glad to hear success stories. If you find the smooth package useful in your work, please let us know. One way of doing that is via the Discussions on GitHub, or you can simply send me an email.

Message smooth forecasting with the smooth package in Python first appeared on Open Forecasting.

Detecting patterns in white noise

Ivan Svetunkov — Wed, 10 Apr 2024 08:16:58 +0000

Back in 2015, when I was working on my paper on Complex Exponential Smoothing, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting…

One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of ETS, it comes down to selecting trend/seasonal components, while for ARIMA, it’s about order selection. In R, several functions automatically handle this based on information criteria (Hyndman & Khandakar, 2008; Svetunkov & Boylan, 2017; Chapter 15 of ADAM). I decided to investigate how this mechanism works.

I generated data from the Normal distribution with a fixed mean of 5000 and a standard deviation of 50. Then, I asked ETS and ARIMA (from the forecast package in R) to automatically select the appropriate model for each of 1000 time series. Here is the R code for this simple experiment:

Some R code

# Set random seed for reproducibility
set.seed(41, kind="L'Ecuyer-CMRG")
# Number of iterations
nsim <- 1000
# Number of observations
obsAll <- 120
# Generate data from N(5000, 50)
rnorm(nsim*obsAll, 5000, 50) |>
  matrix(obsAll, nsim) |>
  ts(frequency=12) -> x

# Load forecast package
library(forecast)
# Load doMC for parallel calculations
# doMC is only available on Linux and Mac
# Use library(doParallel) on Windows
library(doMC)
registerDoMC(detectCores())

# A loop for ARIMA, recording the orders
matArima <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- auto.arima(x[,i])
    # The element number 5 is just m, period of seasonality
    return(c(testModel$arma[-5],(!is.na(testModel$coef["drift"]))*1))
}
rownames(matArima) <- c("AR","MA","SAR","SMA","I","SI","Drift")

# A loop for ETS, recording the model types
matEts <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- ets(x[,i], allow.multiplicative.trend=TRUE)
    return(testModel[13]$method)
}

The findings of this experiment are summarised using the following chunk of the R code:

R code for the analysis of the results

#### Auto ARIMA ####
# Non-seasonal ARIMA elements
mean(apply(matArima[c("AR","MA","I","Drift"),]!=0, 2, any))
# Seasonal ARIMA elements
mean(apply(matArima[c("SAR","SMA","SI"),]!=0, 2, any))

#### ETS ####
# Trend in ETS
mean(substr(matEts,7,7)!="N")
# Seasonality in ETS
mean(substr(matEts,nchar(matEts)-1,nchar(matEts)-1)!="N")

I summarised them in the following table:

	ARIMA	ETS
Non-seasonal elements	24.8%	2.3%
Seasonal elements	18.0%	0.2%
Any type of structure	37.9%	2.4%

So, ARIMA detected some structure (had non-zero orders) in almost 40% of all time series, even though the data was designed to have no structure (just white noise). It also captured non-seasonal orders in a quarter of the series and identified seasonality in 18% of them. ETS performed better (only 0.2% of seasonal models identified on the white noise), but still captured trends in 2.3% of cases.

Does this simple experiment suggest that ARIMA is a bad model and ETS is a good one? No, it does not. It simply demonstrates that ARIMA tends to overfit the data if allowed to select whatever it wants. How can we fix that?

My solution: restrict the pool of ARIMA models to check, preventing it from going crazy. My personal pool includes ARIMA(0,1,1), (1,1,2), (0,2,2), along with the seasonal orders of (0,1,1), (1,1,2), and (0,2,2), and combinations between them. This approach is motivated by the connection between ARIMA and ETS. Additionally, we can check whether the addition of AR/MA orders detected by ACF/PACF analysis of the best model reduces the AICc. If not, they shouldn't be included.

This algorithm can be written as the following simple function, which relies on msarima() from the smooth package in R (I use it because all ARIMA models implemented in that function are directly comparable via information criteria):

R code for the compact ARIMA function

arimaCompact <- function(y, lags=c(1,frequency(y)), ic=c("AICc","AIC","BIC","BICc"), ...){

    # Start measuring the time of calculations
    startTime <- Sys.time();

    # If there are no lags for the basic components, correct this.
    if(sum(lags==1)==0){
        lags <- c(1,lags);
    }

    orderLength <- length(lags);
    ic <- match.arg(ic);
    IC <- switch(ic,
                 "AIC"=AIC,
                 "AICc"=AICc,
                 "BIC"=BIC,
                 "BICc"=BICc);

    # We consider the following list of models:
    # ARIMA(0,1,1), (1,1,2), (0,2,2),
    # ARIMA(0,0,0)+c, ARIMA(0,1,1)+c,
    # seasonal orders (0,1,1), (1,1,2), (0,2,2)
    # And all combinations between seasonal and non-seasonal parts
    # 
    # Encode all non-seasonal parts
    nNonSeasonal <- 5
    arimaNonSeasonal <- matrix(c(0,1,1,0, 1,1,2,0, 0,2,2,0, 0,0,0,1, 0,1,1,1), nNonSeasonal,4,
                               dimnames=list(NULL, c("ar","i","ma","const")), byrow=TRUE)
    # Encode all seasonal parts (the first row, all zeroes, means no seasonal part)
    nSeasonal <- 4
    arimaSeasonal <- matrix(c(0,0,0, 0,1,1, 1,1,2, 0,2,2), nSeasonal,3,
                               dimnames=list(NULL, c("sar","si","sma")), byrow=TRUE)

    # Check all the models in the pool
    testModels <- vector("list", nSeasonal*nNonSeasonal);
    m <- 1;
    for(i in 1:nSeasonal){
        for(j in 1:nNonSeasonal){
            testModels[[m]] <- msarima(y, orders=list(ar=c(arimaNonSeasonal[j,1],arimaSeasonal[i,1]),
                                                      i=c(arimaNonSeasonal[j,2],arimaSeasonal[i,2]),
                                                      ma=c(arimaNonSeasonal[j,3],arimaSeasonal[i,3])),
                                       constant=arimaNonSeasonal[j,4]==1, lags=lags, ...);
            m[] <- m+1;
        }
    }

    # Find the best one
    m <- which.min(sapply(testModels, IC));
    # Amend computational time
    testModels[[m]]$timeElapsed <- Sys.time()-startTime;

    return(testModels[[m]]);
}
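
For readers following along in Python, the pool that arimaCompact() loops over can be enumerated with a short sketch. It only builds the list of order specifications and does not fit any model. The names NON_SEASONAL, SEASONAL and build_pool are hypothetical, introduced for illustration, and are not part of the smooth package:

```python
from itertools import product

# Non-seasonal parts as (ar, i, ma, constant), copied from arimaNonSeasonal above
NON_SEASONAL = [(0, 1, 1, False), (1, 1, 2, False), (0, 2, 2, False),
                (0, 0, 0, True), (0, 1, 1, True)]
# Seasonal parts as (sar, si, sma); the first row means "no seasonal part"
SEASONAL = [(0, 0, 0), (0, 1, 1), (1, 1, 2), (0, 2, 2)]


def build_pool(lags=(1, 12)):
    """Return every seasonal/non-seasonal combination as a dict of orders per lag."""
    pool = []
    # Same loop order as in the R function: seasonal outside, non-seasonal inside
    for (sar, si, sma), (ar, i, ma, const) in product(SEASONAL, NON_SEASONAL):
        pool.append({"ar": (ar, sar), "i": (i, si), "ma": (ma, sma),
                     "constant": const, "lags": tuple(lags)})
    return pool


pool = build_pool()
print(len(pool))  # 4 seasonal x 5 non-seasonal specifications = 20
```

The point of the sketch is the size of the search space: 20 candidate models, all nested in one framework, which is what makes the comparison via information criteria meaningful.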

I have not added to the code above the check of whether extra AR/MA orders detected by ACF/PACF analysis of the best model reduce the AICc. Still, this algorithm brings some improvements:

R code for the application of compact ARIMA to the data

#### Load the smooth package
library(smooth)

# A loop for the compact ARIMA, recording the orders
matArimaCompact <- foreach(i=1:nsim, .packages=c("smooth")) %dopar% {
    testModel <- arimaCompact(x[,i])
    return(orders(testModel))
}

#### Compact ARIMA ####
# Non-seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "[[", "ar"), function(x){x[1]!=0}) |
  sapply(sapply(matArimaCompact, "[[", "i"), function(x){x[1]!=0}) |
  sapply(sapply(matArimaCompact, "[[", "ma"), function(x){x[1]!=0}))

# Seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "[[", "ar"), function(x){length(x)==2 && (x[2]!=0)}) |
  sapply(sapply(matArimaCompact, "[[", "i"), function(x){length(x)==2 && (x[2]!=0)}) |
  sapply(sapply(matArimaCompact, "[[", "ma"), function(x){length(x)==2 && (x[2]!=0)}))

In my case, it resulted in the following:

	ARIMA	ETS	Compact ARIMA
Non-seasonal elements	24.8%	2.3%	2.4%
Seasonal elements	18.0%	0.2%	0.0%
Any type of structure	37.9%	2.4%	2.4%

As we see, when we impose restrictions on order selection in ARIMA, it avoids fitting seasonal models to non-seasonal data. While it still makes minor mistakes in terms of non-seasonal structure, it's nothing compared to the conventional approach. What about accuracy? I don't know. I'll have to write another post on this :).

Note that the models were applied to samples of 120 observations, which is considered "small" in statistics, while in real life it is sometimes a luxury to have...


Why you should not use Holt-Winters method

Ivan Svetunkov — Thu, 07 Mar 2024 11:52:30 +0000

Whenever I see results of an experiment that include Holt-Winters method, I shrug. You should not use it, and here is why.

Holt-Winters was developed in 1960 by a student of Charles Holt, Peter Winters (Winters, 1960). He extended Holt’s exponential smoothing method (the method that introduced a trend component) to include a seasonal component. The method has performed well in many situations, but it was originally developed for a specific type of data: trend-seasonal. Furthermore, the original method implied that the noise had an additive form, while the seasonality was multiplicative (which is a weird combination).

Since then, lots of things have happened in forecasting, one of the biggest being the development of the ETS framework by Hyndman et al. (2008). That framework covers 30 possible models for time series with different types of Error, Trend, and Seasonal components. Holt-Winters is a method that aligns with only one of the models from the framework.

Now, if you have data without a trend, we know that the trend-based models will perform poorly, overfitting the data. Similarly, with seasonal models on non-seasonal data or other misalignments between the model and the data. This is a “horses for courses” sort of thing: you should use the model that suits your data, not the one you can easily find in a default package. And this means that Holt-Winters works as intended on one out of roughly 30 possible types of time series. Sticking only to it is similar to applying SARIMA(0,2,2)(0,2,2) to all your data, and then wondering why it performed poorly.

Interestingly enough, data scientists who do not know forecasting very well typically compare their ML approach with ARIMA with automatic order selection (usually the one developed by Hyndman & Khandakar, 2008) and explicitly with Holt-Winters instead of trying ETS. I don’t know why they do that. The only explanation I have is that they are probably just not aware of the more modern approach to exponential smoothing.

If you want to learn more about exponential smoothing, my book is available online, and Kandrika Pritularga and I will deliver a workshop on that topic at ISF.

Example of application of Holt-Winters method on Air Passengers data

Correction: Rob Hyndman correctly pointed out that “the seasonal method now known as Holt-Winters was actually developed by Holt, and is described in Holt (1957)“, not as I stated above in Winters (1960). Winters (1960) was the published paper describing the method, while Holt (1957) was an unpublished report for the Logistics Branch of the Office of Naval Research (ONR).


Staying Positive: Challenges and Solutions in Using Pure Multiplicative ETS Models

Ivan Svetunkov — Wed, 10 Jan 2024 13:34:51 +0000

Authors: Ivan Svetunkov, John E. Boylan

Journal: IMA Journal of Management Mathematics

Abstract: Exponential smoothing in state space form (ETS) is a popular forecasting technique, widely used in research and practice. While the additive error ETS models have been well studied, the multiplicative error ones have received much less attention in forecasting literature. Still, these models can be useful in cases, when one deals with positive data, because they are supposed to work in such situations. Unfortunately, the classical assumption of normality for the error term might break this property and lead to non-positive forecasts on positive data. In order to address this issue we propose using Log-Normal, Gamma and Inverse Gaussian distributions, which are defined for positive values only. We demonstrate what happens with ETS(M,*,*) models in this case, discuss conditional moments of ETS with these distribution and show that they are more natural for the models than the Normal one. We conduct the simulation experiments in order to study the bias introduced by point forecasts in these models and then compare the models with different distributions. We finish the paper with an example of application, showing how pure multiplicative ETS with a positive distribution works.

DOI: 10.1093/imaman/dpad028.

Working paper.

About the paper

DISCLAIMER: This is quite a technical paper focusing on solving a small problem of the ETS model that would allow using it in specific non-standard situations. It acts as a building block for the iETS paper. But the latter does not work without this paper, so while it seems small, it is an important brick in the wall.

The conventional ETS works great for regular demand, where the volume of the data is high. In that case, a forecaster can decide which of the 30 models to select for the data, not worrying too much about the assumption of normality for the error term and about forecast trajectories from the selected model. The situation changes when one needs to work with the positive low volume data. One would think that pure multiplicative ETS should work fine in that case, however, due to the normality assumption, the model might produce negative prediction intervals and in some situations even point forecasts. Trying to fix this issue, we considered several distributions for the error term in the multiplicative error ETS:

\( 1 + \epsilon_t \sim \mathcal{N}\left(1, \sigma^2\right) \) – the conventional assumption of Normality;
\( 1 + \epsilon_t \sim \mathcal{IG}\left(1, \sigma^2\right) \) – the error term follows the Inverse Gaussian distribution with the expectation of one and the variance of \(\sigma^2\);
\( 1 + \epsilon_t \sim \mathrm{log}\mathcal{N} \left(-\frac{\sigma^2}{2}, \sigma^2 \right) \) – the error term follows the Log-Normal distribution with the location of \(-\frac{\sigma^2}{2}\) and the scale of \( \sigma^2 \);
\( 1 + \epsilon_t \sim \Gamma\left(\sigma^{-2}, \sigma^2\right) \) – the error term follows the Gamma distribution with the shape parameter \(\sigma^{-2}\) and the scale \( \sigma^2 \).

The restrictions imposed on the parameters of the distributions above are necessary to make sure that the expectation of the error term \(1 + \epsilon_t \) is one (i.e. that \(\epsilon_t\) has zero mean). If it isn’t, then the ETS model would need to be modified to cater for this, otherwise it will produce incorrect forecasts.
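
The unit-expectation restriction is easy to check numerically. Below is a minimal sketch in Python using only the standard library, covering the Log-Normal and Gamma cases (the standard library has no Inverse Gaussian generator, so it is left out). The function names and the value of \(\sigma^2\) are assumptions made for illustration, not something taken from the paper:

```python
import math
import random


def lognormal_mean(sigma2):
    """Mean of logN(-sigma2/2, sigma2): exp(location + scale/2)."""
    return math.exp(-sigma2 / 2 + sigma2 / 2)


def gamma_mean(sigma2):
    """Mean of Gamma with shape 1/sigma2 and scale sigma2: shape * scale."""
    return (1 / sigma2) * sigma2


def sample_means(sigma2, n=20000, seed=42):
    """Simulate 1 + epsilon_t from both distributions and return the sample means."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)  # lognormvariate expects the standard deviation of the log
    logn = sum(rng.lognormvariate(-sigma2 / 2, sigma) for _ in range(n)) / n
    gam = sum(rng.gammavariate(1 / sigma2, sigma2) for _ in range(n)) / n
    return logn, gam


print(lognormal_mean(0.25), gamma_mean(0.25))
print(sample_means(0.25))
```

Both theoretical means equal one for any \(\sigma^2 > 0\), and the simulated means should be close to one. Every simulated value of \(1 + \epsilon_t\) is strictly positive, which is exactly the property that the Normal distribution lacks.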

In the paper, we show how ETS works with these assumptions, what forecasting trajectories it produces and how it can be estimated. We also demonstrate that the distribution selection can be easily automated using AIC. All these aspects of the model are already implemented and supported in the adam() function from the smooth package in R (read more here and here).

Story of the paper

John Boylan and I started working on this paper after getting a rejection from the IJF for the other paper of ours, “iETS: State space model for intermittent demand forecasting“. The rejection showed us that we need to take a completely new look at the paper, and it became apparent that the pure multiplicative ETS is not well studied in the literature. At the same time, its discussion would be outside of the scope of the original paper, so, we decided to write a separate one, focusing on the non-intermittent, but low volume demand.

We needed to discuss two points in the paper, which were then used in the iETS one:

The conventional ETS assumes that the demand follows the Normal distribution. In the case of low volume demand, this assumption may lead to negative forecasts, which makes the model inappropriate;
Point forecasts from multiplicative ETS models do not coincide with the conditional expectations. Hyndman et al. (2008) discuss this in their book, but not to the extent we needed. We thought that lots of people do not understand the implications of this, so we added that discussion to the paper.

The paper was written over the period of 2021 – 2023 and was ready in Spring 2023. John and I discussed it several times, and we agreed to have a final look at it in May 2023 before submitting it to IMA Journal of Management Mathematics. When I found out that John was ill, I decided not to wait for his comments further and just submitted it. The paper went through a couple of rounds, changed its name to reflect the concerns of one of the reviewers (the new name is objectively better than the old one) and was accepted for publication in November 2023. This is the last paper that John and I wrote together.


Why you should care about Exponential Smoothing

Ivan Svetunkov — Wed, 10 Jan 2024 09:44:03 +0000

On 15th December 2023, I presented in a CMAF Friday Forecasting Talks webinar on the topic of “Why you should care about exponential smoothing”. The motivation was to give a fresh view on the good old model and show how it started, how it evolved over time and how it can be improved. With this presentation, I tried to explain why Exponential Smoothing is still attractive in real life. The main conclusions are the following:

There has been huge progress in the area of Exponential Smoothing over the last 40 years. This includes the development of the state space Single Source of Error model by Ralph Snyder, Keith Ord, Anne Koehler and Rob Hyndman, which is well summarised in the book of Hyndman et al. (2008). This also includes the development of TBATS by de Livera et al. (2011), MAPA by Kourentzes et al. (2014) and many other things, including some parts from my monograph on ADAM;
No, Exponential Smoothing is not a special case of ARIMA. This is discussed, for example, here and here;
Yes, Exponential Smoothing can handle external information, ETSX works fine and can be used efficiently in practice. This was shown, for example, by Kourentzes & Petropoulos (2016), Ramos et al. (2023) and even in M5 competition (ETSX did better than the plain ETS by roughly 6%);
The modern Exponential Smoothing can handle intermittent demand and/or multiple frequencies. It can be estimated using multistep losses and regularisation (see Pritularga et al., 2023);
If you decide to use Exponential Smoothing you should use the modern form of it. Do not ignore all the hard work of Hyndman et al. (2008) and related research. So, you should not use this formulation:

Outdated formulation of exponential smoothing
It is outdated and shows that you have completely ignored all the developments in the area since 1985 (side note: when reviewing papers, if I see that authors use this formulation, I automatically flag the paper as a major revision). You should use the modern approach instead, State Space Single Source of Error model, i.e. this formulation:
\begin{equation*}
\begin{aligned}
& {y}_{t} = l_{t-1} + b_{t-1} + s_{t-m} + \epsilon_t \\
& l_t = l_{t-1} + b_{t-1} + \alpha \epsilon_t \\
& b_t = b_{t-1} + \beta \epsilon_t \\
& s_t = s_{t-m} + \gamma \epsilon_t
\end{aligned} .
\end{equation*}

If you see someone using the old formulation, know that they do not know the state-of-the-art forecasting.
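
To make the state space formulation above more tangible, here is a minimal sketch of its recursion in plain Python. It takes the smoothing parameters and the initial states as given (no estimation is done), and the function name ets_aaa_filter and its arguments are hypothetical, chosen only for this illustration:

```python
def ets_aaa_filter(y, m, alpha, beta, gamma, level, trend, seasonal):
    """Run the additive Single Source of Error recursion over the series y.

    level and trend are l_0 and b_0; seasonal is a list of m initial seasonal
    indices ordered so that seasonal[t % m] plays the role of s_{t-m}
    for the observations t = 0, 1, 2, ...
    """
    seasonal = list(seasonal)
    fitted, errors = [], []
    for t, obs in enumerate(y):
        s_lag = seasonal[t % m]
        mu = level + trend + s_lag              # one step ahead point forecast
        e = obs - mu                            # epsilon_t, the single source of error
        # Both updates use the previous level and trend, as in the equations
        level, trend = level + trend + alpha * e, trend + beta * e
        seasonal[t % m] = s_lag + gamma * e
        fitted.append(mu)
        errors.append(e)
    return fitted, errors, (level, trend, seasonal)


fitted, errors, states = ets_aaa_filter(
    [12, 11, 14], m=2, alpha=0.5, beta=0.1, gamma=0.2,
    level=10, trend=1, seasonal=[1, -1])
print(fitted, errors, states)
```

The same error term updates the level, the trend and the seasonal component, and that is the whole idea of the Single Source of Error model. In the toy example the data follow the initial states exactly, so all errors are zero and the states simply roll forward.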

Here are the slides of the presentation.

And here is the recording of the webinar:


iETS: State space model for intermittent demand forecasting

Ivan Svetunkov — Fri, 08 Sep 2023 09:30:40 +0000

Authors: Ivan Svetunkov, John E. Boylan

Journal: International Journal of Production Economics

Abstract: Inventory decisions relating to items that are demanded intermittently are particularly challenging. Decisions relating to termination of sales of product often rely on point estimates of the mean demand, whereas replenishment decisions depend on quantiles from interval estimates. It is in this context that modelling intermittent demand becomes an important task. In previous research, this has been addressed by generalised linear models or integer-valued ARMA models, while the development of models in state space framework has had mixed success. In this paper, we propose a general state space model that takes intermittence of data into account, extending the taxonomy of single source of error state space models. We show that this model has a connection with conventional non-intermittent state space models used in inventory planning. Certain forms of it may be estimated by Croston’s and Teunter-Syntetos-Babai (TSB) forecasting methods. We discuss properties of the proposed models and show how a selection can be made between them in the proposed framework. We then conduct a simulation experiment, empirically evaluating the inventory implications.

DOI: 10.1016/j.ijpe.2023.109013.

Working paper.

About the paper

DISCLAIMER: The models in this paper are also discussed in detail in the ADAM monograph (Chapter 13) with some examples going beyond what is discussed in the paper (e.g. models with trends).

What is “intermittent demand”? It is the demand that happens at irregular frequency (i.e. at random). Note that according to this definition, intermittent demand does not need to be count – it is a wider term than that. For example, electricity demand can be intermittent, but it is definitely not count. The definition above means that we do not necessarily know when specifically we will sell our product. From the modelling point of view, it means that we need to take into account two elements of uncertainty instead of just one:

How much people will buy;
When they will buy.

(1) is familiar for many demand planners and data scientists: we do not know specifically how much our customers will buy in the future, but we can get an estimate of the expected demand (mean value via a point forecast) and an idea of the uncertainty around it (e.g. produce prediction intervals or estimate the demand distribution). (2) is less obvious: there may be some periods when nobody buys our product, and then periods when we sell some, followed by no sales again. In that case we can encode the no sales in those “dry” periods with zeroes, the periods with demand as ones, and end up with a time series like this (this idea was briefly discussed in this and this posts):

An example of the occurrence part of an intermittent demand

The plot above visualises the demand occurrence, with zeroes corresponding to the situation of “no demand” and ones corresponding to some demand. In general, it is challenging to predict when specifically the “ones” will happen, but in the case above, it seems that over time the frequency of demand increases, implying that maybe it becomes regular. In mathematical terms, we could phrase this as the probability of occurrence increasing over time: at the end of the series, we won’t necessarily sell the product, but the chance of selling is much higher than in the beginning. The original time series looks like this:

An example of an intermittent demand

It shows that indeed there is an increase of the frequency of sales together with the amount sold, and that it seems that the product is becoming more popular, moving from the intermittent to the regular demand domain.

In general, forecasting intermittent demand is a challenging task, but there are many existing approaches that can be used in this case. However, they are all detached from the conventional ones that are used for regular demand (such as ETS or ARIMA). What people usually do in practice is first categorise the data into regular and intermittent and then apply specific approaches to it (e.g. ETS/ARIMA for the regular demand, and Croston‘s method or TSB for the intermittent one).

John Boylan and I developed a statistical model that unites the two worlds – you no longer need to decide whether the data is intermittent or not, you can just use one model in an automated fashion – it will take care of intermittence (if there is one). It relies fundamentally on the classical Croston’s equation:
\begin{equation} \label{eq:general}
y_t = o_t z_t ,
\end{equation}
where \(y_t\) is the observed value at time \(t\), \(o_t\) is the binary occurrence variable and \(z_t\) is the demand sizes variable. Trying to derive the statistical model underlying Croston’s method, Snyder (2002) and Shenstone & Hyndman (2005) used models based on \eqref{eq:general} but instead of plugging in a multiplicative ETS in \(z_t\) they got stuck with the idea of logarithmic transformation of demand sizes and/or using count distributions for the demand sizes. John and I looked into this equation again and decided that we can model both demand sizes and demand occurrence using a pair of pure multiplicative ETS models. In this post, I will focus on ETS(M,N,N) as the simplest model, but more complicated ones (with trend and/or explanatory variables) can be used as well without the loss in logic. So, for the demand sizes we will have:
\begin{equation}
\begin{aligned}
& z_t = l_{t-1} (1 + \epsilon_t) \\
& l_t = l_{t-1} (1 + \alpha \epsilon_t)
\end{aligned}
\label{eq:demandSizes}
\end{equation}
where \(l_t\) is the level of series, \(\alpha\) is the smoothing parameter and \(1 + \epsilon_t \) is the error term that follows some positive distribution (the options we considered in the paper are the Log-Normal, Gamma and Inverse Gaussian). The demand sizes part is relatively straightforward: you just apply the conventional pure multiplicative ETS model with a positive distribution (which makes \(z_t\) always positive) and that’s it. However, the occurrence part is more complicated.

Given that the occurrence variable is random, we should model the probability of occurrence. We proposed to assume that \(o_t \sim \mathrm{Bernoulli}(p_t) \) (logical assumption, done in many other papers), meaning that the probability of occurrence changes over time. In turn, the changing probability can be modelled using one of the several approaches that we proposed. For example, it can be modelled via the so called “inverse odds ratio” model with ETS(M,N,N), formulated as:
\begin{equation}
\begin{aligned}
& p_t = \frac{1}{1 + \mu_{b,t}} \\
& \mu_{b,t} = l_{b,t-1} \\
& l_{b,t} = l_{b,t-1} (1 + \alpha_b \epsilon_{b,t})
\end{aligned}
\label{eq:demandOccurrenceOdds}
\end{equation}
where \(\mu_{b,t}\) is the one step ahead expectation of the underlying model, \(l_{b,t}\) is the latent level, \(\alpha_b\) is the smoothing parameter of the model, and \(1+\epsilon_{b,t}\) is the positively distributed error term (with expectation equal to one and an unknown distribution, which we actually do not care about). The main feature of the inverse odds ratio occurrence model is that it should be effective in cases when demand is building up (moving from the intermittent to the regular pattern, without zeroes). In our paper we show how such model can be estimated and also show that Croston’s method can be used for the estimation of this model when the demand occurrence does not change (substantially) between the non-zero demands. So, this model can be considered as the model underlying Croston’s method.

Uniting the equations \eqref{eq:general}, \eqref{eq:demandSizes} and \eqref{eq:demandOccurrenceOdds}, we get the iETS(M,N,N)\(_\mathrm{I}\)(M,N,N) model, where the letters in the first brackets correspond to the demand sizes part, the subscript “I” tells us that we have the “inverse odds ratio” model for the occurrence, and the second brackets show what ETS model was used in the demand occurrence model. The paper explains in detail how this model can be built and estimated.
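
The three sets of equations can be put together in a small simulation. The sketch below is written in plain Python and rests on assumptions that I label explicitly: the Log-Normal distribution is used for both error terms (for the occurrence part the distribution is unknown, so this is purely a convenience), and the function name and parameter values are hypothetical:

```python
import random


def simulate_iets_mnn_inverse_odds(n, level_z, alpha_z, sigma_z,
                                   level_b, alpha_b, sigma_b, seed=0):
    """Simulate y_t = o_t * z_t with ETS(M,N,N) sizes and inverse odds ratio occurrence."""
    rng = random.Random(seed)
    y, o, p = [], [], []
    for _ in range(n):
        # Occurrence part: mu_{b,t} = l_{b,t-1}, p_t = 1 / (1 + mu_{b,t})
        p_t = 1.0 / (1.0 + level_b)
        o_t = 1 if rng.random() < p_t else 0
        # Demand sizes part: z_t = l_{t-1} (1 + epsilon_t), with E(1 + epsilon_t) = 1
        eps_z = rng.lognormvariate(-sigma_z ** 2 / 2, sigma_z) - 1.0
        z_t = level_z * (1.0 + eps_z)
        y.append(o_t * z_t)
        o.append(o_t)
        p.append(p_t)
        # Both levels evolve in every period, whether demand is observed or not
        level_z *= 1.0 + alpha_z * eps_z
        # Assumption: Log-Normal error for the occurrence level as well
        eps_b = rng.lognormvariate(-sigma_b ** 2 / 2, sigma_b) - 1.0
        level_b *= 1.0 + alpha_b * eps_b
    return y, o, p


y, o, p = simulate_iets_mnn_inverse_odds(
    n=200, level_z=5.0, alpha_z=0.1, sigma_z=0.3,
    level_b=2.0, alpha_b=0.1, sigma_b=0.3, seed=1)
print(sum(o), "non-zero demands out of", len(y))
```

Because both error terms are positive and the smoothing parameters lie between zero and one, the levels never become negative, so the demand sizes stay positive and the probability stays strictly between zero and one.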

In the very same paper we discuss other potential models for demand occurrence (more suitable for demand obsolescence or fixed probability of occurrence) and, in fact, in my opinion this part is the main contribution of the paper – we have looked into something no one did before: how to model demand occurrence using ETS. Having so many options, we might need to decide which to use in an automated fashion. Luckily, given that these models are formulated in one and the same framework, we can use information criteria to select the most suitable one for the data. Furthermore, when all probabilities of occurrence are equal to one, the model \eqref{eq:general} together with \eqref{eq:demandSizes} transforms into the conventional ETS(M,N,N) model. This also means that the regular ETS model can be compared with the iETS directly using information criteria to decide whether the occurrence part is needed or not. So, we end up with a relatively simple framework that can be used for any type of demand without a need to do a categorisation.
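
The selection step itself is mechanical. Here is a generic sketch of choosing between candidates via AICc in Python. The formula is the standard one, while the log-likelihoods, the numbers of parameters and the labels below are made-up values used only to show the mechanics, not results from the paper:

```python
def aicc(loglik, k, n):
    """Corrected AIC for log-likelihood loglik, k estimated parameters, n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is only defined for n > k + 1")
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def select_by_aicc(candidates, n):
    """candidates maps a label to (loglik, k); return the label with the lowest AICc."""
    return min(candidates, key=lambda name: aicc(*candidates[name], n))


# Hypothetical values, invented purely for the illustration
candidates = {"model A": (-120.0, 3), "model B": (-110.0, 5)}
best = select_by_aicc(candidates, n=60)
print(best)
```

The only requirement is that the likelihoods of all candidates are calculated on the same data and are comparable, which is what formulating the models in one framework gives us.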

As a small side note, we also showed in the paper that the estimates of smoothing parameters for the demand sizes in iETS will always be positively biased (being higher than needed). In fact, this bias appears in any intermittent demand model that assumes that the potential demand sizes change between the non-zero observations (a reasonable assumption for any modelling approach). In a way, this finding also applies to both Croston’s and TSB methods and agrees with a similar finding by Kourentzes (2014).

Example in R

All the models from the paper are implemented in the adam() function from the smooth package in R (with the oes() function taking care of the occurrence, see details here and here). For demonstration purposes (and for fun), we will consider an artificial example of demand obsolescence, modelled via the “Direct probability” iETS model (it underlies the TSB method):

set.seed(7)
c(rpois(10,3),rpois(10,2),rpois(10,1),rpois(10,0.5),rpois(10,0.1)) |>
    ts(frequency=12) -> y

My randomly generated time series looks like this:

Demand becoming obsolete

In practice, in the example above, we might be interested in deciding whether to discontinue the product (to save money on stocking it) or not. To model and forecast the demand above, we can use the following code in R:

library(smooth)
iETSModel <- adam(y, "YYN", occurrence="direct", h=5, holdout=TRUE)

The "YYN" above tells the function to select the best pure multiplicative ETS model based on the information criterion (AICc by default, see discussion in Section 15.1 of the ADAM monograph), while the "occurrence" parameter specifies which of the demand occurrence models to build. By default, the function will use the same model for the demand probability as the one selected for the demand sizes. So, for example, if we end up with ETS(M,M,N) for demand sizes, the function will use ETS(M,M,N) for the probability of occurrence. If you want to change this, you would need to use the oes() function and specify the model there (see examples in Section 13.4 of the ADAM monograph). Finally, I've asked the function to produce 5 steps ahead forecasts and to keep the last 5 observations in the holdout sample. I ended up with the following model:

summary(iETSModel)

Model estimated using adam() function: iETS(MMN)
Response variable: y
Occurrence model type: Direct
Distribution used in the estimation: 
Mixture of Bernoulli and Gamma
Loss function type: likelihood; Loss function value: 71.0549
Coefficients:
      Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha   0.1049     0.0925     0.0000      0.2903  
beta    0.1049     0.0139     0.0767      0.1049 *
level   4.3722     1.1801     1.9789      6.7381 *
trend   0.9517     0.0582     0.8336      1.0685 *

Error standard deviation: 1.0548
Sample size: 45
Number of estimated parameters: 9
Number of degrees of freedom: 36
Information criteria:
     AIC     AICc      BIC     BICc 
202.6527 204.1911 218.9126 206.6142

As we see from the output above, the function has selected the iETS(M,M,N) model for the data. The line "Mixture of Bernoulli and Gamma" tells us that the Bernoulli distribution was used for the demand occurrence (this is the only option), while the Gamma distribution was used for the demand sizes (this is the default option, but you can change this via the distribution parameter). We can then produce forecasts from this model:

forecast(iETSModel, h=5, interval="prediction", side="upper") |>
    plot()

In the code above, I have asked the function to generate prediction intervals (by default, for the pure multiplicative models, the function uses simulations) and to produce only the upper bound of the interval. The latter is motivated by the idea that in the case of the intermittent demand, the lower bound is typically not useful for decision making: we know that the demand cannot be below zero, and our stocking decisions are typically made based on the specific quantiles (e.g. for the 95% confidence level). Here is the plot that I get after running the code above:

Point and interval forecasts for the demand becoming obsolete

While the last observation in the holdout was not included in the prediction interval, the dynamics captured by the model are correct. The question that we should ask ourselves in this example is: what decision can be made based on the model? If you want to decide whether to stock the product or not, you can look at the forecast of the probability of occurrence to see how it changes over time and decide whether to discontinue the product:

forecast(iETSModel$occurrence, h=5) |> plot()

Forecast of the probability of occurrence for the demand becoming obsolete

In our case, the probability reaches roughly 0.2 over the next 5 months (i.e. we might sell once every 5 months). If we think that this is too low, then we should discontinue the product. Otherwise, if we decide to continue selling the product, then it makes more sense to generate the desired quantile of the cumulative demand over the lead time. In the case of the adam() function, this can be done by adding cumulative=TRUE in the forecast() function:

forecast(iETSModel, h=5, interval="prediction", side="upper", cumulative=TRUE)

after which we get:

      Point forecast Upper bound (95%)
Oct 4      0.3055742          1.208207

From the decision point of view, if we deal with count demand, the value 1.208207 complicates things. Luckily, as we showed in our paper, we can round the value up to get something meaningful, preserving the properties of the model. This means that, based on the estimated model, we need to have two items in stock to satisfy the demand over the next 5 months with the confidence level of 95%.
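The arithmetic behind the two decisions above can be checked in a few lines. This is just an illustrative sketch in Python: the value 0.2 is the rough probability read off the plot, 1.208207 is the upper bound from the output above, and the calculation of the chance of at least one sale additionally assumes (my simplification, not part of the model) that the probability stays constant and that the occurrences are independent across months.

```python
import math

p = 0.2                 # rough probability of occurrence read off the forecast plot
h = 5                   # lead time in months
upper_bound = 1.208207  # 95% upper bound of the cumulative demand from the output

# Average number of months between sales for a fixed probability p
expected_gap = 1 / p

# Chance of at least one sale over h months (assumes constant p and independence)
p_any_sale = 1 - (1 - p) ** h

# Round the quantile up to get the number of whole items to stock
stock = math.ceil(upper_bound)

print(f"one sale every {expected_gap:.0f} months on average")
print(f"P(at least one sale in {h} months) = {p_any_sale:.3f}")
print(f"items to stock = {stock}")
```

Rounding up (rather than to the nearest integer) is what keeps the stock level at or above the requested quantile.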

Conclusions

This is just a demonstration of what can be done with the proposed iETS model, but there are many more things one can do. For example, this approach allows capturing multiplicative seasonality in data that has zeroes (as long as seasonal indices can be estimated somehow). John and I started thinking in this direction, and we even did some work together with Patricia Ramos (our colleague from the university of INESC TEC), but given the hard time that was given to our paper by the reviewers in IJF, we had to postpone this research. I also used the ideas explained in this post in the paper on ED forecasting (written together with Bahman and Jethro). In that paper, I used a seasonal model with the "direct" occurrence part, which took care of zeroes (not bothering with modelling them properly) and allowed me to apply a multiple seasonal multiplicative ETS model with explanatory variables. Anyway, the proposed approach is flexible enough to be used in a variety of contexts, and I think it will have many applications in real life.

P.S.: Story of the paper

I've written a separate long post, explaining the revision process of the paper and how it got to the acceptance stage at the IJPE, but then I realised that it is too long and boring. Besides, John would not have approved of the post and would have said that I am sharing unnecessary details, creating potential exasperation for fellow forecasters who reviewed the paper. So, I have decided not to publish that post, and instead just to add a short subsection. Here it is.

We started working on the paper in March 2016 and submitted it to the International Journal of Forecasting (IJF) in January 2017. It went through four rounds of revision, with the second reviewer being very critical and unsupportive all the way through, driving the paper in the wrong direction and burying it in the discussion of petty statistical details. We rewrote the paper several times and I rewrote the R code of the function a few times. In the end the Associate Editor (AE) of the IJF (who completely forgot about our paper for several months) decided not to send the paper to the reviewers again, completely ignored our responses to the reviewers, did not provide any major feedback and wrote an insulting response that ended with the phrase "I could go on, but I’m out of patience with the authors and their paper". The paper was rejected from IJF in 2019, which set me back in my academic career. This, together with constant rejections of my Complex Exponential Smoothing paper and the actions of a colleague of mine who decided to cut all ties with me in Summer 2019, hit my self-esteem and caused serious damage to my professional life. I thought of quitting academia and either starting to work in business or doing something different with my life, not related to forecasting at all. I stayed mainly because of all the support that John Boylan, Robert Fildes, Nikos Kourentzes and my wife Anna Sroginis provided me. I recovered from that hit only in 2022, when my Complex Exponential Smoothing paper got accepted and things finally started turning out well. After that John and I rewrote the paper again, split it into two, "iETS" and "Multiplicative ETS" (under revision in IMA Journal of Management Mathematics), and submitted the former to the International Journal of Production Economics, where it got accepted after one round of revision. Unfortunately, we never got to celebrate the success with John because he passed away.

The moral of this story is that publishing in academia can be very tough and unfair. Sometimes, you get very negative feedback from the people you least expect to get it from. People that you respect and think very highly of might not understand what you are proposing and be very unsupportive. We actually knew who the reviewers and the AE of our IJF paper were - they are esteemed academics in the field of forecasting. And while I still think highly of their research and contributions to the field, the way the second reviewer and the AE handled the review has damaged my personal respect for them - I never expected them to be so narrow-minded...

The post iETS: State space model for intermittent demand forecasting first appeared on Open Forecasting.