<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archives ETS - Open Forecasting</title>
	<atom:link href="https://openforecast.org/tag/ets-en/feed/" rel="self" type="application/rss+xml" />
	<link>https://openforecast.org/tag/ets-en/</link>
	<description>How to look into the future</description>
	<lastBuildDate>Thu, 09 Apr 2026 09:40:57 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2015/08/cropped-usd-05-32x32.png&amp;nocache=1</url>
	<title>Archives ETS - Open Forecasting</title>
	<link>https://openforecast.org/tag/ets-en/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>smooth forecasting with the smooth package in Python</title>
		<link>https://openforecast.org/2026/04/09/smooth-forecasting-with-the-smooth-package-in-python/</link>
					<comments>https://openforecast.org/2026/04/09/smooth-forecasting-with-the-smooth-package-in-python/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 09:15:02 +0000</pubDate>
				<category><![CDATA[ETS]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3961</guid>

					<description><![CDATA[<p>Here is another piece of news I have been hoping to deliver for quite some time now (since January 2026 actually). We have finally created the first release of the smooth package for Python and it is available on PyPI! Anyone interested? Read more! On this page: Why does &#8220;smooth&#8221; exist? A bit of history [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/04/09/smooth-forecasting-with-the-smooth-package-in-python/">smooth forecasting with the smooth package in Python</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Here is another piece of news I have been hoping to deliver for quite some time now (since January 2026 actually). We have finally created the first release of the smooth package for Python and it is available on PyPI! Anyone interested? Read more!</p>
<p>On this page:</p>
<ul>
<li><a href="#whySmooth">Why does &#8220;smooth&#8221; exist?</a></li>
<li><a href="#history">A bit of history</a></li>
<li><a href="#install">How to install</a></li>
<li><a href="#whatWorks">What works</a></li>
<li><a href="#example">An example</a></li>
<li><a href="#evaluation">Evaluation</a></li>
<ul>
<li><a href="#evaluationSetup">Setup</a></li>
<li><a href="https://github.com/config-i1/smooth/blob/master/python/tests/notebooks/benchmark_exponential_smoothing.ipynb">Jupyter notebook</a></li>
<li><a href="#sktime">sktime non-collaborative stance</a></li>
<li><a href="#evaluationResults">Results</a></li>
</ul>
<li><a href="#whatsNext">What&#8217;s next?</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
<h2 id="whySmooth">Why does &#8220;smooth&#8221; exist?</h2>
<p>There are lots of implementations of ETS and ARIMA (dynamic models) out there, both in Python and in R (and also in Julia now, <a href="https://github.com/taf-society/Durbyn.jl">see Durbyn</a>). So, why bother creating yet another one?</p>
<p>The main philosophy of the smooth package in R is flexibility. It is not just an implementation of ETS or ARIMA &#8211; it is there to give you more control over what you can do with these models in different situations. The main function in the package is called <a href="https://openforecast.org/adam/">&#8220;ADAM&#8221; &#8211; the Augmented Dynamic Adaptive Model</a>. It is a Single Source of Error state space model that unites ETS, ARIMA, and regression, and supports the following list of features (taken from the Introduction in <a href="https://openforecast.org/adam/">the book</a>):</p>
<ol>
<li>ETS;</li>
<li>ARIMA;</li>
<li>Regression;</li>
<li>TVP regression;</li>
<li>Combination of (1), (2), and either (3) or (4), i.e. ARIMAX/ETSX;</li>
<li>Automatic selection/combination of states for ETS;</li>
<li>Automatic orders selection for ARIMA;</li>
<li>Variables selection for regression;</li>
<li>Normal and non-normal distributions;</li>
<li>Automatic selection of the most suitable distribution;</li>
<li>Multiple seasonality;</li>
<li>Occurrence part of the model to handle zeroes in data (intermittent demand);</li>
<li>Modelling scale of distribution (GARCH and beyond);</li>
<li>A variety of ways to produce forecasts for different situations;</li>
<li>Advanced loss functions for model estimation;</li>
<li>&#8230;</li>
</ol>
<p>All of these features come with the ability to fine-tune the optimiser (i.e. how the parameters are estimated) and to manually adjust any model parameters you want. This allows, for example, fitting an iETSX(M,N,M) with multiple frequencies and the Gamma distribution to better model <a href="/2023/05/10/story-of-probabilistic-forecasting-of-hourly-emergency-department-arrivals/">hourly emergency department arrivals (intermittent demand) and thus produce more accurate forecasts of them</a>. No other package in either R or Python gives users such flexibility with dynamic models.</p>
<p>At the same time, over the years, I have managed to iron out the R functions so much that they handle almost any real-life situation without breaking. They also work quite fast and produce accurate forecasts, <a href="/2026/02/09/smooth-v4-4-0/#evaluationResults">sometimes outperforming the other existing R implementations</a>.</p>
<p>So, here we are. We want to bring this flexibility, robustness, and speed to <strong>Python</strong>.</p>
<h2 id="history">A bit of history</h2>
<p>The smooth package for R was first released on CRAN in 2016, when I finished my PhD. It went from v1.4.3 to v4.4.0 over the last 10 years. It saw a rise in popularity, but then an inevitable decline due to the decreasing number of R users in business. So, back in 2021, Rebecca Killick and I applied for an EPSRC grant to develop Python packages for forecasting and time series analysis. The idea was to translate what we had in R (including the greybox, smooth, forecast, and changepoint detection packages) to Python with the help of professional programmers. Unfortunately, we did not receive the funding (it went to sktime for a good reason &#8211; they already had an existing codebase in Python).</p>
<p>At the beginning of 2023, Leonidas Tsaprounis got in touch with me, offering to help with the development and translation of the smooth package to Python. The idea was to use the existing C++ core and simply create a wrapper in Python. &#8220;Simply&#8221; is actually an oversimplification here, because I am not a programmer, so my functions are messy and hard to read. Nonetheless, we started cooking. Leo helped in setting up pybind11 and carma, and creating the necessary files for the compilation of the C++ code. Just to test whether that worked, we managed to create a basic function for the simple moving average based on the <code>sma()</code> from the R version. Our progress was a bit slow, because we were both busy with other projects. All of that changed when, in July 2023, Filotas Theodosiou joined our small team and started working on the translation. We decided to implement the hardest thing first &#8211; <a href="/adam/">ADAM</a>.</p>
<p>What Fil did was use LLMs to translate the code from R to Python. Fil will write about this work at some point; the only thing I can say here is that it was not an easy process, because thousands of lines of R code needed to be translated to Python and then refactored. I only helped with suggestions and explanations of what is happening inside, and Leo provided guidance regarding the coding philosophy. It was Fil and his AI tools that did the main heavy lifting. By Summer 2025, we had a basic working ADAM() function, but it worked slightly differently from the one in R due to differences in initialisation and optimisation. He presented his work at the <a href="/2025/06/30/iif-open-source-forecasting-software-workshop-and-smooth/">IIF Open Source Forecasting Software Workshop</a>, explaining his experience with LLMs and code translation. Because everyone was pretty busy, it took a bit more time to reach the first proper release of smooth in Python.</p>
<p>In December 2025, I bought a Claude AI subscription and started vibe coding my way through the existing Python code. Between the three of us, we managed to progress the project, and finally in January 2026 we reached v1.0.0 of smooth in Python. Now ADAM works in exactly the same way in both R and Python: if you give it the same time series, it will select the same model and produce the same parameter estimates across languages. It took us time and effort to reach this, but we feel it is a critically important step &#8211; ensuring that users working in different languages have the same experience.</p>
<h2 id="install">How to install</h2>
<p>We are extremely grateful to Gustavo Niemeyer, who gave us the name <code>smooth</code> on PyPI. It had belonged to him since 2020, but the project was abandoned, and he agreed to transfer it to us. So, now you can install smooth simply by running:</p>
<pre class="decode">pip install smooth</pre>
<p>There is also a development version of the package, which you can install by following the instructions in our <a href="https://github.com/config-i1/smooth/wiki/Installation#python">Installation wiki</a> on GitHub.</p>
<h2 id="whatWorks">What works</h2>
<p>The package does not yet have the full functionality of its R counterpart, but it already offers quite a few things:</p>
<ol>
<li><code>ADAM()</code> &#8211; the main function that supports ETS with:
<ul>
<li>components selection via <code>model="ZXZ"</code> or other pools (see <a href="https://github.com/config-i1/smooth/wiki/ADAM">wiki on GitHub for details</a>);</li>
<li>forecast combination using AIC weights via <code>model="CCC"</code> or other pools (again, explained in the wiki);</li>
<li>multiple seasonalities via <code>lags=[24,168]</code>;</li>
<li>ability to provide some smoothing parameters or initial values (e.g. only alpha), letting the function estimate the rest;</li>
<li>different distributions;</li>
<li>advanced <a href="https://github.com/config-i1/smooth/wiki/Loss-Functions">loss functions</a>;</li>
<li>several options for <a href="https://github.com/config-i1/smooth/wiki/Initialisation">model initialisation</a>;</li>
<li>fine-tuning of the optimiser via <code>nlopt_kargs</code> (<a href="https://github.com/config-i1/smooth/wiki/Model-Estimation">read more in the wiki</a>);</li>
</ul>
</li>
<li><code>ES()</code> &#8211; the wrapper of <code>ADAM()</code> with the normal distribution. Supports the same functionality, but is a simplification of ADAM.</li>
</ol>
<p>We also have the standard methods for fitting and forecasting, and many attributes that allow extracting information from the model, all of which are <a href="https://github.com/config-i1/smooth/wiki/Fitted-Values-and-Forecasts">explained in the wiki of the project</a>.</p>
<h2 id="example">An example</h2>
<p>Here is an example of how to work with ADAM in Python. For this example to work, you will need to install the <code>fcompdata</code> package from PyPI:<br />
<code>pip install fcompdata</code></p>
<p>The example:</p>
<pre class="decode">from smooth import ADAM
from fcompdata import M3

model = ADAM(model="ZXZ", lags=12)
model.fit(M3[2568].x)

print(model)</pre>
<p>This is what you should see as the result of print:</p>
<pre>Time elapsed: 0.21 seconds
Model estimated using ADAM() function: ETS(MAM)
With backcasting initialisation
Distribution assumed in the model: Gamma
Loss function type: likelihood; Loss function value: 868.7085
Persistence vector g:
 alpha   beta  gamma
0.0205 0.0203 0.1568
Damping parameter: 1.0000
Sample size: 116
Number of estimated parameters: 4
Number of degrees of freedom: 112
Information criteria:
      AIC      AICc       BIC      BICc
1745.4170 1745.7774 1756.4314 1757.2879</pre>
<p>which is exactly the same output as in R (see, for example, some explanations <a href="https://openforecast.org/adam/ADAMETSPureAdditiveExamples.html">here</a>). We can then produce a forecast from this model:</p>
<pre class="decode">model.predict(h=18, interval="prediction", level=[0.9,0.95])</pre>
<p>The predict method currently supports analytical (aka &#8220;parametric&#8221;/&#8220;approximate&#8221;) and simulated prediction intervals. Setting <code>interval="prediction"</code> tells the function to choose between the two depending on the type of model (multiplicative ETS models do not have analytical formulae for the multistep conditional variance and, as a result, do not have proper analytical prediction intervals). The <code>level</code> parameter accepts either a vector (which will produce several quantiles) or a scalar. What I get after running this is:</p>
<pre>             mean    lower_0.05   lower_0.025    upper_0.95   upper_0.975
116  11234.592643  10061.150811   9853.119370  12443.904763  12686.143870
117   8050.810544   7228.709864   7064.356429   8896.078713   9080.867866
118   7658.608163   6886.165498   6746.881596   8475.545469   8633.149796
119  10552.382933   9452.306261   9236.493206  11679.816814  11892.381046
120  10889.816551   9768.665327   9559.580233  12066.628218  12313.872205
121   7409.545388   6643.378080   6495.237349   8232.558913   8363.550598
122   7591.183726   6800.878319   6650.514904   8425.167556   8576.257297
123  14648.452226  13089.346824  12793.226771  16263.206997  16582.786939
124   6953.045603   6206.829418   6079.126523   7730.260301   7892.917098
125  11938.941882  10650.759513  10427.172989  13307.563498  13579.662925
126   8299.626845   7379.550080   7200.075336   9280.095498   9468.327654
127   8508.558884   7530.987557   7367.541698   9534.117611   9734.218257
128  11552.654541  10162.615284   9907.770755  13039.710057  13313.534412
129   8286.727505   7273.279560   7100.450038   9401.374461   9627.922116
130   7889.999721   6879.771040   6710.494741   8948.052817   9186.279622
131  10860.671447   9438.536335   9174.365147  12353.444279  12664.983138
132  11218.330395   9690.780430   9453.114931  12849.545829  13201.679610
133   7620.922782   6564.497975   6391.080974   8748.561119   8977.120673</pre>
<p>The separate wiki on <a href="https://github.com/config-i1/smooth/wiki/Fitted-Values-and-Forecasts#level-parameter">Fitted Values and Forecasts</a> explains all the parameters accepted by the predict method and what is returned by it.</p>
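<p>To illustrate what the simulated intervals do under the hood, here is a minimal sketch for a local-level ETS(A,N,N) model: simulate many sample paths forward and take empirical quantiles at each horizon. This is my own illustration of the general idea, not the package&#8217;s actual implementation, and the function name is mine.</p>

```python
import random
import statistics

def simulate_local_level_intervals(level, alpha, sigma, h=18,
                                   width=0.95, nsim=10000, seed=42):
    """Monte Carlo prediction intervals for a local-level (ETS(A,N,N)) model:
    simulate nsim sample paths h steps ahead and take empirical quantiles."""
    rng = random.Random(seed)
    paths = []
    for _ in range(nsim):
        l, path = level, []
        for _ in range(h):
            e = rng.gauss(0.0, sigma)  # Normal error term
            path.append(l + e)         # observation: y_t = l_{t-1} + e_t
            l += alpha * e             # level update: l_t = l_{t-1} + alpha * e_t
        paths.append(path)
    lo, hi = (1 - width) / 2, 1 - (1 - width) / 2
    result = []
    for j in range(h):
        ys = sorted(p[j] for p in paths)
        # (mean, lower, upper) per forecast horizon
        result.append((statistics.mean(ys), ys[int(lo * nsim)], ys[int(hi * nsim)]))
    return result
```

The intervals widen with the horizon, because the simulated level accumulates uncertainty via the smoothing parameter.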
<h2 id="evaluation">Evaluation</h2>
<p>To see how the developed function works, I decided to conduct exactly the same evaluation that <a href="/2026/02/09/smooth-v4-4-0/">I did for the recent R release of the smooth package</a>, running the functions on the M1, M3, and Tourism competition data (5,315 time series) using the <a href="/2026/01/26/forecasting-competitions-datasets-in-python/">fcompdata</a> package.</p>
<h3 id="evaluationSetup">Setup</h3>
<p>I have selected the same set of models for Python as I did in R. Here are several options for the ADAM <code>model</code> parameter to see how the specific pools impact accuracy (this is discussed in detail in <a href="https://openforecast.org/adam/ETSSelection.html">Section 15.1 of ADAM</a>):</p>
<ol>
<li>XXX &#8211; select between pure additive ETS models only;</li>
<li>ZZZ &#8211; select from the pool of all 30 models, but use branch-and-bound to remove the less suitable models;</li>
<li>ZXZ &#8211; same as (2), but without the multiplicative trend models. This is used in the <code>smooth</code> functions <strong>by default</strong>;</li>
<li>FFF &#8211; select from the pool of all 30 models (exhaustive search);</li>
<li>SXS &#8211; the pool of models used by default in <code>ets()</code> from the <code>forecast</code> package in R.</li>
</ol>
<p>I also tested three types of ETS initialisation (read more about them <a href="https://github.com/config-i1/smooth/wiki/Initialisation">here</a>):</p>
<ol>
<li>Back &#8211; <code>initial="backcasting"</code> &#8211; this is the default initialisation method;</li>
<li>Opt &#8211; <code>initial="optimal"</code>;</li>
<li>Two &#8211; <code>initial="two-stage"</code>.</li>
</ol>
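<p>To give some intuition for the difference between these options, here is a toy sketch of backcasting for simple exponential smoothing (my own illustration, not the package&#8217;s code): instead of estimating the initial level as an extra parameter, run the smoother forwards, then backwards over the reversed series, and use the resulting state as the initial level.</p>

```python
def ses_pass(y, alpha, init):
    """One pass of simple exponential smoothing from a given initial level;
    returns the final level and the sum of squared one-step errors."""
    level, sse = init, 0.0
    for obs in y:
        e = obs - level
        sse += e * e
        level += alpha * e
    return level, sse

def backcast_init(y, alpha, n_iter=3):
    """Backcasting: run the smoother forwards, then backwards over the
    reversed series, using the final backward state as the initial level."""
    init = y[0]
    for _ in range(n_iter):
        final, _ = ses_pass(y, alpha, init)                  # forward pass
        init, _ = ses_pass(list(reversed(y)), alpha, final)  # backward pass
    return init
```

With "optimal" initialisation, <code>init</code> would instead be estimated by the optimiser together with <code>alpha</code>, which is slower but sometimes more accurate.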
<p>I have also found the following implementations of ETS in Python and included them in my evaluation:</p>
<ol>
<li><a href="https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.ets.AutoETS.html">sktime AutoETS</a></li>
<li><a href="https://skforecast.org/0.20.0/api/stats#skforecast.stats._ets.Ets">skforecast AutoETS</a></li>
<li><a href="https://nixtlaverse.nixtla.io/statsforecast/docs/models/autoets.html">statsforecast AutoETS</a></li>
</ol>
<p>There is also a <a href="https://unit8co.github.io/darts/generated_api/darts.models.forecasting.sf_auto_ets.html">darts implementation of AutoETS</a>, which is actually a wrapper of the statsforecast one. So I ran it just to check how it works, and found that it failed in 1,518 cases. <a href="https://github.com/unit8co/darts/issues/3001">I filed the issue</a>, and it turned out that their implementation does not deal with short time series (10 observations or fewer), which is their design decision. They are now considering what to do about that, if anything.</p>
<p>I used the RMSSE (<a href="https://www.doi.org/10.1016/j.ijforecast.2021.11.013">M5 competition</a>, motivated by <a href="https://www.doi.org/10.1016/j.ijforecast.2022.08.003">Athanasopoulos &#038; Kourentzes (2023)</a>) and SAME error measures, together with the computational time for each time series:<br />
\begin{equation*}<br />
\mathrm{RMSSE} = \frac{1}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} \Delta_t^2}} \mathrm{RMSE},<br />
\end{equation*}<br />
where \(\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h e^2_{t+j}}\) is the Root Mean Squared Error of the point forecasts, and \(\Delta_t\) denotes the first differences of the in-sample actual values.</p>
<p>\begin{equation*}<br />
\mathrm{SAME} = \frac{1}{\frac{1}{T-1} \sum_{t=1}^{T-1} |\Delta_t|} \mathrm{AME},<br />
\end{equation*}<br />
where \(\mathrm{AME}= \left| \frac{1}{h} \sum_{j=1}^h e_{t+j} \right|\).</p>
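<p>For reference, the two measures above can be computed in plain Python as follows (the function names are mine, not from any of the packages):</p>

```python
import math

def rmsse(holdout, forecasts, insample):
    """RMSSE: RMSE of the forecast errors, scaled by the root mean
    square of the in-sample first differences."""
    errors = [a - f for a, f in zip(holdout, forecasts)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    diffs = [insample[t + 1] - insample[t] for t in range(len(insample) - 1)]
    scale = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmse / scale

def same(holdout, forecasts, insample):
    """SAME: absolute mean error, scaled by the mean absolute
    in-sample first difference."""
    errors = [a - f for a, f in zip(holdout, forecasts)]
    ame = abs(sum(errors) / len(errors))
    diffs = [insample[t + 1] - insample[t] for t in range(len(insample) - 1)]
    scale = sum(abs(d) for d in diffs) / len(diffs)
    return ame / scale
```

Both measures are scale-free, so they can be averaged across time series with very different levels.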
<p>All of this was implemented in a Jupyter notebook, which is <a href="https://github.com/config-i1/smooth/blob/master/python/tests/notebooks/benchmark_exponential_smoothing.ipynb">available here</a> in case you want to reproduce the results.</p>
<h3 id="sktime">sktime non-collaborative stance</h3>
<p>In the first run of this (on 28th January 2026), I encountered several errors in AutoETS in sktime: it took an extremely long time to compute (see the table below &#8211; on average around 30 seconds per time series) and produced ridiculous forecasts (mean RMSSE was 106,951). I <a href="https://github.com/sktime/sktime/issues/9291">filed an issue</a> in their repo and sent a courtesy message to Franz Kiraly on LinkedIn the same day, saying that I would be happy to rerun the results if this was fixed. I then received an insulting email from him, blaming me for not collaborating and trying to diminish sktime. They then closed the issue, claiming that it was fixed. I reran the experiment with their development version from GitHub on 21st March (two months later!), only to get exactly the same results. I do not think their fix is working, but given Franz&#8217;s toxic behaviour, I am not going to rerun this any further or help him in any way. The Jupyter notebook with the experiment is <a href="https://github.com/config-i1/smooth/blob/master/python/tests/notebooks/benchmark_exponential_smoothing.ipynb">here</a>, so he can investigate on his own.</p>
<h3 id="evaluationResults">Results</h3>
<p>So, here are the summary results for the tested models:</p>
<pre>====================================================================================
EVALUATION RESULTS for RMSSE (All Series)
====================================================================================
               Method       Min        Q1       Med        Q3       Max      Mean
        ADAM ETS Back  0.018252  <strong>0.663358</strong>  <strong>1.161473</strong>  <strong>2.301861</strong>  <strong>50.25854</strong>  1.928086
         ADAM ETS Opt  0.024155  0.670682  1.185932  2.365498  51.61599  1.943729
         ADAM ETS Two  0.024599  0.669522  1.182516  2.342385  51.61599  1.947715
              ES Back  0.018252  0.667225  1.160971  2.313932  <strong>50.25854</strong>  1.927436
               ES Opt  0.024155  0.673575  1.185756  2.364915  51.61599  1.947180
               ES Two  0.024467  0.671771  1.187368  2.346343  51.61599  1.955076
               ES XXX  0.018252  0.677717  1.170823  2.306197  <strong>50.25854</strong>  1.961318
               ES ZZZ  <strong>0.011386</strong>  0.670211  1.179916  2.353334  115.5442  2.053459
               ES FFF  <strong>0.011386</strong>  0.680956  1.211736  2.449626  115.5442  2.100899
               ES SXS  0.018252  0.674537  1.169187  2.353334  <strong>50.25854</strong>  1.939847
statsforecast AutoETS  0.024468  0.673157  1.189209  2.326650  51.61597  <strong>1.923925</strong>
   skforecast AutoETS  0.074744  0.747200  1.344916  2.721083  50.54339  2.273724
       sktime AutoETS  0.024467  0.676191  1.190093  2.456184 565753200  106951.7</pre>
<p>Things to note:</p>
<ol>
<li>The best performing ETS on average is from Nixtla&#8217;s statsforecast package. The second best is our implementation (ADAM/ES) with backcasting;</li>
<li>Given that I <a href="/2026/02/09/smooth-v4-4-0/">ran exactly the same experiment for the R packages</a>, we can conclude that Nixtla&#8217;s implementation is even better than the one in the forecast package in R;</li>
<li>In terms of median RMSSE, ES with backcasting outperforms all other implementations;</li>
<li>ADAM ETS is the best in terms of the first and third quartiles of RMSSE;</li>
<li>ADAM ETS and ES perform quite similarly. This is expected, because ES is a wrapper of ADAM ETS that assumes normality for the error term, while ADAM ETS itself switches between the Normal and Gamma distributions based on the type of error term;</li>
<li>Backcasting leads to the most accurate forecasts on these datasets. This does not mean it is a universal rule, and I am sure the situation will change for other datasets;</li>
<li>ES XXX gives exactly the same results as the one implemented in <a href="/2026/02/09/smooth-v4-4-0/#evaluationResults">the R version of the package</a>. This is important because we were aiming to reproduce results between R and Python with 100% precision, and we did. The reason why the other ETS flavours differ between R and Python is that, since smooth 4.4.0 for R, we changed how point forecasts are calculated for models with multiplicative components: previously they relied on simulations; now we use the conditional point forecasts directly. While this is not entirely statistically accurate, it is pragmatic because it avoids explosive trajectories.</li>
</ol>
<p>The results for SAME are qualitatively similar to those for RMSSE:</p>
<pre>===================================================================================
EVALUATION RESULTS for SAME (All Series)
===================================================================================
               Method       Min        Q1       Med        Q3       Max      Mean
        ADAM ETS Back  0.001142  0.374466  0.995272  <strong>2.402342</strong>  52.34177  1.951070
         ADAM ETS Opt  0.000106  0.373551  1.021661  2.485596  55.10179  1.962040
         ADAM ETS Two  0.000782  0.380398  1.029629  2.451008  55.10186  1.970422
              ES Back  0.001142  0.372777  <strong>0.994547</strong>  2.412503  53.45041  1.946517
               ES Opt  0.000217  <strong>0.372725</strong>  1.024666  2.478435  54.68603  1.967217
               ES Two  0.000095  0.384795  1.028561  2.454543  54.68558  1.982261
               ES XXX  0.000094  0.373315  1.005006  2.425682  <strong>53.16973</strong>  1.992656
               ES ZZZ  0.000760  0.386673  1.017732  2.467912  145.7604  2.107522
               ES FFF  0.000597  0.401426  1.048395  2.566151  145.7604  2.173559
               ES SXS  0.000597  0.375438  1.005603  2.450490  53.45041  1.964322
statsforecast AutoETS  0.000228  0.374821  1.015205  2.434952  53.61359  <strong>1.938682</strong>
   skforecast AutoETS  0.000993  0.457066  1.217751  2.954003  59.84443  2.409900
       sktime AutoETS  0.000433  0.392802  1.029433  2.571555 385286900  72931.90</pre>
<p>Finally, I also measured the computational time and got the following summary. Note that I moved ES flavours below because they are not directly comparable with the others (they have special pools of models):</p>
<pre>================================================================================
EVALUATION RESULTS for Computational Time in seconds (All Series)
================================================================================
               Method       Min        Q1       Med        Q3        Max      Mean
        ADAM ETS Back  0.008679  0.086219  0.140929  0.241768   <strong>0.923601</strong>  0.181546
         ADAM ETS Opt  0.051302  0.229400  0.315379  0.792827   2.638193  0.550378
         ADAM ETS Two  0.051314  0.287382  0.455149  1.080276   3.535120  0.715090
              ES Back  0.009274  0.085299  0.139511  0.247114   0.958868  0.182547
               ES Opt  0.053176  0.224293  0.312200  0.772161   2.888662  0.541243
               ES Two  0.048539  0.279598  0.446404  1.058847   3.553156  0.703183
statsforecast AutoETS  <strong>0.001770</strong>  <strong>0.007702</strong>  <strong>0.081189</strong>  <strong>0.167040</strong>   1.271202  <strong>0.102575</strong>
   skforecast AutoETS  0.021553  0.243102  0.302078  1.478667   7.482101  0.820576
       sktime AutoETS  0.128021  6.227344 19.170921 41.494513 229.793067 30.712191

================================================================================
               ES XXX  0.008793  0.054139  0.093792  0.144012   0.480976  0.104800
               ES ZZZ  0.010502  0.127297  0.184561  0.447649   1.794031  0.313157
               ES FFF  0.046921  0.215119  1.110657  1.557724   3.489375  1.071004
               ES SXS  0.024909  0.116427  0.434594  0.564973   1.251257  0.403988</pre>
<p>Things to note:</p>
<ol>
<li>Nixtla&#8217;s implementation is actually the fastest and very hard to beat. As far as I understand, they did great work implementing some code in C++ and then using numba (I do not know what that means yet);</li>
<li>Our implementation does not use numba, but our ETS with backcasting is still faster than the skforecast and sktime implementations;</li>
<li>ADAM ETS with backcasting also has the lowest maximum time, implying that in difficult situations it finds a solution relatively quickly compared with the others.</li>
</ol>
<p>So, overall, I would argue that the smooth implementation of ETS is competitive with other implementations. But it has one important benefit: it supports more features. And we plan to expand it further to make it even more useful across a wider variety of cases.</p>
<h2 id="whatsNext">What&#8217;s next?</h2>
<p>There are still a lot of features that we have not managed to implement yet. Here is a non-exhaustive list:</p>
<ol>
<li><a href="/adam/ADAMX.html">Explanatory variables to have ETSX/ARIMAX</a>;</li>
<li><a href="/adam/ADAMARIMA.html">ARIMA</a>;</li>
<li><a href="/adam/ADAMIntermittent.html">Occurrence model</a> for intermittent demand forecasting;</li>
<li><a href="/adam/ADAMscaleModel.html">Scale model</a> for ADAM;</li>
<li><a href="/adam/ADAMUncertaintySimulation.html">Simulation functions</a>;</li>
<li><a href="/adam/diagnostics.html">Model diagnostics</a>;</li>
<li>CES, MSARIMA, SSARIMA, GUM, and SMA &#8211; functions that are available in R and not yet ported to Python.</li>
</ol>
<p>So, lots of work to do. I am sure we will be quite busy well into 2026.</p>
<h2 id="summary">Summary</h2>
<p>It has been a long and winding road, but Filotas and Leo did an amazing job to make this happen. The existing ETS implementation in <code>smooth</code> already works quite well and quite fast. It does not fail as some other implementations do, and it is quite reliable. I have actually spent many years testing the R version on different time series to make sure that it produces something sensible no matter what. The code was translated to Python one-to-one, so I am fairly confident that the function will work as expected in 99.9% of cases (there is always a non-zero probability that something will go wrong). Both ADAM and ES already support a variety of features that you might find useful.</p>
<p>One thing I will kindly ask of you is that if you find a bug or an issue when running experiments on your datasets, please file it in our GitHub repo <a href="https://github.com/config-i1/smooth/issues">here</a> &#8211; we will try to find the time to fix it. Also, if you would like to contribute by translating some features from R to Python or implementing something additional, please get in touch with me.</p>
<p>Finally, I am always glad to hear success stories. If you find the <code>smooth</code> package useful in your work, please let us know. One way of doing that is via the <a href="https://github.com/config-i1/smooth/discussions">Discussions</a> on GitHub, or you can simply send me an email.</p>
<p>Message <a href="https://openforecast.org/2026/04/09/smooth-forecasting-with-the-smooth-package-in-python/">smooth forecasting with the smooth package in Python</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/04/09/smooth-forecasting-with-the-smooth-package-in-python/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v4.4.0</title>
		<link>https://openforecast.org/2026/02/09/smooth-v4-4-0/</link>
					<comments>https://openforecast.org/2026/02/09/smooth-v4-4-0/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 09 Feb 2026 09:02:21 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3959</guid>

					<description><![CDATA[<p>Great news, everyone! smooth package for R version 4.4.0 is now on CRAN. Why is this great news? Let me explain! On this page: What&#8217;s new? Evaluation Setup Results What&#8217;s next? Here is what&#8217;s new since 4.3.0: First, I have worked on tuning the initialisation in adam() in case of backcasting, and improved the [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Great news, everyone! smooth package for R version 4.4.0 is now on CRAN. Why is this great news? Let me explain!</p>
<p>On this page:</p>
<ul>
<li><a href="#whatsNew">What&#8217;s new?</a></li>
<li><a href="#evaluation">Evaluation</a></li>
<ul>
<li><a href="#evaluationSetup">Setup</a></li>
<li><a href="#evaluationResults">Results</a></li>
</ul>
<li><a href="#whatsNext">What&#8217;s next?</a></li>
</ul>
<h3 id="whatsNew">Here is what&#8217;s new since 4.3.0:</h3>
<p>First, I have worked on tuning the initialisation in <code>adam()</code> in the case of backcasting, and improved the <code>msdecompose()</code> function a bit to get more robust results. This was necessary to make sure that the initial values still make sense when the smoothing parameters are close to zero. This is already in <code>adam()</code> (use <code>smoother="global"</code> to test it), but it will become the default behaviour in the next version of the package, once we iron everything out. This is all part of a larger project with Kandrika Pritularga on a paper about the initialisation of dynamic models.</p>
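<p>A minimal sketch of how to try this out, based on the argument name mentioned above (the exact behaviour is experimental and may change once it becomes the default):</p>
<pre class="decode">library(smooth)
# Fit an ETS(M,A,M) with backcasted initials, using the new global smoother.
# smoother="global" is experimental; the model form here is just an example.
adamModel <- adam(AirPassengers, "MAM", initial="backcasting", smoother="global")
adamModel$initial
</pre>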
<p>Second, I have fixed a long-standing issue with the eigenvalue calculation inside the dynamic models, which applies only in the case of <code>bounds="admissible"</code> and might affect ARIMA, CES and GUM. The parameter restrictions are now imposed consistently across all functions, guaranteeing that they will not fail and will produce stable/invertible parameter estimates.</p>
<p>Third, I have added the Sparse ARMA function, which constructs an ARMA(p,q) with only the specified orders, dropping all the lower-order elements. For example, SpARMA(2,3) would have the following form:<br />
\begin{equation*}<br />
y_t = \phi_2 y_{t-2} + \theta_3 \epsilon_{t-3} + \epsilon_{t}<br />
\end{equation*}<br />
This weird model is needed for a project I am working on together with Devon Barrow, Nikos Kourentzes and Yves Sagaert. I&#8217;ll explain more when we get the final draft of the paper.</p>
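<p>To see what this equation implies, one can simulate the SpARMA(2,3) process above and estimate it in base R as an ARMA(2,3) with the intermediate coefficients restricted to zero (a sketch for illustration only; this is not the smooth implementation):</p>
<pre class="decode"># Simulate y_t = phi_2 y_{t-2} + theta_3 eps_{t-3} + eps_t
set.seed(41)
n <- 1000
phi2 <- 0.6; theta3 <- 0.4
eps <- rnorm(n)
y <- numeric(n)
for(t in 4:n){
    y[t] <- phi2*y[t-2] + theta3*eps[t-3] + eps[t]
}
# An ARMA(2,3) with phi_1 = theta_1 = theta_2 = 0 fixed;
# only ar2 and ma3 are estimated, and they should be close to 0.6 and 0.4
arima(y, order=c(2,0,3), include.mean=FALSE,
      fixed=c(0, NA, 0, 0, NA), transform.pars=FALSE)
</pre>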
<p>And something very important, which you will not notice: I refactored the C++ code in the package so that it is available not only for R, but also for Python&#8230; Why? I&#8217;ll explain in the next post :). But this also means that the old functions that relied on the previous generation of the C++ code are now discontinued, and all the smooth functions use the new core. This applies to <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. You should not notice any change, except that some of them become a bit faster and probably more robust. It also means that all of them can now use the methods written for the <code>adam()</code> function: for example, <code>summary()</code> will produce proper output with standard errors and confidence intervals for all estimated parameters.</p>
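<p>As a quick sketch of what this looks like in practice (assuming a version of the package with the new core):</p>
<pre class="decode">library(smooth)
# es() is now a wrapper around the adam core...
esModel <- es(AirPassengers, "MAM")
# ...so the adam methods apply, e.g. summary() reports standard errors
# and confidence intervals of the estimated parameters
summary(esModel)
</pre>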
<h2 id="evaluation">Evaluation</h2>
<p><strong>DISCLAIMER</strong>: The previous evaluation was for smooth v4.3.0; you can find it <a href="/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">here</a>. I have changed one of the error measures (sCE to SAME), but the rest is the same, so the results are broadly comparable between the versions.</p>
<h3 id="evaluationSetup">The setup</h3>
<p>As usual, in situations like this, I have run the evaluation on the M1, M3 and Tourism competition data. This time, I have added more flavours of the ETS model selection so that you can see how the model pool impacts the forecasting accuracy. A short description:</p>
<ol>
<li>XXX &#8211; select between pure additive ETS models only;</li>
<li>ZZZ &#8211; select from the pool of all 30 models, but use branch-and-bound to kick out the less suitable models;</li>
<li>ZXZ &#8211; same as (2), but without the multiplicative trend models. This is used in the <code>smooth</code> functions <strong>by default</strong>;</li>
<li>FFF &#8211; select from the pool of all 30 models (exhaustive search);</li>
<li>SXS &#8211; the pool of models that is used by default in <code>ets()</code> from the <code>forecast</code> package in R.</li>
</ol>
<p>I also tested three types of the ETS initialisation:</p>
<ol>
<li>Back &#8211; <code>initial="backcasting"</code></li>
<li>Opt &#8211; <code>initial="optimal"</code></li>
<li>Two &#8211; <code>initial="two-stage"</code></li>
</ol>
<p>Backcasting is now the default initialisation method and does well in many cases, but I found that optimal initials (if done correctly) help in some difficult situations, as long as you have enough computational time.</p>
<p>I used two error measures and the computational time to check how the functions perform. The first error measure is RMSSE (Root Mean Squared Scaled Error) from the <a href="http://dx.doi.org/10.1016/j.ijforecast.2021.11.013">M5 competition</a>, motivated by <a href="http://dx.doi.org/10.1016/j.ijforecast.2022.08.003">Athanasopoulos &#038; Kourentzes (2023)</a>:</p>
<p>\begin{equation*}<br />
\mathrm{RMSSE} = \frac{1}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} \Delta_t^2}} \mathrm{RMSE},<br />
\end{equation*}<br />
where \(\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h e^2_{t+j}}\) is the Root Mean Squared Error of the point forecasts, and \(\Delta_t\) is the first differences of the in-sample actual values.</p>
<p>The second measure does not have a standard name in the literature. Its idea is to measure the bias of forecasts while getting rid of the sign, so that positively biased forecasts on some time series are not cancelled out by negatively biased ones on others. I call this measure &#8220;Scaled Absolute Mean Error&#8221; (SAME):</p>
<p>\begin{equation*}<br />
\mathrm{SAME} = \frac{1}{\frac{1}{T-1} \sum_{t=1}^{T-1} |\Delta_t|} \mathrm{AME},<br />
\end{equation*}<br />
where \(\mathrm{AME}= \left| \frac{1}{h} \sum_{j=1}^h e_{t+j} \right|\).</p>
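<p>For clarity, here is how the two measures can be computed from scratch in base R (a sketch following the formulae above, not the greybox implementation; function names are made up to avoid clashing with the package ones):</p>
<pre class="decode"># holdout: actual values over the horizon; forecasts: point forecasts;
# insample: in-sample actual values used for scaling
RMSSEmanual <- function(holdout, forecasts, insample){
    # RMSE of the point forecasts, scaled by the root mean squared first difference
    sqrt(mean((holdout - forecasts)^2)) / sqrt(mean(diff(insample)^2))
}
SAMEmanual <- function(holdout, forecasts, insample){
    # Absolute mean error, scaled by the mean absolute first difference
    abs(mean(holdout - forecasts)) / mean(abs(diff(insample)))
}
</pre>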
<p>For both of these measures, lower values are better. As for the computational time, I measured it for each model on each series, and this time I provide the distribution of times to better show how the methods perform.</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)
library(forecast)
library(smooth)

library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
	holdout <- as.vector(holdout);
	insample <- as.vector(insample);
	# RMSSE and SAME are defined in greybox v2.0.7;
	# the scales are based on the first differences of the in-sample data
	return(c(RMSSE(holdout, object$mean, mean(diff(insample)^2)),
	         SAME(holdout, object$mean, mean(abs(diff(insample)))),
	         object$timeElapsed))
}

datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)

# Method configuration list
	# Each method specifies: fn (function name), pkg (package), model and initial
methodsConfig <- list(
	# ETS and Auto ARIMA from the forecast package in R
	"ETS" = list(fn = "ets", pkg = "forecast", use_x_only = TRUE),
	"Auto ARIMA" = list(fn = "auto.arima", pkg = "forecast", use_x_only = TRUE),
	# ADAM with different initialisation schemes
	"ADAM ETS Back" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ADAM ETS Opt" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ADAM ETS Two" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "two"),
	# ES, which is a wrapper of ADAM. Should give very similar results to ADAM on regular data
	"ES Back" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ES Opt" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ES Two" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "two"),
	# Several flavours for model selection in ES
	"ES XXX" = list(fn = "es", pkg = "smooth", model = "XXX", initial = "back"),
	"ES ZZZ" = list(fn = "es", pkg = "smooth", model = "ZZZ", initial = "back"),
	"ES FFF" = list(fn = "es", pkg = "smooth", model = "FFF", initial = "back"),
	"ES SXS" = list(fn = "es", pkg = "smooth", model = "SXS", initial = "back"),
	# ARIMA implementations in smooth
	"MSARIMA" = list(fn = "auto.msarima", pkg = "smooth", initial = "back"),
	"SSARIMA" = list(fn = "auto.ssarima", pkg = "smooth", initial = "back"),
	# Complex Exponential Smoothing
	"CES" = list(fn = "auto.ces", pkg = "smooth", initial = "back"),
	# Generalised Univariate Model (experimental)
	"GUM" = list(fn = "auto.gum", pkg = "smooth", initial = "back")
)

methodsNames <- names(methodsConfig)
methodsNumber <- length(methodsNames)

measuresNames <- c("RMSSE","SAME","Time")
measuresNumber <- length(measuresNames)

testResults <- array(NA, c(methodsNumber, datasetLength, measuresNumber),
                     dimnames = list(methodsNames, NULL, measuresNames))

# Unified loop over all methods
for(j in seq_along(methodsConfig)){
	cfg <- methodsConfig[[j]]
	cat("Running method:", methodsNames[j], "\n")

	result <- foreach(i = 1:datasetLength, .combine = "cbind",
	                  .packages = c("smooth", "forecast")) %dopar% {
		startTime <- Sys.time()

		# Build model call based on method type
		if(isTRUE(cfg$use_x_only)){
			# forecast package methods: ets, auto.arima
			test <- do.call(cfg$fn, list(datasets[[i]]$x))
		}else if(cfg$fn %in% c("adam", "es")) {
			# adam and es take dataset and model
			test <- do.call(cfg$fn, list(datasets[[i]], model=cfg$model, initial = cfg$initial))
		}else{
			# auto.msarima, auto.ssarima, auto.ces, auto.gum
			test <- do.call(cfg$fn, list(datasets[[i]], initial = cfg$initial))
		}

		# Build forecast call
		forecast_args <- list(test, h = datasets[[i]]$h)
		testForecast <- do.call(forecast, forecast_args)
		testForecast$timeElapsed <- Sys.time() - startTime

		return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x))
	}
	testResults[j,,] <- t(result)
}

</pre>
</div></div>
<h3 id="evaluationResults">Results</h3>
<p>And here are the results for the smooth functions in v4.4.0 for R. First, we summarise the RMSSE values: I report the quartiles of its distribution together with the mean.</p>
<pre class="decode">cbind(t(apply(testResults[,,"RMSSE"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"RMSSE"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%    75%      100%   mean
ETS           0.0245 0.6772 1.1806 2.3765   51.6160 1.9697
Auto ARIMA    0.0246 0.6802 1.1790 2.3583   51.6160 1.9864
ADAM ETS Back 0.0183 <strong>0.6647</strong> <strong>1.1620</strong> <strong>2.3023</strong>   <strong>50.2585</strong> <strong>1.9283</strong>
ADAM ETS Opt  0.0242 0.6714 1.1868 2.3623   51.6160 1.9432
ADAM ETS Two  0.0246 0.6690 1.1875 2.3374   51.6160 1.9480
ES Back       0.0183 0.6674 1.1647 2.3164   <strong>50.2585</strong> 1.9292
ES Opt        0.0242 0.6740 1.1858 2.3644   51.6160 1.9469
ES Two        0.0245 0.6717 1.1874 2.3463   51.6160 1.9538
ES XXX        0.0183 0.6777 1.1708 2.3062   <strong>50.2585</strong> 1.9613
ES ZZZ        <strong>0.0108</strong> 0.6682 1.1816 2.3611  201.4959 2.0841
ES FFF        0.0145 0.6795 1.2170 2.4575 5946.1858 3.3033
ES SXS        0.0183 0.6754 1.1709 2.3539   <strong>50.2585</strong> 1.9448
MSARIMA       0.0278 0.6988 1.1898 2.4208   51.6160 2.0750
SSARIMA       0.0277 0.7371 1.2544 2.4425   51.6160 2.0625
CES Back      0.0450 0.6761 1.1741 2.3205   51.0571 1.9650
GUM Back      0.0333 0.7077 1.2073 2.4533   51.6184 2.0461
</pre>
<p>The worst performing models are the ETS selections that include the multiplicative trend (ES ZZZ and ES FFF). This is because there are outliers in some time series, and the multiplicative trend reacts to them by amending the trend value to something large (e.g. 2, i.e. a twofold increase in level per step), after which it can never return to a reasonable level (see the explanation of this phenomenon in <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6 of the ADAM book</a>). As expected, ADAM ETS performs very similarly to ES, and we can see that the default initialisation (backcasting) is pretty good in terms of RMSSE values. To be fair, if the models were tested on a different dataset, the optimal initialisation might do better.</p>
<p>Here is a table with the SAME results:</p>
<pre class="decode">cbind(t(apply(testResults[,,"SAME"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"SAME"],1,mean)) |> round(4)</pre>
<pre>                 0%    25%    50%    75%      100%   mean
ETS           8e-04 0.3757 1.0203 2.5097   54.6872 1.9983
Auto ARIMA    <strong>0e+00</strong> 0.3992 1.0429 2.4565   53.2710 2.0446
ADAM ETS Back 1e-04 0.3752 0.9965 <strong>2.4047</strong>   <strong>52.3418</strong> 1.9518
ADAM ETS Opt  5e-04 0.3733 1.0212 2.4848   55.1018 1.9618
ADAM ETS Two  8e-04 0.3780 1.0316 2.4511   55.1019 1.9712
ES Back       <strong>0e+00</strong> 0.3733 <strong>0.9945</strong> 2.4122   53.4504 <strong>1.9485</strong>
ES Opt        2e-04 <strong>0.3727</strong> 1.0255 2.4756   54.6860 1.9673
ES Two        1e-04 0.3855 1.0323 2.4535   54.6856 1.9799
ES XXX        1e-04 0.3733 1.0050 2.4257   53.1697 1.9927
ES ZZZ        3e-04 0.3824 1.0135 2.4885  229.7626 2.1376
ES FFF        3e-04 0.3972 1.0489 2.6042 3748.4268 2.9501
ES SXS        6e-04 0.3750 1.0125 2.4627   53.4504 1.9725
MSARIMA       1e-04 0.3960 1.0094 2.5409   54.7916 2.1227
SSARIMA       1e-04 0.4401 1.1222 2.5673   52.5023 2.1248
CES Back      6e-04 0.3767 1.0079 2.4085   54.9026 2.0052
GUM Back      0e+00 0.3803 1.0575 2.6259   63.0637 2.0858
</pre>
<p>In terms of bias, the smooth implementations of ETS do well again, and we can see the same issue with the multiplicative trend as before. Another thing to note is that MSARIMA and SSARIMA are not as good as Auto ARIMA from the forecast package on these datasets in terms of RMSSE and SAME (at least, in terms of the mean error measures). In fact, GUM and CES are now better than both of them on both error measures.</p>
<p>Finally, here is a table with the computational time:</p>
<pre class="decode">cbind(t(apply(testResults[,,"Time"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"Time"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%     75%    100%   mean
ETS           <strong>0.0032</strong> <strong>0.0117</strong> 0.1660  0.6728  1.6400 0.3631
Auto ARIMA    0.0100 0.1184 0.3618  1.0548 54.3652 1.4760
ADAM ETS Back 0.0162 0.1062 0.1854  0.4022  2.5109 0.2950
ADAM ETS Opt  0.0319 0.1920 0.3103  0.6792  3.8933 0.5368
ADAM ETS Two  0.0427 0.2548 0.4035  0.8567  3.7178 0.6331
ES Back       0.0153 0.0896 <strong>0.1521</strong>  0.3335  2.1128 0.2476
ES Opt        0.0303 0.1667 0.2565  0.5910  3.5887 0.4522
ES Two        0.0483 0.2561 0.4016  0.8626  3.5892 0.6309
MSARIMA Back  0.0614 0.3418 0.6947  0.9868  3.9677 0.7534
SSARIMA Back  0.0292 0.2963 0.8988  2.1729 13.7635 1.6581
CES Back      0.0146 0.0400 0.1834  <strong>0.2298</strong>  <strong>1.2099</strong> <strong>0.1713</strong>
GUM Back      0.0165 0.2101 1.5221  3.0543  9.5380 1.9506

# Separate table for special pools of ETS.
# The time is proportional to the number of models here
=========================================================
                  0%    25%    50%     75%    100%   mean
ES XXX        0.0114 0.0539 0.0782  0.1110  0.8163 0.0859
ES ZZZ        0.0147 0.1371 0.2690  0.4947  2.2049 0.3780
ES FFF        0.0529 0.2775 1.1539  1.5926  3.8552 1.1231
ES SXS        0.0323 0.1303 0.4491  0.6013  2.2170 0.4581
</pre>
<p><em><br />
I have manually moved the specific ES model pools flavours below because there is no point in comparing their computational time with the time of the others (they have different pools of models and thus are not really comparable with the rest).</em></p>
<p>What we can see from this is that ES with backcasting is fast compared with the other models in this setting (in terms of both mean and median computational time). CES is very fast in terms of mean computational time, probably because of its very small pool of models to choose from (only four). SSARIMA is pretty slow, which is due to the nature of its order selection algorithm (I don't plan to update it any time soon, but if someone wants to contribute - let me know). The interesting thing is that Auto ARIMA, while being relatively fine in terms of median time, has the highest maximum, meaning that it got stuck on some time series for an unknown reason. The series that caused the biggest issue for Auto ARIMA is N389 from the M1 competition. I'm not sure what the problem was, and I don't have time to investigate it.</p>
<div id="attachment_4002" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1"><img fetchpriority="high" decoding="async" aria-describedby="caption-attachment-4002" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1" alt="Mean computational time vs mean RMSSE" width="300" height="180" class="size-medium wp-image-4002" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-768x461.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1 1000w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-4002" class="wp-caption-text">Mean computational time vs mean RMSSE</p></div>
<p>Comparing the mean computational time with the mean RMSSE value (image above), it looks like the overall tendency for the <code>smooth</code> + <code>forecast</code> functions on the M1, M3 and Tourism datasets is that additional computational time does not improve the accuracy. It also looks like the simpler pool of pure additive models (ETS(X,X,X)) harms the accuracy in comparison with the branch-and-bound based pool of the default <code>model="ZXZ"</code>. There seems to be a sweet spot in terms of the pool of models to choose from (no multiplicative trend, allow mixed models). This aligns well with the papers of <a href="https://doi.org/10.1080/01605682.2024.2421339">Petropoulos et al. (2025)</a>, who investigated the accuracy of arbitrarily short pools of models, and <a href="https://doi.org/10.1016/j.ijpe.2018.05.019">Kourentzes et al. (2019)</a>, who showed how pooling (if done correctly) can improve the accuracy on average.</p>
<h3 id="whatsNext">What's next?</h3>
<p>For R, the main task now is to rewrite the <code>oes()</code> function and substitute it with <code>om()</code> - the "Occurrence Model". This should be equivalent to <code>adam()</code> in functionality, allowing one to introduce ETS, ARIMA and explanatory variables for the occurrence part of the model. This is a huge piece of work, which I hope to progress slowly throughout 2026 and finish by the end of the year. Doing that will also allow me to remove the last bits of the old C++ code and switch to the ADAM core completely, introducing more functionality for capturing patterns in intermittent demand. A minor task is to test <code>smoother="global"</code> further for the ETS initialisation and roll it out as the default in the next release for both R and Python.</p>
<p>For Python,... What Python? Ah! You'll see soon :)</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/02/09/smooth-v4-4-0/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>ITISE2025: Beyond summary performance metrics for forecast selection and combination</title>
		<link>https://openforecast.org/2025/07/21/itise2025-beyond-summary-performance-metrics-for-forecast-selection-and-combination/</link>
					<comments>https://openforecast.org/2025/07/21/itise2025-beyond-summary-performance-metrics-for-forecast-selection-and-combination/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 21 Jul 2025 10:11:48 +0000</pubDate>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[combinations]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[presentations]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3917</guid>

					<description><![CDATA[<p>This year, I couldn&#8217;t attend the International Symposium on Forecasting (organised by the International Institute of Forecasters), which I usually do, so instead I went to Gran Canaria for the International Conference on Time Series and Forecasting (aka ITISE). The location was fantastic, and I enjoyed several talks. I was also glad to catch up [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/07/21/itise2025-beyond-summary-performance-metrics-for-forecast-selection-and-combination/">ITISE2025: Beyond summary performance metrics for forecast selection and combination</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>This year, I couldn&#8217;t attend the International Symposium on Forecasting (organised by the International Institute of Forecasters), which I usually do, so instead I went to Gran Canaria for the International Conference on Time Series and Forecasting (aka <a href="https://itise.ugr.es/">ITISE</a>). The location was fantastic, and I enjoyed several talks. I was also glad to catch up and spend time with my friends and colleagues Juan Trapero, Devon Barrow, Kostas Nikolopoulos, Vasilios Bougakis, Livio Fenga, and Vittorio Maniezzo, all of whom delivered great presentations.</p>
<p>As for my contribution, I presented a paper that Nikos Kourentzes and I have been working on since around 2018. It focuses on pooling using point information criteria. The core idea is to combine forecasts based on a smaller pool of models, which we propose creating by comparing the distributions of information criteria across forecasting models. We&#8217;re planning to finish a new version of the paper by September and submit it to a peer-reviewed journal. I’ll share more details when the draft that I can share is ready. In the meantime, you can check out the slides that summarise the main points of the paper. <a href="https://openforecast.org/wp-content/uploads/2025/07/ITISE2025-Svetunkov-pAIC.pdf">Here they are</a>.</p>
<p>Message <a href="https://openforecast.org/2025/07/21/itise2025-beyond-summary-performance-metrics-for-forecast-selection-and-combination/">ITISE2025: Beyond summary performance metrics for forecast selection and combination</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/07/21/itise2025-beyond-summary-performance-metrics-for-forecast-selection-and-combination/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</title>
		<link>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/</link>
					<comments>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Fri, 04 Jul 2025 10:02:17 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3898</guid>

					<description><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post. New default initialisation mechanism Since the beginning of the package, the smooth functions supported three ways for initialising the state vector (the vector that includes level, trend, seasonal indices): optimisation, backcasting and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post.</p>
<h3>New default initialisation mechanism</h3>
<p>Since the beginning of the package, the <code>smooth</code> functions have supported three ways of initialising the state vector (the vector that includes the level, trend and seasonal indices): optimisation, backcasting and values provided by the user. The first has been considered the standard way of estimating ETS, while backcasting was originally proposed by Box &#038; Jenkins (1970) and has only been implemented in <code>smooth</code> (at least, I haven&#8217;t seen it anywhere else). The main advantage of backcasting is computational time, because you do not need to estimate every single element of the state vector. The new ADAM core that I developed during the COVID lockdown had some improvements to the backcasting, and I noticed that <code>adam()</code> produced more accurate forecasts with it than with the optimisation. But I needed more testing, so I did not change anything back then.</p>
<p>However, my recent work with Kandrika Pritularga on capturing uncertainty in ETS has demonstrated that backcasting solves some fundamental problems with the variance of states &#8211; the optimisation cannot handle so many parameters, and the asymptotic properties of ETS do not make sense in that case (we&#8217;ll release the paper as soon as we finish the experiments). So, with this evidence in hand and additional tests, I have decided to switch from optimisation to backcasting as the default initialisation mechanism for all the <strong>smooth</strong> functions.</p>
<p>End users should not feel much difference, but the functions should now work faster and (hopefully) more accurately. If this is not the case, please get in touch or <a href="https://github.com/config-i1/smooth/issues">file an issue on GitHub</a>.</p>
<p>Also, rest assured that <code>initial="optimal"</code> is and will stay available as an option in all the <code>smooth</code> functions, so you can always switch back to it if you don&#8217;t like backcasting.</p>
<p>Finally, I have introduced a new initialisation mechanism called &#8220;two-stage&#8221;, the idea of which is to apply backcasting first and then optimise the obtained state values. It is slower, but is supposed to be better than the standard optimisation.</p>
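<p>A quick sketch of how the three options are selected in a call (the model form here is just an example):</p>
<pre class="decode">library(smooth)
# The three initialisation mechanisms for the same model:
es(AirPassengers, "MAM", initial="backcasting")  # the new default
es(AirPassengers, "MAM", initial="optimal")      # classic optimisation
es(AirPassengers, "MAM", initial="two-stage")    # backcast, then optimise
</pre>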
<h3>ADAM core</h3>
<p>Every single function in the <code>smooth</code> package now uses the ADAM C++ core, and the old core will be discontinued starting from v4.5.0 of the package. This applies to the functions <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. The package now contains legacy versions of these functions with the suffix &#8220;_old&#8221; (e.g. <code>es_old()</code>), which will be removed in smooth v4.5.0. The new engine also helped <code>ssarima()</code>, which has become slightly more accurate than before. Unfortunately, there are still some issues with the initialisation of the seasonal <code>ssarima()</code>, which I have not managed to solve completely. But I hope that over time this will be resolved as well.</p>
<h3>smooth performance update</h3>
<p>I have applied all the smooth functions, together with <code>ets()</code> and <code>auto.arima()</code> from the <code>forecast</code> package, to the M1, M3 and Tourism competition data and measured their performance in terms of RMSSE, scaled Cumulative Error (sCE) and computational time. I used the following R code for that:</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Long and boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)

library(forecast)
library(smooth)

# I work on Linux and use doMC. Substitute this with doParallel if you use Windows
library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
	holdout <- as.vector(holdout);
	insample <- as.vector(insample);
	return(c(measures(holdout, object$mean, insample),
			 mean(holdout < object$upper &#038; holdout > object$lower),
			 mean(object$upper-object$lower)/mean(insample),
			 pinball(holdout, object$upper, 0.975)/mean(insample),
			 pinball(holdout, object$lower, 0.025)/mean(insample),
			 sMIS(holdout, object$lower, object$upper, mean(insample),0.95),
			 object$timeElapsed))
}

# Datasets to use
datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)
# Types of models to try
methodsNames <- c("ETS", "Auto ARIMA",
				  "ADAM ETS Back", "ADAM ETS Opt", "ADAM ETS Two",
				  "ES Back", "ES Opt", "ES Two",
				  "ADAM ARIMA Back", "ADAM ARIMA Opt", "ADAM ARIMA Two",
				  "MSARIMA Back", "MSARIMA Opt", "MSARIMA Two",
				  "SSARIMA Back", "SSARIMA Opt", "SSARIMA Two",
				  "CES Back", "CES Opt", "CES Two",
				  "GUM Back", "GUM Opt", "GUM Two");
methodsNumber <- length(methodsNames);
test <- adam(datasets[[125]]);

testResults20250603 <- array(NA,c(methodsNumber,datasetLength,length(test$accuracy)+6),
                             dimnames=list(methodsNames, NULL,
                                           c(names(test$accuracy),
                                             "Coverage","Range",
                                             "pinballUpper","pinballLower","sMIS",
                                             "Time")));

#### ETS from forecast package ####
j <- 1;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
  startTime <- Sys.time()
  test <- ets(datasets[[i]]$x);
  testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### AUTOARIMA ####
j <- 2;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- auto.arima(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Backcasting ####
j <- 3;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Optimal ####
j <- 4;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Two-stage ####
j <- 5;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Backcasting ####
j <- 6;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Optimal ####
j <- 7;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Two-stage ####
j <- 8;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Backcasting ####
j <- 9;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="back", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Optimal ####
j <- 10;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="opt", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Two-stage ####
j <- 11;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="two", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Backcasting ####
j <- 12;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Optimal ####
j <- 13;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Two-stage ####
j <- 14;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Backcasting ####
j <- 15;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ssarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Optimal ####
j <- 16;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]], initial="opt");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Two-stage ####
j <- 17;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]], initial="two");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Backcasting ####
j <- 18;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Optimal ####
j <- 19;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Two-stage ####
j <- 20;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Backcasting ####
j <- 21;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Optimal ####
j <- 22;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Two-stage ####
j <- 23;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);</pre>
<pre class="decode"># Summary of results
cbind(t(apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,quantile)),
	  mean=apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,mean),
	  sCE=apply(testResults20250603[c(1:8,12:23),,"sCE"],1,mean),
	  Time=apply(testResults20250603[c(1:8,12:23),,"Time"],1,mean)) |> round(3)</pre>
</div></div>
<p>The table below shows the distribution of RMSSE, together with the mean sCE and the mean computational time. Boldface marks the best performing model in each column.</p>
<pre>                    min   Q1  median   Q3     max  mean    sCE  Time
ETS                0.024 0.677 1.181 2.376  51.616 1.970  0.299 0.385
Auto ARIMA         0.025 0.680 1.179 2.358  51.616 1.986  0.124 1.467

ADAM ETS Back      <strong>0.015</strong> <strong>0.666</strong> 1.175 <strong>2.276</strong>  51.616 <strong>1.921</strong>  0.470 0.218
ADAM ETS Opt       0.020 <strong>0.666</strong> 1.190 2.311  51.616 1.937  0.299 0.432
ADAM ETS Two       0.025 <strong>0.666</strong> 1.179 2.330  51.616 1.951  0.330 0.579

ES Back            <strong>0.015</strong> 0.672 <strong>1.174</strong> 2.284  51.616 <strong>1.921</strong>  0.464 0.219
ES Opt             0.020 0.672 1.186 2.316  51.616 1.943  0.302 0.497
ES Two             0.024 0.668 1.181 2.346  51.616 1.952  0.346 0.562

MSARIMA Back       0.025 0.710 1.188 2.383  51.616 2.028  <strong>0.067</strong> 0.780
MSARIMA Opt        0.025 0.724 1.242 2.489  51.616 2.083  0.088 1.905
MSARIMA Two        0.025 0.718 1.250 2.485  51.906 2.075  0.083 2.431

SSARIMA Back       0.045 0.738 1.248 2.383  51.616 2.063  0.167 1.747
SSARIMA Opt        0.025 0.774 1.292 2.413  51.616 2.040  0.178 7.324
SSARIMA Two        0.025 0.742 1.241 2.414  51.616 2.027  0.183 8.096

CES Back           0.046 0.695 1.189 2.355  51.342 1.981  0.125 <strong>0.185</strong>
CES Opt            0.030 0.698 1.218 2.327  <strong>49.480</strong> 2.001 -0.135 0.834
CES Two            0.025 0.696 1.207 2.343  51.242 1.993 -0.078 1.006

GUM Back           0.046 0.707 1.215 2.399  51.134 2.049 -0.285 3.575
GUM Opt            0.026 0.795 1.381 2.717 240.143 2.932 -0.549 4.668
GUM Two            0.026 0.803 1.406 2.826 240.143 3.041 -0.593 4.703</pre>
<p>Several notes:</p>
<ul>
<li>ES is a wrapper for ADAM ETS. The main difference between them is that the latter uses the Gamma distribution for the multiplicative error models, while the former relies on the Normal one.</li>
<li>MSARIMA is a wrapper for ADAM ARIMA, which is why I don't report the latter in the results.</li>
</ul>
<p>One thing to notice in the output above is that the models with backcasting consistently produce more accurate forecasts across all measures. My explanation is that they tend not to overfit the data as much as the models with the optimised initials do.</p>
<p>To check whether the differences in accuracy between the models are statistically significant, I conducted a modification of the MCB/Nemenyi test, explained in <a href="/2020/08/17/accuracy-of-forecasting-methods-can-you-tell-the-difference/">this post</a>:</p>
<pre class="decode">par(mar=c(10,3,4,1))
greybox::rmcb(t(testResults20250603[c(1:8,12:23),,"RMSSE"]), outplot="mcb")</pre>
<div id="attachment_3908" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3908" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1" alt="Nemenyi test for the smooth functions" width="300" height="175" class="size-medium wp-image-3908" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1 1200w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3908" class="wp-caption-text">Nemenyi test for the smooth functions</p></div>
<p>The image shows the mean rank of each model and whether the differences between the models are significant at the 5% level. It is apparent that ADAM ETS has the lowest rank regardless of the initialisation used, but its performance does not differ significantly from that of <code>es()</code>, <code>ets()</code> and <code>auto.arima()</code>. Also, <code>auto.arima()</code> significantly outperforms <code>msarima()</code> and <code>ssarima()</code> on this data, which could be due to their initialisation. Still, backcasting seems to help all the functions in terms of accuracy in comparison with the "optimal" and "two-stage" initials.</p>
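<p>For intuition on what the test reports, its ranking part is simple: rank the models on every series by their error measure, then average the ranks across series. A minimal base-R sketch on made-up numbers (purely to show the mechanics, not the benchmark results above):</p>

```r
# Mean-rank idea behind the MCB/Nemenyi-style test:
# rank the models on each series, then average the ranks.
set.seed(1)
# Made-up error measures: 100 series (rows) x 5 models (columns)
errors <- matrix(rexp(500), nrow=100, ncol=5,
                 dimnames=list(NULL, paste0("Model", 1:5)))
# Rank within each series (1 = most accurate on that series)
ranks <- t(apply(errors, 1, rank))
# Mean rank per model; the test then builds confidence intervals
# around these values to judge significance of the differences
meanRanks <- colMeans(ranks)
sort(meanRanks)
```

In the benchmark above, the transposed slice of <code>testResults20250603</code> with the RMSSE values plays the role of <code>errors</code>.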
<h3>What's next?</h3>
<p>I am now working on a modified formulation of ETS, which should fix some issues with the multiplicative trend and make ETS safer to use. This is based on <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6</a> of the online version of the ADAM monograph (it is not in the printed version). I am not sure whether this will improve the accuracy further, but I hope that it will make some of the ETS models more resilient than they are right now. I specifically mean the multiplicative trend model, which sometimes behaves erratically because of its formulation.</p>
<p>I also plan to translate all the simulation functions to the ADAM core. This applies to <code>sim.es()</code>, <code>sim.ssarima()</code>, <code>sim.gum()</code> and <code>sim.ces()</code>. Currently, they rely on the older core, which I want to get rid of. Having said that, the <code>simulate()</code> method applied to the new <code>smooth</code> functions already uses the new core; it just lacks the flexibility that the other functions have.</p>
<p>Furthermore, I want to rewrite the <code>oes()</code> function and substitute it with <code>oadam()</code>, which would use a better engine, supporting more features, such as multiple frequencies and ARIMA for the occurrence. This is a lot of work, and I probably will need help with that.</p>
<p>Finally, Filotas Theodosiou, Leonidas Tsaprounis, and I are working on the translation of the R code of <code>smooth</code> to Python. You can read a bit more about this project <a href="/2025/06/30/iif-open-source-forecasting-software-workshop-and-smooth/">here</a>. Several other people have decided to help us, but the progress so far has been a bit slow because of the code translation. If you want to help, please get in touch.</p>
<p>Message <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Multistep loss functions: Geometric Trace MSE</title>
		<link>https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/</link>
					<comments>https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 04 Jun 2024 09:05:56 +0000</pubDate>
				<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[estimators]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3594</guid>

					<description><![CDATA[<p>While there is a lot to say about multistep losses, I&#8217;ve decided to write the final post on one of them and leave the topic alone for a while. Here it goes. Last time, we discussed MSEh and TMSE, and I mentioned that both of them impose shrinkage and have some advantages and disadvantages. One [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/">Multistep loss functions: Geometric Trace MSE</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>While there is a lot to say about multistep losses, I&#8217;ve decided to write the final post on one of them and leave the topic alone for a while. Here it goes.</p>
<p>Last time, we discussed <a href="/2024/05/25/recursive-vs-direct-forecasting-strategy/">MSEh</a> and <a href="/2024/06/01/multistep-loss-functions-trace-mse/">TMSE</a>, and I mentioned that both of them impose shrinkage and have some advantages and disadvantages. One of the main advantages of TMSE was in reducing computational time in comparison with MSEh: you fit just one model instead of fitting one for each of the h horizons. However, the downside of TMSE is that it averages things out, and we end up with model parameters that minimize the h-steps-ahead forecast error to a much larger extent than the one-step-ahead one. For example, if the one-step-ahead MSE was 500, while the six-steps-ahead MSE was 3000, the impact of the latter in TMSE would be six times higher than that of the former, and the estimator would prioritize the minimization of the longer-horizon error.</p>
<p>A more balanced version of this was introduced in <a href="/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/">our paper</a> and was called &#8220;Geometric Trace MSE&#8221; (GTMSE). The main idea of GTMSE is to take the geometric mean or, equivalently, the sum of logarithms of MSEh instead of taking the arithmetic mean. Because of that, the impact of MSEh on the loss becomes comparable with the effect of MSE1, and the model performs well throughout the whole horizon from 1 to h. For the same example as above, the base-10 logarithm of 500 is approximately 2.7, while that of 3000 is approximately 3.5 (the base of the logarithm does not change the argument). The difference between the two is much smaller, reducing the impact of the long-term forecast uncertainty. As a result, GTMSE has the following features:</p>
<ul>
<li>It imposes shrinkage on model parameters.</li>
<li>The strength of shrinkage is proportional to the forecast horizon.</li>
<li>But the shrinkage is much milder than in the case of MSEh or TMSE.</li>
<li>It leads to more balanced forecasts, performing well on average across the whole horizon.</li>
</ul>
<p>In that paper, we did extensive simulations to see how different estimators behave, and we found that:</p>
<ol>
<li>If an analyst is interested in parameters of models, they should stick with the conventional loss functions (based on one-step-ahead forecast error) because the multistep ones tend to produce biased estimates of parameters.</li>
<li>On the other hand, multistep losses kick out the redundant parameters faster than the conventional one, so there might be a benefit in the case of overparameterized models.</li>
<li>At the same time, if forecasting is of the main interest, then multistep losses might bring benefits, especially on larger samples.</li>
</ol>
<div id="attachment_3595" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3595" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-300x200.png&amp;nocache=1" alt="ETS(A,A,A) estimated using different loss functions applied to the data with multiplicative seasonality" width="300" height="200" class="size-medium wp-image-3595" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-300x200.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-1024x681.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-768x511.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example.png&amp;nocache=1 1049w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3595" class="wp-caption-text">ETS(A,A,A) estimated using different loss functions applied to the data with multiplicative seasonality</p></div>
<p>The image above shows an example from our paper, where we applied the additive model to data that exhibits apparent multiplicative seasonality. Despite that, we can see that the multistep losses did a much better job than the conventional MSE, compensating for the misspecification.</p>
<p>Message <a href="https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/">Multistep loss functions: Geometric Trace MSE</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Detecting patterns in white noise</title>
		<link>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/</link>
					<comments>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 10 Apr 2024 08:16:58 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3413</guid>

					<description><![CDATA[<p>Back in 2015, when I was working on my paper on Complex Exponential Smoothing, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230; One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Back in 2015, when I was working on my paper on <a href="/2022/08/02/complex-exponential-smoothing/">Complex Exponential Smoothing</a>, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230;</p>
<p>One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of ETS, it comes down to selecting trend/seasonal components, while for ARIMA, it&#8217;s about order selection. In R, several functions handle this automatically based on information criteria (<a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar, 2008</a>; <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan, 2019</a>; <a href="https://openforecast.org/adam/ADAMSelection.html">Chapter 15 of ADAM</a>). I decided to investigate how this mechanism works.</p>
<p>I generated data from the Normal distribution with a fixed mean of 5000 and a standard deviation of 50. Then, I asked ETS and ARIMA (from the forecast package in R) to automatically select the appropriate model for each of 1000 time series. Here is the R code for this simple experiment:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Some R code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode"># Set random seed for reproducibility
set.seed(41, kind="L'Ecuyer-CMRG")
# Number of iterations
nsim <- 1000
# Number of observations
obsAll <- 120
# Generate data from N(5000, 50)
rnorm(nsim*obsAll, 5000, 50) |>
  matrix(obsAll, nsim) |>
  ts(frequency=12) -> x

# Load forecast package
library(forecast)
# Load doMC for parallel calculations
# doMC is only available on Linux and macOS
# Use library(doParallel) on Windows
library(doMC)
registerDoMC(detectCores())

# A loop for ARIMA, recording the orders
matArima <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- auto.arima(x&#091;,i&#093;)
    # The element number 5 is just m, period of seasonality
    return(c(testModel$arma&#091;-5&#093;,(!is.na(testModel$coef&#091;"drift"&#093;))*1))
}
rownames(matArima) <- c("AR","MA","SAR","SMA","I","SI","Drift")

# A loop for ETS, recording the model types
matEts <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- ets(x&#091;,i&#093;, allow.multiplicative.trend=TRUE)
    return(testModel&#091;13&#093;$method)
}
</pre>
</div></div></div>
<p>The findings of this experiment are summarised using the following chunk of the R code:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the analysis of the results</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">
#### Auto ARIMA ####
# Non-seasonal ARIMA elements
mean(apply(matArima[c("AR","MA","I","Drift"),]!=0, 2, any))
# Seasonal ARIMA elements
mean(apply(matArima[c("SAR","SMA","SI"),]!=0, 2, any))

#### ETS ####
# Trend in ETS
mean(substr(matEts,7,7)!="N")
# Seasonality in ETS
mean(substr(matEts,nchar(matEts)-1,nchar(matEts)-1)!="N")</pre>
</div></div></div>
<p>I summarised them in the following table:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
</tr>
</table>
<p>So, ARIMA detected some structure (had non-zero orders) in almost 40% of all time series, even though the data was designed to have no structure (just white noise). It also captured non-seasonal orders in a quarter of the series and identified seasonality in 18% of them. ETS performed better (only 0.2% of seasonal models identified on the white noise), but still captured trends in 2.3% of cases.</p>
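<p>These rates are not surprising: even on pure white noise, some sample autocorrelations will fall outside the usual 95% bounds simply because many lags are checked at once, and that is what an unrestricted order-selection procedure latches onto. A quick base-R check, independent of the experiment above (the seed is arbitrary):</p>

```r
set.seed(42)
nsim <- 1000; obs <- 120
# Approximate 95% bounds for sample autocorrelations of white noise
bound <- 1.96 / sqrt(obs)
# Share of white-noise series with at least one of the first 12
# sample autocorrelations outside the bounds
spurious <- replicate(nsim, {
  x <- rnorm(obs, 5000, 50)
  any(abs(acf(x, lag.max=12, plot=FALSE)$acf[-1]) > bound)
})
mean(spurious)
```

With 12 lags checked jointly, roughly 1 &#8211; 0.95^12 &#8776; 46% of white-noise series are expected to show at least one "significant" autocorrelation.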
<p>Does this simple experiment suggest that ARIMA is a bad model and ETS is a good one? No, it does not. It simply demonstrates that ARIMA tends to overfit the data if allowed to select whatever it wants. How can we fix that?</p>
<p>My solution: restrict the pool of ARIMA models to check, preventing it from going crazy. My personal pool includes ARIMA(0,1,1), (1,1,2) and (0,2,2), along with the seasonal orders (0,1,1), (1,1,2) and (0,2,2), and the combinations between them. This approach is motivated by the connection between <a href="https://openforecast.org/adam/ARIMAandETS.html">ARIMA and ETS</a>.</p>
<p>This algorithm can be written as the following simple function, which uses the <code>msarima()</code> function from the smooth package in R (this function is used because all the ARIMA models implemented in it are directly comparable via information criteria):</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the compact ARIMA function</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">arimaCompact <- function(y, lags=c(1,frequency(y)), ic=c("AICc","AIC","BIC","BICc"), ...){

    # Start measuring the time of calculations
    startTime <- Sys.time();

    # If there are no lags for the basic components, correct this.
    if(sum(lags==1)==0){
        lags <- c(1,lags);
    }

    orderLength <- length(lags);
    ic <- match.arg(ic);
    IC <- switch(ic,
                 "AIC"=AIC,
                 "AICc"=AICc,
                 "BIC"=BIC,
                 "BICc"=BICc);

    # We consider the following list of models:
    # ARIMA(0,1,1), (1,1,2), (0,2,2),
    # ARIMA(0,0,0)+c, ARIMA(0,1,1)+c,
    # seasonal orders (0,1,1), (1,1,2), (0,2,2)
    # And all combinations between seasonal and non-seasonal parts
    # 
    # Encode all non-seasonal parts
    nNonSeasonal <- 5
    arimaNonSeasonal <- matrix(c(0,1,1,0, 1,1,2,0, 0,2,2,0, 0,0,0,1, 0,1,1,1), nNonSeasonal,4,
                               dimnames=list(NULL, c("ar","i","ma","const")), byrow=TRUE)
    # Encode all seasonal parts
    nSeasonal <- 4
    arimaSeasonal <- matrix(c(0,0,0, 0,1,1, 1,1,2, 0,2,2), nSeasonal,3,
                               dimnames=list(NULL, c("sar","si","sma")), byrow=TRUE)

    # Check all the models in the pool
    testModels <- vector("list", nSeasonal*nNonSeasonal);
    m <- 1;
    for(i in 1:nSeasonal){
        for(j in 1:nNonSeasonal){
            testModels&#091;&#091;m&#093;&#093; <- msarima(y, orders=list(ar=c(arimaNonSeasonal&#091;j,1&#093;,arimaSeasonal&#091;i,1&#093;),
                                                      i=c(arimaNonSeasonal&#091;j,2&#093;,arimaSeasonal&#091;i,2&#093;),
                                                      ma=c(arimaNonSeasonal&#091;j,3&#093;,arimaSeasonal&#091;i,3&#093;)),
                                       constant=arimaNonSeasonal&#091;j,4&#093;==1, lags=lags, ...);
            m <- m+1;
        }
    }

    # Find the best one
    m <- which.min(sapply(testModels, IC));
    # Amend computational time
    testModels&#091;&#091;m&#093;&#093;$timeElapsed <- Sys.time()-startTime;

    return(testModels&#091;&#091;m&#093;&#093;);
}</pre>
</div></div></div>
<p>Additionally, one can check whether adding the AR/MA orders suggested by ACF/PACF analysis of the residuals of the best model reduces the AICc; if not, they should not be included. I have not implemented that part in the code above. Still, even the basic algorithm brings some improvements:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the application of compact ARIMA to the data</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">#### Load the packages used below
library(smooth)
library(foreach)
# NB: %dopar% requires a registered parallel backend,
# e.g. via doParallel::registerDoParallel()

# A loop for the compact ARIMA, recording the orders
matArimaCompact <- foreach(i=1:nsim, .packages=c("smooth")) %dopar% {
    testModel <- arimaCompact(x&#091;,i&#093;)
    return(orders(testModel))
}

#### Auto MSARIMA from smooth ####
# Non-seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){x&#091;1&#093;!=0}))

# Seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}))
</pre>
</div></div></div>
<p>In my case, it resulted in the following:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
<td style="text-align: center"><strong>Compact ARIMA</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
<td style="text-align: center">2.4%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
<td style="text-align: center">0.0%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
<td style="text-align: center">2.4%</td>
</tr>
</table>
<p>As we see, when we impose restrictions on order selection in ARIMA, it avoids fitting seasonal models to non-seasonal data. While it still makes minor mistakes in terms of non-seasonal structure, it's nothing compared to the conventional approach. What about accuracy? I don't know. I'll have to write another post on this :).</p>
<p>Note that the models were applied to samples of 120 observations, which is considered "small" in statistics, while in real life such a sample size is sometimes a luxury to have...</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What does &#8220;lower error measure&#8221; really mean?</title>
		<link>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/</link>
					<comments>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 27 Mar 2024 18:29:03 +0000</pubDate>
				<category><![CDATA[Forecast evaluation]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[error measures]]></category>
		<category><![CDATA[ETS]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3380</guid>

					<description><![CDATA[<p>&#8220;My amazing forecasting method has a lower MASE than any other method!&#8221; You&#8217;ve probably seen claims like this on social media or in papers. But have you ever thought about what it really means? Many forecasting experiments come to applying several approaches to a dataset, calculating error measures for each method per time series and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">What does &#8220;lower error measure&#8221; really mean?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>&#8220;My amazing forecasting method has a lower MASE than any other method!&#8221; You&#8217;ve probably seen claims like this on social media or in papers. But have you ever thought about what it really means?</p>
<p>Many forecasting experiments come down to applying several approaches to a dataset, calculating error measures for each method per time series and aggregating them to get a neat table like this one (based on the M and tourism competitions, with R code similar to the one from <a href="/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/">this post</a>):</p>
<pre>           RMSSE    sCE
ADAM ETS   <strong>1.947</strong>  0.319
ETS        1.970  0.299
ARIMA      1.986  0.125
CES        1.960 <strong>-0.011</strong></pre>
<p>Typically, the conclusion drawn from such tables is that the approach with the lowest error measure (or, for bias measures such as sCE, the one closest to zero) performs the best, on average. I&#8217;ve done this myself many times because it&#8217;s a simple way to present results. So, what&#8217;s the issue?</p>
<p>Well, almost any error measure has a skewed distribution because it cannot be lower than zero, and its highest value is infinity (this doesn&#8217;t apply to bias). Let me show you:</p>
<div id="attachment_3381" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3381" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-300x150.png&amp;nocache=1" alt="Distribution of RMSSE for ADAM ETS on M and tourism competitions data" width="300" height="150" class="size-medium wp-image-3381" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-300x150.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-768x384.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot.png&amp;nocache=1 800w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3381" class="wp-caption-text">Distribution of RMSSE for ADAM ETS on M and tourism competitions data</p></div>
<p>This figure shows the distribution of 5,315 RMSSE values for ADAM ETS. As seen from the violin plot, the distribution has a peak close to zero and a very long tail. This suggests that the model performed well on many series but generated inaccurate forecasts for just a few, or perhaps only one of them. The mean RMSSE is 1.947 (vertical red line on the plot). However, this single value alone does not provide full information about the model&#8217;s performance. Firstly, it tries to represent the entire distribution with just one number. Secondly, as we know from statistics, the mean is influenced by outliers: if your method performed exceptionally well on 99% of series but failed badly on the remaining 1%, its mean error can end up higher than that of a method that was mediocre throughout but never failed that badly.</p>
<p>So, what should we do?</p>
<p>At the very least, provide both mean and median error measures: the distance between them shows how skewed the distribution is. But an even better approach would be to report mean and several quantiles of error measures (not just median). For instance, we could present the 1st, 2nd (median), and 3rd quartiles together with mean, min and max to offer a clearer understanding of the spread and variability of the error measure:</p>
<pre>             min    1Q  median    3Q     max  mean
ADAM ETS   <strong>0.024</strong> <strong>0.670</strong>   1.180 2.340  51.616 <strong>1.947</strong>
ETS        <strong>0.024</strong> 0.677   1.181 2.376  51.616 1.970
ARIMA      0.025 0.681   1.179 2.358  51.616 1.986
CES        0.045 0.675   <strong>1.171</strong> <strong>2.330</strong>  <strong>51.201</strong> 1.960</pre>
<p>This table provides better insights: the ETS models consistently perform well in terms of mean, minimum and first quartile RMSSE, while CES does better than the others in terms of median, third quartile and maximum. This means that there were some time series where ETS struggled a bit more than CES, but in the majority of cases it performed well.</p>
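<p>For reference, producing such a summary is straightforward. Here is a minimal R sketch (it assumes <code>rmsseValues</code> is a hypothetical numeric vector holding one RMSSE value per time series):</p>
<pre># Summarise the distribution of an error measure across time series
errorSummary <- function(errors){
    c(min=min(errors), quantile(errors, c(0.25, 0.5, 0.75)),
      max=max(errors), mean=mean(errors))
}
round(errorSummary(rmsseValues), 3)</pre>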
<p>So, next time you see a table with error measures, keep in mind that the best method on average might not be the best consistently. Having more details helps in understanding the situation better.</p>
<p>You can read more about error measures for forecasting in the &#8220;<a href="/category/forecasting-theory/forecast-evaluation/">forecast evaluation</a>&#8221; category.</p>
<p>Message <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">What does &#8220;lower error measure&#8221; really mean?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The role of M competitions in forecasting</title>
		<link>https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/</link>
					<comments>https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 14 Mar 2024 14:53:03 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[stories]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3358</guid>

					<description><![CDATA[<p>If you are interested in forecasting, you might have heard of M-competitions. They played a pivotal role in developing forecasting principles, yet also sparked controversy. In this short post, I&#8217;ll briefly explain their historical significance and discuss their main findings. Before M-competitions, only few papers properly evaluated forecasting approaches. Statisticians assumed that if a model [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/">The role of M competitions in forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you are interested in forecasting, you might have heard of M-competitions. They played a pivotal role in developing forecasting principles, yet also sparked controversy. In this short post, I&#8217;ll briefly explain their historical significance and discuss their main findings.</p>
<p>Before the M-competitions, only a few papers properly evaluated forecasting approaches. Statisticians assumed that if a model had solid theoretical backing, it should perform well. One of the first papers to conduct a proper evaluation was <a href="https://doi.org/10.2307/2344546">Newbold &#038; Granger (1974)</a>, who compared exponential smoothing (ES), ARIMA, and stepwise AR on 106 economic time series. Their conclusions were:</p>
<p>1. ES performed well on short time series;<br />
2. Stepwise AR did well on the series with more than 30 observations;<br />
3. Box-Jenkins methodology was recommended for series longer than 50 observations.</p>
<p>The statistical community received the results favourably, as they aligned with its expectations.</p>
<p>In 1979, <a href="https://doi.org/10.2307/2345077">Makridakis &#038; Hibon</a> conducted a similar analysis on 111 time series, including various ES methods and ARIMA. However, they found that &#8220;simpler methods perform well in comparison to the more complex and statistically sophisticated ARMA models&#8221;. This is because ARIMA performed slightly worse than ES, which contradicted the findings of Newbold &#038; Granger. Furthermore, their paper faced heavy criticism, with some claiming that Makridakis did not correctly utilize Box-Jenkins methodology.</p>
<p>So, in 1982, <a href="https://doi.org/10.1002/for.3980010202">Makridakis et al.</a> organized a competition on 1001 time series, inviting external participants to submit their forecasts. It was won by&#8230; the <a href="https://doi.org/10.1002/for.3980010108">ARARMA model by Emmanuel Parzen</a>. This model used information criteria for ARMA order selection instead of Box-Jenkins methodology. The main conclusion drawn from this competition was that &#8220;<strong>Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones</strong>.&#8221; Note that this does not mean that simple methods are always better, because that was not even the case in the first competition: it was won by a quite complicated statistical model based on ARMA. This only means that the complexity does not necessarily translate into accuracy.</p>
<p>The M2 competition focused on judgmental forecasting, and is not discussed here.</p>
<p>We then arrive at the <a href="https://doi.org/10.1016/S0169-2070(00)00057-1">M3 competition</a> with 3003 time series and, once again, open submission for anyone. The results widely confirmed the previous findings, with <a href="https://doi.org/10.1016/S0169-2070(00)00066-2">Theta</a> by Vassilis Assimakopoulos and Kostas Nikolopoulos outperforming all the other methods. Note that ARIMA with order selection based on Box-Jenkins methodology performed fine, but could not beat its competitors.</p>
<p>Finally, we arrive at the <a href="https://doi.org/10.1016/j.ijforecast.2019.04.014">M4 competition</a>, which had 100,000 time series and was open to an even wider audience. While I have <a href="https://openforecast.org/2020/03/01/m-competitions-from-m4-to-m5-reservations-and-expectations/">my reservations about the competition itself</a>, there were several curious findings, including the fact that the ARIMA implemented by <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> performed on average better than ETS (Theta outperformed both of them), and that the more complex methods won the competition.</p>
<p>It was also the first paper to show that accuracy tends to improve, on average, as more computational time is spent on training. This means that if you want more accurate forecasts, you need to spend more resources. The only catch is that this happens with diminishing returns: the improvements become smaller and smaller the more time you spend on training.</p>
<p>The competition was followed by M5 and M6, and now they plan to have another one. I don&#8217;t want to discuss all of them &#8211; they are beyond the scope of this short post (see details on the <a href="https://mofc.unic.ac.cy/history-of-competitions/">website of the competitions</a>). But I personally find the first competitions very impactful and useful.</p>
<p>And here are my personal takeaways from these competitions:</p>
<p>1. Simple forecasting methods perform well and should be included as benchmarks in experiments;<br />
2. Complex methods can outperform simple ones, especially if used intelligently, but you might need to spend more resources to gain in accuracy;<br />
3. ARIMA is effective, but Box-Jenkins methodology may not be practical. Using information criteria for order selection is a better approach (as evidenced by the ARARMA example and the Hyndman &#038; Khandakar implementation).</p>
<p>Finally, I like the following <a href="https://robjhyndman.com/hyndsight/m4comp/">quote from Rob J. Hyndman about the competitions</a> that gives some additional perspective: &#8220;The &#8220;M&#8221; competitions organized by Spyros Makridakis have had an enormous influence on the field of forecasting. They focused attention on what models produced good forecasts, rather than on the mathematical properties of those models&#8221;.</p>
<div id="attachment_3360" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3360" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition-300x182.png&amp;nocache=1" alt="Table with the results of the M3 competition" width="300" height="182" class="size-medium wp-image-3360" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition-300x182.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition-768x467.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition.png&amp;nocache=1 923w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3360" class="wp-caption-text">Table with the results of the M3 competition</p></div>
<p>Message <a href="https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/">The role of M competitions in forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Why you should not use Holt-Winters method</title>
		<link>https://openforecast.org/2024/03/07/why-you-should-not-use-holt-winters-method/</link>
					<comments>https://openforecast.org/2024/03/07/why-you-should-not-use-holt-winters-method/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 07 Mar 2024 11:52:30 +0000</pubDate>
				<category><![CDATA[ETS]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3350</guid>

					<description><![CDATA[<p>Whenever I see results of an experiment that include Holt-Winters method, I shrug. You should not use it, and here is why. Holt-Winters was developed in 1960 by a student of Charles Holt, Peter Winters (Winters, 1960). He extended Holt&#8217;s exponential smoothing method (the method that introduced a trend component) to include a seasonal component. [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/07/why-you-should-not-use-holt-winters-method/">Why you should not use Holt-Winters method</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Whenever I see results of an experiment that include Holt-Winters method, I shrug. You should not use it, and here is why.</p>
<p>Holt-Winters was developed in 1960 by a student of Charles Holt, Peter Winters (<a href="https://doi.org/10.1287/mnsc.6.3.324">Winters, 1960</a>). He extended Holt&#8217;s exponential smoothing method (the method that introduced a trend component) to include a seasonal component. The method has performed well in many situations, but it was originally developed for a specific type of data: trend-seasonal. Furthermore, the original method implied that the noise had an additive form, while the seasonality was multiplicative (which is a weird combination).</p>
<p>Since then, lots of things have happened in forecasting, one of the biggest being the development of the ETS framework by <a href="https://link.springer.com/book/10.1007/978-3-540-71918-2">Hyndman et al. (2008)</a>. That framework covers 30 possible models for time series with different types of Error, Trend, and Seasonal components. Holt-Winters is a method that aligns with only one of the models from the framework.</p>
<p>Now, if you have data without a trend, we know that the trend-based models will perform poorly, overfitting the data. Similarly, with seasonal models on non-seasonal data or other misalignments between the model and the data. This is a &#8220;horses for courses&#8221; sort of thing: you should use the model that suits your data, not the one you can easily find in a default package. And this means that Holt-Winters works as intended on one out of roughly 30 possible types of time series. Sticking only to it is similar to applying SARIMA(0,2,2)(0,2,2) to all your data, and then wondering why it performed poorly.</p>
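<p>To illustrate the point above, letting the data decide on the components takes one line in R. Here is a minimal sketch using the <code>es()</code> function from the smooth package (the <code>forecast::ets()</code> function would do a similar job):</p>
<pre>library(smooth)
# model="ZZZ" selects the Error/Trend/Seasonal components automatically
# instead of hard-coding the Holt-Winters combination
ourModel <- es(AirPassengers, model="ZZZ", h=12, holdout=TRUE)
modelType(ourModel)   # the ETS model selected for this series</pre>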
<p>Interestingly enough, data scientists who do not know forecasting very well typically compare their ML approach with ARIMA with automatic order selection (usually the one developed by <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar, 2008</a>) and explicitly with Holt-Winters instead of trying ETS. I don&#8217;t know why they do that. The only explanation I have is that they are probably just not aware of the more modern approach to exponential smoothing.</p>
<p>If you want to learn more about exponential smoothing, my book is <a href="https://openforecast.org/adam/">available online</a>, and Kandrika Pritularga and I will deliver <a href="https://isf.forecasters.org/program/workshops/">a workshop on that topic at ISF</a>.</p>
<div id="attachment_3351" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-07-HW.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3351" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-07-HW-300x167.png&amp;nocache=1" alt="Example of application of Holt-Winters method on Air Passengers data" width="300" height="167" class="size-medium wp-image-3351" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-07-HW-300x167.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-07-HW-768x427.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-07-HW.png&amp;nocache=1 900w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3351" class="wp-caption-text">Example of application of Holt-Winters method on Air Passengers data</p></div>
<p><strong>Correction</strong>: Rob Hyndman correctly pointed out that &#8220;<em>the seasonal method now known as Holt-Winters was actually developed by Holt, and is described in <a href="https://doi.org/10.1016/j.ijforecast.2003.09.015">Holt (1957)</a></em>&#8220;, not as I stated above in Winters (1960). Winters (1960) was the published paper describing the method, while Holt (1957) was an unpublished report for the Logistics Branch of the Office of Naval Research (ONR).</p>
<p>Message <a href="https://openforecast.org/2024/03/07/why-you-should-not-use-holt-winters-method/">Why you should not use Holt-Winters method</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/07/why-you-should-not-use-holt-winters-method/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Staying Positive: Challenges and Solutions in Using Pure Multiplicative ETS Models</title>
		<link>https://openforecast.org/2024/01/10/staying-positive-challenges-and-solutions-in-using-pure-multiplicative-ets-models/</link>
					<comments>https://openforecast.org/2024/01/10/staying-positive-challenges-and-solutions-in-using-pure-multiplicative-ets-models/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 10 Jan 2024 13:34:51 +0000</pubDate>
				<category><![CDATA[ETS]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Stories]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ADAM]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3315</guid>

					<description><![CDATA[<p>Authors: Ivan Svetunkov, John E. Boylan Journal: IMA Journal of Management Mathematics Abstract: Exponential smoothing in state space form (ETS) is a popular forecasting technique, widely used in research and practice. While the additive error ETS models have been well studied, the multiplicative error ones have received much less attention in forecasting literature. Still, these [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/01/10/staying-positive-challenges-and-solutions-in-using-pure-multiplicative-ets-models/">Staying Positive: Challenges and Solutions in Using Pure Multiplicative ETS Models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>Authors</strong>: Ivan Svetunkov, <a href="/en/2023/07/21/john-e-boylan/">John E. Boylan</a></p>
<p><strong>Journal</strong>: <a href="https://academic.oup.com/imaman">IMA Journal of Management Mathematics</a></p>
<p><strong>Abstract</strong>: Exponential smoothing in state space form (ETS) is a popular forecasting technique, widely used in research and practice. While the additive error ETS models have been well studied, the multiplicative error ones have received much less attention in forecasting literature. Still, these models can be useful in cases when one deals with positive data, because they are supposed to work in such situations. Unfortunately, the classical assumption of normality for the error term might break this property and lead to non-positive forecasts on positive data. In order to address this issue we propose using Log-Normal, Gamma and Inverse Gaussian distributions, which are defined for positive values only. We demonstrate what happens with ETS(M,*,*) models in this case, discuss conditional moments of ETS with these distributions and show that they are more natural for the models than the Normal one. We conduct the simulation experiments in order to study the bias introduced by point forecasts in these models and then compare the models with different distributions. We finish the paper with an example of application, showing how pure multiplicative ETS with a positive distribution works.</p>
<p><strong>DOI</strong>: <a href="https://doi.org/10.1093/imaman/dpad028">10.1093/imaman/dpad028</a>.</p>
<p><a href="https://openforecast.org/wp-content/uploads/2023/12/Svetunkov-Boylan-2023-Staying-positive.pdf">Working paper</a>.</p>
<h1>About the paper</h1>
<p><strong>DISCLAIMER</strong>: This is quite a technical paper focusing on solving a small problem of the ETS model that would allow using it in specific non-standard situations. It acts as a building block for the <a href="/en/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/">iETS paper</a>. But the latter does not work without this paper, so while it seems small, it is an important brick in the wall.</p>
<p>The conventional ETS works great for regular demand, where the volume of the data is high. In that case, a forecaster can decide which of the 30 models to select for the data, not worrying too much about the assumption of normality for the error term and about forecast trajectories from the selected model. The situation changes when one needs to work with positive low-volume data. One would think that pure multiplicative ETS should work fine in that case. However, due to the normality assumption, the model might produce negative prediction intervals and, in some situations, even negative point forecasts. Trying to fix this issue, we considered several distributions for the error term in the multiplicative error ETS:</p>
<ol>
<li>\( 1 + \epsilon_t \sim \mathcal{N}\left(1, \sigma^2\right) \) &#8211; the conventional assumption of Normality;</li>
<li>\( 1 + \epsilon_t \sim \mathcal{IG}\left(1, \sigma^2\right) \) &#8211; the error term follows the Inverse Gaussian distribution with the expectation of one and the variance of \(\sigma^2\);</li>
<li>\( 1 + \epsilon_t \sim \mathrm{log}\mathcal{N} \left(-\frac{\sigma^2}{2}, \sigma^2 \right) \) &#8211; the error term follows the Log-Normal distribution with the location of \(-\frac{\sigma^2}{2}\) and the scale of \( \sigma^2 \);</li>
<li>\( 1 + \epsilon_t \sim \Gamma\left(\sigma^{-2}, \sigma^2\right) \) &#8211; the error term follows the Gamma distribution with the shape parameter \(\sigma^{-2}\) and the scale \( \sigma^2 \).</li>
</ol>
<p>The restrictions imposed on the parameters of the distributions above are necessary to make sure that the expectation of the error term \(1 + \epsilon_t \) is equal to one (i.e. that \(\epsilon_t\) has zero mean). If it is not, then the ETS model would need to be modified to cater for the non-zero mean, otherwise the model will produce incorrect forecasts.</p>
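<p>These parameterisations are easy to verify numerically. A quick simulation sketch in R (a check of the moments, not code from the paper itself):</p>
<pre># Check that 1+e_t has expectation 1 (and, for Gamma, variance sigma^2)
sigma2 <- 0.1
x <- rgamma(1e6, shape=1/sigma2, scale=sigma2)           # Gamma(sigma^-2, sigma^2)
mean(x)   # should be close to 1
var(x)    # should be close to 0.1
y <- rlnorm(1e6, meanlog=-sigma2/2, sdlog=sqrt(sigma2))  # Log-Normal
mean(y)   # should be close to 1 as well</pre>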
<p>In the paper, we show how ETS works with these assumptions, what forecasting trajectories it produces and how it can be estimated. We also demonstrate that the distribution selection can be easily automated using AIC. All these aspects of the model are already implemented and supported in the <code>adam()</code> function from the <code>smooth</code> package in R (read more <a href="/tag/adam/">here</a> and <a href="/category/r-en/smooth/">here</a>).</p>
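<p>A minimal usage sketch in R (the distribution names are assumed to follow the <code>adam()</code> documentation in the smooth package):</p>
<pre>library(smooth)
# Fit a pure multiplicative ETS, selecting the error term
# distribution via information criteria
ourModel <- auto.adam(AirPassengers, model="MMM",
                      distribution=c("dlnorm","dgamma","dinvgauss"))
summary(ourModel)</pre>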
<h1>Story of the paper</h1>
<p>John Boylan and I started working on this paper after getting a rejection from the IJF for the other paper of ours, &#8220;<a href="/en/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/">iETS: State space model for intermittent demand forecasting</a>&#8220;. The rejection showed us that we need to take a completely new look at the paper, and it became apparent that the pure multiplicative ETS is not well studied in the literature. At the same time, its discussion would be outside of the scope of the original paper, so, we decided to write a separate one, focusing on the non-intermittent, but low volume demand.</p>
<p>We needed to discuss two points in the paper, which were then used in the <a href="/en/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/">iETS</a> one:</p>
<ol>
<li>The conventional ETS assumes that the demand follows the Normal distribution. In the case of low-volume demand this assumption may lead to negative forecasts, which makes the model inappropriate;</li>
<li>Point forecasts from multiplicative ETS models do not coincide with the conditional expectations. <a href="https://www.doi.org/10.1007/978-3-540-71918-2">Hyndman et al. (2008)</a> discuss this in their book, but not to the extent we needed. We thought that many people do not understand the implications of this, so we added that discussion to the paper.</li>
</ol>
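<p>The second point can be illustrated with a small simulation sketch (hypothetical numbers, not from the paper). Take the standard ETS(M,M,N) model: two steps ahead, \( y_{t+2} = l_t b_t^2 (1 + \alpha \epsilon_{t+1})(1 + \beta \epsilon_{t+1})(1 + \epsilon_{t+2}) \), so the conditional expectation is \( l_t b_t^2 (1 + \alpha \beta \sigma^2) \), while the point forecast (all future errors set to zero) is just \( l_t b_t^2 \):</p>

```python
import numpy as np

# Sketch for ETS(M,M,N) with illustrative (made-up) states and parameters.
rng = np.random.default_rng(0)
l, b = 100.0, 1.05                 # current level and multiplicative trend
alpha, beta, sigma = 0.5, 0.3, 0.3
n = 1_000_000

# Two-steps-ahead trajectory: the same error eps1 enters via both the
# level and the trend update, making the expression nonlinear in errors.
e1 = rng.normal(0, sigma, n)
e2 = rng.normal(0, sigma, n)
y2 = l * b**2 * (1 + alpha * e1) * (1 + beta * e1) * (1 + e2)

point_forecast = l * b**2                               # errors set to zero
expectation = l * b**2 * (1 + alpha * beta * sigma**2)  # analytical mean

print(f"point forecast:     {point_forecast:.2f}")
print(f"simulated mean:     {y2.mean():.2f}")
print(f"analytical mean:    {expectation:.2f}")
```

The simulated mean matches the analytical conditional expectation and exceeds the point forecast, confirming that setting errors to zero underestimates the mean of a pure multiplicative model.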
<p>The paper was written over the period 2021&#8211;2023 and was ready in Spring 2023. John and I discussed it several times, and we agreed to have a final look at it in May 2023 before submitting it to the <a href="https://academic.oup.com/imaman">IMA Journal of Management Mathematics</a>. When I found out that <a href="/en/2023/07/21/john-e-boylan/">John was ill</a>, I decided not to wait for his comments further and just submitted it. The paper went through a couple of rounds of review, changed its title to address the concerns of one of the reviewers (the new title is objectively better than the old one) and was accepted for publication in November 2023. This is the last paper that John and I wrote together.</p>
<p>The post <a href="https://openforecast.org/2024/01/10/staying-positive-challenges-and-solutions-in-using-pure-multiplicative-ets-models/">Staying Positive: Challenges and Solutions in Using Pure Multiplicative ETS Models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/01/10/staying-positive-challenges-and-solutions-in-using-pure-multiplicative-ets-models/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
