Great news, everyone! The smooth package v4.4.0 for R is now on CRAN. Why is this great news? Let me explain!
Here is what’s new since 4.3.0:
First, I have worked on tuning the initialisation in adam() in the case of backcasting and improved the msdecompose() function a bit to get more robust results. This was necessary to make sure that the initial values still make sense when the smoothing parameters are close to zero. This is already available in adam() (use smoother="global" to test it), but it will become the default behaviour in the next version of the package, once we iron everything out. This is all part of a larger work with Kandrika Pritularga on a paper about the initialisation of dynamic models.
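If you want to try it out, here is a minimal sketch, assuming that your data behaves like the built-in AirPassengers series and that smoother is accepted by adam() as a named argument, as mentioned above:

library(smooth)
# ADAM ETS(A,A,N) with backcasting and the new global initialisation smoother
# (not yet the default behaviour in v4.4.0)
testModel <- adam(AirPassengers, model="AAN", initial="backcasting",
                  smoother="global")
summary(testModel)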
Second, I have fixed a long-standing issue with the eigenvalues calculation inside the dynamic models, which is applicable only in the case of bounds="admissible" and might impact ARIMA, CES and GUM. The parameter restrictions are now imposed consistently across all functions, guaranteeing that they will not fail and will produce stable/invertible estimates of parameters.
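For example, the admissible bounds can be requested via the bounds parameter. A sketch with an arbitrary ARIMA order on the built-in BJsales data (both choices are just illustrative):

library(smooth)
# ARIMA(0,1,1) estimated with admissible bounds, which now rely on the
# consistent eigenvalues check
testModel <- adam(BJsales, model="NNN", orders=c(0,1,1),
                  bounds="admissible")
testModel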
Third, I have added the Sparse ARMA function, which constructs an ARMA(p,q) of the specified orders, dropping all the elements with lags below those orders. For example, SpARMA(2,3) would have the following form:
\begin{equation*}
y_t = \phi_2 y_{t-2} + \theta_3 \epsilon_{t-3} + \epsilon_{t}
\end{equation*}
This weird model is needed for a project I am working on together with Devon Barrow, Nikos Kourentzes and Yves Sagaert. I’ll explain more when we get the final draft of the paper.
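To make the structure of the model more tangible, here is a small sketch that simulates data from the SpARMA(2,3) above. This is just a manual illustration of the equation with arbitrary parameter values, not the package function:

set.seed(41)
# Simulate y[t] = phi2*y[t-2] + theta3*e[t-3] + e[t]
nObs <- 200
phi2 <- 0.6
theta3 <- 0.4
e <- rnorm(nObs)
y <- rep(0, nObs)
for(t in 4:nObs){
    y[t] <- phi2*y[t-2] + theta3*e[t-3] + e[t]
}
plot(y, type="l")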
And something very important, which you will not notice: I refactored the C++ code in the package so that it is available not only for R, but also for Python… Why? I’ll explain in the next post :). But this also means that the old functions that relied on the previous generation of the C++ code are now discontinued, and all the smooth functions use the new core. This applies to es(), ssarima(), msarima(), ces(), gum() and sma(). You will not notice any change, except that some of them should become a bit faster and probably more robust. It also means that all of them can now use the methods written for the adam() function. For example, summary() will produce proper output with standard errors and confidence intervals for all estimated parameters.
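For example, something along these lines should now work out of the box (a sketch on the built-in AirPassengers data):

library(smooth)
# es() now runs on the ADAM core, so adam() methods work on its output
testModel <- es(AirPassengers, model="MAM", h=12, holdout=TRUE)
# Standard errors and confidence intervals for the estimated parameters
summary(testModel)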
Evaluation
DISCLAIMER: The previous evaluation was for smooth v4.3.0; you can find it here. I have changed one of the error measures (sCE to SAME), but the rest is the same, so the results are largely comparable between the versions.
The setup
As usual in situations like this, I have run the evaluation on the M1, M3 and Tourism competition data. This time, I have added more flavours of ETS model selection so that you can see how the model pool impacts forecasting accuracy. A short description:
1. XXX – select between pure additive ETS models only;
2. ZZZ – select from the pool of all 30 models, but use branch-and-bound to kick out the less suitable models;
3. ZXZ – same as (2), but without the multiplicative trend models. This is used in the smooth functions by default;
4. FFF – select from the pool of all 30 models (exhaustive search);
5. SXS – the pool of models that is used by default in ets() from the forecast package in R.
I also tested three types of the ETS initialisation:
1. Back – initial="backcasting"
2. Opt – initial="optimal"
3. Two – initial="two-stage"
Backcasting is now the default method of initialisation and does well in many cases, but I found that optimal initials (if done correctly) help in some difficult situations, as long as you have enough computational time.
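To make the setup above more concrete, here is roughly how these flavours translate into calls on a single series (a sketch; the pool codes follow the labels above, and the SXS flavour would be passed as an explicit vector of model names rather than a single code):

library(smooth)
y <- AirPassengers
# Model pool flavours
esXXX <- es(y, model="XXX", h=12, holdout=TRUE)  # pure additive models only
esZXZ <- es(y, model="ZXZ", h=12, holdout=TRUE)  # default: no multiplicative trend
esFFF <- es(y, model="FFF", h=12, holdout=TRUE)  # full pool of 30 models
# Initialisation flavours
esBack <- es(y, model="ZXZ", initial="backcasting")
esOpt <- es(y, model="ZXZ", initial="optimal")
esTwo <- es(y, model="ZXZ", initial="two-stage")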
I used two error measures and the computational time to check how the functions perform. The first error measure is RMSSE (Root Mean Squared Scaled Error) from Athanasopoulos & Kourentzes (2023):
\begin{equation*}
\mathrm{RMSSE} = \frac{1}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} \Delta_t^2}} \mathrm{RMSE},
\end{equation*}
where \(\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h e^2_{t+j}}\) is the Root Mean Squared Error of the point forecasts, and \(\Delta_t\) are the first differences of the in-sample actual values.
The second measure does not have a standard name in the literature, but the idea is to measure the bias of forecasts while getting rid of the sign, so that positively biased forecasts on some time series are not cancelled out by negatively biased ones on others. I call this measure “Scaled Absolute Mean Error” (SAME):
\begin{equation*}
\mathrm{SAME} = \frac{1}{\frac{1}{T-1} \sum_{t=1}^{T-1} |\Delta_t|} \mathrm{AME},
\end{equation*}
where \(\mathrm{AME}= \left| \frac{1}{h} \sum_{j=1}^h e_{t+j} \right|\).
For both of these measures, lower values are better. As for the computational time, I have measured it for each model and each series, and this time I provide the distribution of times to better see how the methods perform.
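To make the definitions concrete, here is a small sketch of how the two measures can be computed for a single series. These are hypothetical helper functions that directly implement the formulas above (insample holds the in-sample actuals, holdout the actuals over the horizon, and forecasts the point forecasts):

# RMSSE: RMSE scaled by the root mean square of in-sample first differences
RMSSE <- function(insample, holdout, forecasts){
    sqrt(mean((holdout - forecasts)^2)) / sqrt(mean(diff(insample)^2))
}
# SAME: absolute mean error scaled by the mean absolute in-sample first difference
SAME <- function(insample, holdout, forecasts){
    abs(mean(holdout - forecasts)) / mean(abs(diff(insample)))
}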
Results
And here are the results for the smooth functions in v4.4.0 for R. First, we summarise the RMSSEs. I report quartiles of the distribution of RMSSE together with the mean.
cbind(t(apply(testResults[,,"RMSSE"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"RMSSE"],1,mean)) |> round(4)
                  0%    25%    50%    75%      100%   mean
ETS           0.0245 0.6772 1.1806 2.3765   51.6160 1.9697
Auto ARIMA    0.0246 0.6802 1.1790 2.3583   51.6160 1.9864
ADAM ETS Back 0.0183 0.6647 1.1620 2.3023   50.2585 1.9283
ADAM ETS Opt  0.0242 0.6714 1.1868 2.3623   51.6160 1.9432
ADAM ETS Two  0.0246 0.6690 1.1875 2.3374   51.6160 1.9480
ES Back       0.0183 0.6674 1.1647 2.3164   50.2585 1.9292
ES Opt        0.0242 0.6740 1.1858 2.3644   51.6160 1.9469
ES Two        0.0245 0.6717 1.1874 2.3463   51.6160 1.9538
ES XXX        0.0183 0.6777 1.1708 2.3062   50.2585 1.9613
ES ZZZ        0.0108 0.6682 1.1816 2.3611  201.4959 2.0841
ES FFF        0.0145 0.6795 1.2170 2.4575 5946.1858 3.3033
ES SXS        0.0183 0.6754 1.1709 2.3539   50.2585 1.9448
MSARIMA       0.0278 0.6988 1.1898 2.4208   51.6160 2.0750
SSARIMA       0.0277 0.7371 1.2544 2.4425   51.6160 2.0625
CES Back      0.0450 0.6761 1.1741 2.3205   51.0571 1.9650
GUM Back      0.0333 0.7077 1.2073 2.4533   51.6184 2.0461
The worst performing models are the ETS variants that include the multiplicative trend (ES ZZZ and ES FFF). This is because there are outliers in some time series, and the multiplicative trend reacts to them by amending the trend value to something large (e.g. 2, i.e. a twofold increase in level at each step), after which it can never return to a reasonable level (see the explanation of this phenomenon in Section 6.6 of the ADAM book). As expected, ADAM ETS performs very similarly to ES, and we can see that the default initialisation (backcasting) is pretty good in terms of RMSSE values. To be fair, if the models were tested on a different dataset, the optimal initialisation might do better.
Here is a table with the SAME results:
cbind(t(apply(testResults[,,"SAME"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"SAME"],1,mean)) |> round(4)
                 0%    25%    50%    75%      100%   mean
ETS           8e-04 0.3757 1.0203 2.5097   54.6872 1.9983
Auto ARIMA    0e+00 0.3992 1.0429 2.4565   53.2710 2.0446
ADAM ETS Back 1e-04 0.3752 0.9965 2.4047   52.3418 1.9518
ADAM ETS Opt  5e-04 0.3733 1.0212 2.4848   55.1018 1.9618
ADAM ETS Two  8e-04 0.3780 1.0316 2.4511   55.1019 1.9712
ES Back       0e+00 0.3733 0.9945 2.4122   53.4504 1.9485
ES Opt        2e-04 0.3727 1.0255 2.4756   54.6860 1.9673
ES Two        1e-04 0.3855 1.0323 2.4535   54.6856 1.9799
ES XXX        1e-04 0.3733 1.0050 2.4257   53.1697 1.9927
ES ZZZ        3e-04 0.3824 1.0135 2.4885  229.7626 2.1376
ES FFF        3e-04 0.3972 1.0489 2.6042 3748.4268 2.9501
ES SXS        6e-04 0.3750 1.0125 2.4627   53.4504 1.9725
MSARIMA       1e-04 0.3960 1.0094 2.5409   54.7916 2.1227
SSARIMA       1e-04 0.4401 1.1222 2.5673   52.5023 2.1248
CES Back      6e-04 0.3767 1.0079 2.4085   54.9026 2.0052
GUM Back      0e+00 0.3803 1.0575 2.6259   63.0637 2.0858
In terms of bias, the smooth implementations of ETS are doing well again, and we can see the same issue with the multiplicative trend as before. Another thing to note is that MSARIMA and SSARIMA are not as good as Auto ARIMA from the forecast package on these datasets in terms of RMSSE and SAME (at least in terms of the mean error measures). And GUM and CES are now better than both of them in terms of both error measures.
Finally, here is a table with the computational time:
cbind(t(apply(testResults[,,"Time"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"Time"],1,mean)) |> round(4)
0% 25% 50% 75% 100% mean
ETS 0.0032 0.0117 0.1660 0.6728 1.6400 0.3631
Auto ARIMA 0.0100 0.1184 0.3618 1.0548 54.3652 1.4760
ADAM ETS Back 0.0162 0.1062 0.1854 0.4022 2.5109 0.2950
ADAM ETS Opt 0.0319 0.1920 0.3103 0.6792 3.8933 0.5368
ADAM ETS Two 0.0427 0.2548 0.4035 0.8567 3.7178 0.6331
ES Back 0.0153 0.0896 0.1521 0.3335 2.1128 0.2476
ES Opt 0.0303 0.1667 0.2565 0.5910 3.5887 0.4522
ES Two 0.0483 0.2561 0.4016 0.8626 3.5892 0.6309
MSARIMA Back 0.0614 0.3418 0.6947 0.9868 3.9677 0.7534
SSARIMA Back 0.0292 0.2963 0.8988 2.1729 13.7635 1.6581
CES Back 0.0146 0.0400 0.1834 0.2298 1.2099 0.1713
GUM Back 0.0165 0.2101 1.5221 3.0543 9.5380 1.9506
# Separate table for special pools of ETS.
# The time is proportional to the number of models here
=========================================================
0% 25% 50% 75% 100% mean
ES XXX 0.0114 0.0539 0.0782 0.1110 0.8163 0.0859
ES ZZZ 0.0147 0.1371 0.2690 0.4947 2.2049 0.3780
ES FFF 0.0529 0.2775 1.1539 1.5926 3.8552 1.1231
ES SXS 0.0323 0.1303 0.4491 0.6013 2.2170 0.4581
I have manually moved the specific ES model pool flavours below because there is no point in comparing their computational time with that of the other models: they select from different pools of models and thus are not directly comparable with the rest.
What we can see from this is that ES with backcasting is faster than the other models in this setting (in terms of both mean and median computational time). CES is very fast in terms of mean computational time, which is probably because of the very short pool of models to choose from (only four). SSARIMA is pretty slow, which is due to the nature of its order selection algorithm (I don't plan to update it any time soon, but if someone wants to contribute, let me know). The interesting thing is that Auto ARIMA, while being relatively fine in terms of median time, has the highest maximum one, meaning that it struggled on some time series for an unknown reason. The series that caused the biggest issue for Auto ARIMA is N389 from the M1 competition. I'm not sure what the issue was, and I don't have time to investigate it.
Comparing the mean computational time with the mean RMSSE value (image above), it looks like the overall tendency for the smooth + forecast functions on the M1, M3 and Tourism datasets is that additional computational time does not improve accuracy. It also looks like the simpler pool of pure additive models (ETS(X,X,X)) harms accuracy in comparison with the branch-and-bound based pool of the default model="ZXZ". There seems to be a sweet spot in terms of the pool of models to choose from (no multiplicative trend, allow mixed models). This aligns well with the papers of Petropoulos et al. (2025), who investigated the accuracy of arbitrary short pools of models, and Kourentzes et al. (2019), who showed how pooling (if done correctly) can improve accuracy on average.
What's next?
For R, the main task now is to rewrite the oes() function and substitute it with the om() one ("Occurrence Model"). It should be equivalent to adam() in functionality, allowing ETS, ARIMA and explanatory variables to be used in the occurrence part of the model. This is a huge amount of work, which I hope to progress slowly throughout 2026 and finish by the end of the year. Doing that will also allow me to remove the last bits of the old C++ code and switch to the ADAM core completely, introducing more functionality for capturing patterns in intermittent demand. The minor task is to test smoother="global" further for the ETS initialisation and roll it out as the default in the next release for both R and Python.
For Python,... What Python? Ah! You'll see soon :)
