<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archives CES - Open Forecasting</title>
	<atom:link href="https://openforecast.org/tag/ces-en/feed/" rel="self" type="application/rss+xml" />
	<link>https://openforecast.org/tag/ces-en/</link>
	<description>How to look into the future</description>
	<lastBuildDate>Wed, 11 Feb 2026 19:59:54 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2015/08/cropped-usd-05-32x32.png&amp;nocache=1</url>
	<title>Archives CES - Open Forecasting</title>
	<link>https://openforecast.org/tag/ces-en/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>smooth v4.4.0</title>
		<link>https://openforecast.org/2026/02/09/smooth-v4-4-0/</link>
					<comments>https://openforecast.org/2026/02/09/smooth-v4-4-0/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 09 Feb 2026 09:02:21 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3959</guid>

					<description><![CDATA[<p>Great news, everyone! The smooth package for R, version 4.4.0, is now on CRAN. Why is this great news? Let me explain! On this page: What&#8217;s new? Evaluation Setup Results What&#8217;s next? Here is what&#8217;s new since 4.3.0: First, I have worked on tuning the initialisation in adam() in case of backcasting, and improved the [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Great news, everyone! The smooth package for R, version 4.4.0, is now on CRAN. Why is this great news? Let me explain!</p>
<p>On this page:</p>
<ul>
<li><a href="#whatsNew">What&#8217;s new?</a></li>
<li><a href="#evaluation">Evaluation</a>
<ul>
<li><a href="#evaluationSetup">Setup</a></li>
<li><a href="#evaluationResults">Results</a></li>
</ul>
</li>
<li><a href="#whatsNext">What&#8217;s next?</a></li>
</ul>
<h3 id="whatsNew">Here is what&#8217;s new since 4.3.0:</h3>
<p>First, I have worked on tuning the initialisation in <code>adam()</code> in the case of backcasting, and improved the <code>msdecompose()</code> function a bit to get more robust results. This was necessary to make sure that when the smoothing parameters are close to zero, the initial values still make sense. This is already in <code>adam</code> (use <code>smoother="global"</code> to test it), but it will become the default behaviour in the next version of the package, once we iron everything out. This is all part of a larger work with Kandrika Pritularga on a paper about the initialisation of dynamic models.</p>
<p>Second, I have fixed a long-standing issue with the eigenvalues calculation inside the dynamic models, which is applicable only in the case of <code>bounds="admissible"</code> and might impact ARIMA, CES and GUM. The parameter restrictions are now imposed consistently across all functions, guaranteeing that they will not fail and will produce stable/invertible estimates of parameters.</p>
<p>Third, I have added the Sparse ARMA function, which constructs an ARMA(p,q) with only the specified orders, dropping all elements of lower orders. For example, SpARMA(2,3) would have the following form:<br />
\begin{equation*}<br />
y_t = \phi_2 y_{t-2} + \theta_3 \epsilon_{t-3} + \epsilon_{t}<br />
\end{equation*}<br />
This weird model is needed for a project I am working on together with Devon Barrow, Nikos Kourentzes and Yves Sagaert. I&#8217;ll explain more when we get the final draft of the paper.</p>
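<p>To get a feel for this structure, here is a minimal simulation of the SpARMA(2,3) process above in plain R. This is just an illustrative sketch with arbitrary values \(\phi_2=0.6\) and \(\theta_3=0.3\), not the package function itself:</p>
<pre class="decode"># Simulate y_t = phi_2 * y_{t-2} + theta_3 * epsilon_{t-3} + epsilon_t
set.seed(41)
phi2 <- 0.6
theta3 <- 0.3
obs <- 120
epsilon <- rnorm(obs+3)
y <- rep(0, obs+3)
for(t in 4:(obs+3)){
    y[t] <- phi2 * y[t-2] + theta3 * epsilon[t-3] + epsilon[t]
}
# Drop the burn-in observations and plot the series
y <- ts(y[-c(1:3)])
plot(y)
</pre>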
<p>And something very important, which you will not notice: I refactored the C++ code in the package so that it is available not only for R, but also for Python&#8230; Why? I&#8217;ll explain in the next post :). But this also means that the old functions that relied on the previous generation of the C++ code are now discontinued, and all the smooth functions use the new core. This applies to <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. You will not notice any change, except that some of them should become a bit faster and probably more robust. And this also means that all of them will now be able to use methods for the <code>adam()</code> function. For example, the <code>summary()</code> will produce the proper output with standard errors and confidence intervals for all estimated parameters.</p>
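<p>For example, something like the following should now work (a quick sketch; <code>AirPassengers</code> is just a standard R dataset used for illustration):</p>
<pre class="decode">library(smooth)
# es() now relies on the ADAM core, so the adam() methods apply to it
esModel <- es(AirPassengers)
# summary() produces standard errors and confidence intervals
# for the estimated parameters
summary(esModel)
</pre>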
<h2 id="evaluation">Evaluation</h2>
<p><strong>DISCLAIMER</strong>: The previous evaluation was for smooth v4.3.0; you can find it <a href="/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">here</a>. I have changed one of the error measures (sCE to SAME), but the rest of the setup is the same, so the results are broadly comparable between the versions.</p>
<h3 id="evaluationSetup">The setup</h3>
<p>As usual in situations like this, I have run the evaluation on the M1, M3 and Tourism competition data. This time, I have added more flavours of the ETS model selection so that you can see how the model pool impacts the forecasting accuracy. Short description:</p>
<ol>
<li>XXX &#8211; select between pure additive ETS models only;</li>
<li>ZZZ &#8211; select from the pool of all 30 models, but use branch-and-bound to kick out the less suitable models;</li>
<li>ZXZ &#8211; same as (2), but without the multiplicative trend models. This is used in the <code>smooth</code> functions <strong>by default</strong>;</li>
<li>FFF &#8211; select from the pool of all 30 models (exhaustive search);</li>
<li>SXS &#8211; the pool of models that is used by default in <code>ets()</code> from the <code>forecast</code> package in R.</li>
</ol>
<p>I also tested three types of the ETS initialisation:</p>
<ol>
<li>Back &#8211; <code>initial="backcasting"</code></li>
<li>Opt &#8211; <code>initial="optimal"</code></li>
<li>Two &#8211; <code>initial="two-stage"</code></li>
</ol>
<p>Backcasting is now the default method of initialisation and does well in many cases, but I found that optimal initials (if done correctly) help in some difficult situations, as long as you have enough computational time.</p>
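<p>In code, the three options above correspond to the following calls (a sketch on an arbitrary series):</p>
<pre class="decode">library(smooth)
# The three initialisation schemes tested in this evaluation
adamBack <- adam(AirPassengers, "ZXZ", initial="backcasting")
adamOpt  <- adam(AirPassengers, "ZXZ", initial="optimal")
adamTwo  <- adam(AirPassengers, "ZXZ", initial="two-stage")
</pre>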
<p>I used two error measures and computational time to check how functions work. The first error measure is called RMSSE (Root Mean Squared Scaled Error) from <a href="http://dx.doi.org/10.1016/j.ijforecast.2021.11.013">M5 competition</a>, motivated by <a href="http://dx.doi.org/10.1016/j.ijforecast.2022.08.003">Athanasopoulos &#038; Kourentzes (2023)</a>:</p>
<p>\begin{equation*}<br />
\mathrm{RMSSE} = \frac{1}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} \Delta_t^2}} \mathrm{RMSE},<br />
\end{equation*}<br />
where \(\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h e^2_{t+j}}\) is the Root Mean Squared Error of the point forecasts, and \(\Delta_t\) is the first differences of the in-sample actual values.</p>
<p>The second measure does not have a standard name in the literature; the idea is to measure the bias of forecasts while removing the sign, so that positively biased forecasts on some time series are not cancelled out by negatively biased ones on others. I call this measure &#8220;Scaled Absolute Mean Error&#8221; (SAME):</p>
<p>\begin{equation*}<br />
\mathrm{SAME} = \frac{1}{\frac{1}{T-1} \sum_{t=1}^{T-1} |\Delta_t|} \mathrm{AME},<br />
\end{equation*}<br />
where \(\mathrm{AME}= \left| \frac{1}{h} \sum_{j=1}^h e_{t+j} \right|\).</p>
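<p>For clarity, the two measures can be coded directly from the formulae above. This is an illustrative sketch with hypothetical names; the <code>greybox</code> package provides its own implementations:</p>
<pre class="decode"># Root Mean Squared Scaled Error: RMSE scaled by the RMS of first differences
RMSSEmanual <- function(holdout, forecasts, insample){
    sqrt(mean((holdout - forecasts)^2)) / sqrt(mean(diff(insample)^2))
}
# Scaled Absolute Mean Error: absolute mean error scaled by the MAD of first differences
SAMEmanual <- function(holdout, forecasts, insample){
    abs(mean(holdout - forecasts)) / mean(abs(diff(insample)))
}
</pre>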
<p>For both of these measures, lower values are better. As for the computational time, I measured it for each model on each series, and this time I provide the distribution of times to better see how the methods perform.</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)
library(forecast)
library(smooth)

library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
        holdout <- as.vector(holdout);
        insample <- as.vector(insample);
	# RMSSE and SAME are defined in greybox v2.0.7
        return(c(RMSSE(holdout, object$mean, mean(diff(insample)^2)),
                 SAME(holdout, object$mean, mean(abs(diff(insample)))),
                 object$timeElapsed))
}

datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)

# Method configuration list
# Each method specifies: fn (function name), pkg (package), and, where needed, model and initial
methodsConfig <- list(
	# ETS and Auto ARIMA from the forecast package in R
	"ETS" = list(fn = "ets", pkg = "forecast", use_x_only = TRUE),
	"Auto ARIMA" = list(fn = "auto.arima", pkg = "forecast", use_x_only = TRUE),
	# ADAM with different initialisation schemes
	"ADAM ETS Back" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ADAM ETS Opt" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ADAM ETS Two" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "two"),
	# ES, which is a wrapper of ADAM. Should give very similar results to ADAM on regular data
	"ES Back" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ES Opt" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ES Two" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "two"),
	# Several flavours for model selection in ES
	"ES XXX" = list(fn = "es", pkg = "smooth", model = "XXX", initial = "back"),
	"ES ZZZ" = list(fn = "es", pkg = "smooth", model = "ZZZ", initial = "back"),
	"ES FFF" = list(fn = "es", pkg = "smooth", model = "FFF", initial = "back"),
	"ES SXS" = list(fn = "es", pkg = "smooth", model = "SXS", initial = "back"),
	# ARIMA implementations in smooth
	"MSARIMA" = list(fn = "auto.msarima", pkg = "smooth", initial = "back"),
	"SSARIMA" = list(fn = "auto.ssarima", pkg = "smooth", initial = "back"),
	# Complex Exponential Smoothing
	"CES" = list(fn = "auto.ces", pkg = "smooth", initial = "back"),
	# Generalised Univariate Model (experimental)
	"GUM" = list(fn = "auto.gum", pkg = "smooth", initial = "back")
)

methodsNames <- names(methodsConfig)
methodsNumber <- length(methodsNames)

measuresNames <- c("RMSSE","SAME","Time")
measuresNumber <- length(measuresNames)

testResults <- array(NA, c(methodsNumber, datasetLength, measuresNumber),
                     dimnames = list(methodsNames, NULL, measuresNames))

# Unified loop over all methods
for(j in seq_along(methodsConfig)){
	cfg <- methodsConfig[[j]]
	cat("Running method:", methodsNames[j], "\n")

	result <- foreach(i = 1:datasetLength, .combine = "cbind",
	                  .packages = c("smooth", "forecast")) %dopar% {
		startTime <- Sys.time()

		# Build model call based on method type
		if(isTRUE(cfg$use_x_only)){
			# forecast package methods: ets, auto.arima
			test <- do.call(cfg$fn, list(datasets[[i]]$x))
		}else if(cfg$fn %in% c("adam", "es")) {
			# adam and es take dataset and model
			test <- do.call(cfg$fn, list(datasets[[i]], model=cfg$model, initial = cfg$initial))
		}else{
			# auto.msarima, auto.ssarima, auto.ces, auto.gum
			test <- do.call(cfg$fn, list(datasets[[i]], initial = cfg$initial))
		}

		# Build forecast call
		forecast_args <- list(test, h = datasets[[i]]$h)
		testForecast <- do.call(forecast, forecast_args)
		testForecast$timeElapsed <- Sys.time() - startTime

		return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x))
	}
	testResults[j,,] <- t(result)
}

</pre>
</div></div>
<h3 id="evaluationResults">Results</h3>
<p>And here are the results for the smooth functions in v4.4.0 for R. First, we summarise the RMSSEs. I produce quartiles of distribution of RMSSE together with the mean.</p>
<pre class="decode">cbind(t(apply(testResults[,,"RMSSE"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"RMSSE"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%    75%      100%   mean
ETS           0.0245 0.6772 1.1806 2.3765   51.6160 1.9697
Auto ARIMA    0.0246 0.6802 1.1790 2.3583   51.6160 1.9864
ADAM ETS Back 0.0183 <strong>0.6647</strong> <strong>1.1620</strong> <strong>2.3023</strong>   <strong>50.2585</strong> <strong>1.9283</strong>
ADAM ETS Opt  0.0242 0.6714 1.1868 2.3623   51.6160 1.9432
ADAM ETS Two  0.0246 0.6690 1.1875 2.3374   51.6160 1.9480
ES Back       0.0183 0.6674 1.1647 2.3164   <strong>50.2585</strong> 1.9292
ES Opt        0.0242 0.6740 1.1858 2.3644   51.6160 1.9469
ES Two        0.0245 0.6717 1.1874 2.3463   51.6160 1.9538
ES XXX        0.0183 0.6777 1.1708 2.3062   <strong>50.2585</strong> 1.9613
ES ZZZ        <strong>0.0108</strong> 0.6682 1.1816 2.3611  201.4959 2.0841
ES FFF        0.0145 0.6795 1.2170 2.4575 5946.1858 3.3033
ES SXS        0.0183 0.6754 1.1709 2.3539   <strong>50.2585</strong> 1.9448
MSARIMA       0.0278 0.6988 1.1898 2.4208   51.6160 2.0750
SSARIMA       0.0277 0.7371 1.2544 2.4425   51.6160 2.0625
CES Back      0.0450 0.6761 1.1741 2.3205   51.0571 1.9650
GUM Back      0.0333 0.7077 1.2073 2.4533   51.6184 2.0461
</pre>
<p>The worst performing models are the ETS with the multiplicative trend (ES ZZZ and ES FFF). This is because there are outliers in some time series, and the multiplicative trend reacts to them by amending the trend value to something large (e.g. 2, i.e. a twofold increase in level at each step), after which it can never return to a reasonable level (see the explanation of this phenomenon in <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6 of the ADAM book</a>). As expected, ADAM ETS performs very similarly to ES, and we can see that the default initialisation (backcasting) is pretty good in terms of RMSSE values. To be fair, if the models were tested on a different dataset, it might be the case that the optimal initialisation would do better.</p>
<p>Here is a table with the SAME results:</p>
<pre class="decode">cbind(t(apply(testResults[,,"SAME"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"SAME"],1,mean)) |> round(4)</pre>
<pre>                 0%    25%    50%    75%      100%   mean
ETS           8e-04 0.3757 1.0203 2.5097   54.6872 1.9983
Auto ARIMA    <strong>0e+00</strong> 0.3992 1.0429 2.4565   53.2710 2.0446
ADAM ETS Back 1e-04 0.3752 0.9965 <strong>2.4047</strong>   <strong>52.3418</strong> 1.9518
ADAM ETS Opt  5e-04 0.3733 1.0212 2.4848   55.1018 1.9618
ADAM ETS Two  8e-04 0.3780 1.0316 2.4511   55.1019 1.9712
ES Back       <strong>0e+00</strong> 0.3733 <strong>0.9945</strong> 2.4122   53.4504 <strong>1.9485</strong>
ES Opt        2e-04 <strong>0.3727</strong> 1.0255 2.4756   54.6860 1.9673
ES Two        1e-04 0.3855 1.0323 2.4535   54.6856 1.9799
ES XXX        1e-04 0.3733 1.0050 2.4257   53.1697 1.9927
ES ZZZ        3e-04 0.3824 1.0135 2.4885  229.7626 2.1376
ES FFF        3e-04 0.3972 1.0489 2.6042 3748.4268 2.9501
ES SXS        6e-04 0.3750 1.0125 2.4627   53.4504 1.9725
MSARIMA       1e-04 0.3960 1.0094 2.5409   54.7916 2.1227
SSARIMA       1e-04 0.4401 1.1222 2.5673   52.5023 2.1248
CES Back      6e-04 0.3767 1.0079 2.4085   54.9026 2.0052
GUM Back      0e+00 0.3803 1.0575 2.6259   63.0637 2.0858
</pre>
<p>In terms of bias, smooth implementations of ETS are doing well again, and we can see the same issue with the multiplicative trend here as before. Another thing to note is that MSARIMA and SSARIMA are not as good as the Auto ARIMA from the forecast package on these datasets in terms of RMSSE and SAME (at least, in terms of mean error measures). And actually, GUM and CES are now better than those in terms of both error measures.</p>
<p>Finally, here is a table with the computational time:</p>
<pre class="decode">cbind(t(apply(testResults[,,"Time"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"Time"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%     75%    100%   mean
ETS           <strong>0.0032</strong> <strong>0.0117</strong> 0.1660  0.6728  1.6400 0.3631
Auto ARIMA    0.0100 0.1184 0.3618  1.0548 54.3652 1.4760
ADAM ETS Back 0.0162 0.1062 0.1854  0.4022  2.5109 0.2950
ADAM ETS Opt  0.0319 0.1920 0.3103  0.6792  3.8933 0.5368
ADAM ETS Two  0.0427 0.2548 0.4035  0.8567  3.7178 0.6331
ES Back       0.0153 0.0896 <strong>0.1521</strong>  0.3335  2.1128 0.2476
ES Opt        0.0303 0.1667 0.2565  0.5910  3.5887 0.4522
ES Two        0.0483 0.2561 0.4016  0.8626  3.5892 0.6309
MSARIMA Back  0.0614 0.3418 0.6947  0.9868  3.9677 0.7534
SSARIMA Back  0.0292 0.2963 0.8988  2.1729 13.7635 1.6581
CES Back      0.0146 0.0400 0.1834  <strong>0.2298</strong>  <strong>1.2099</strong> <strong>0.1713</strong>
GUM Back      0.0165 0.2101 1.5221  3.0543  9.5380 1.9506

# Separate table for special pools of ETS.
# The time is proportional to the number of models here
=========================================================
                  0%    25%    50%     75%    100%   mean
ES XXX        0.0114 0.0539 0.0782  0.1110  0.8163 0.0859
ES ZZZ        0.0147 0.1371 0.2690  0.4947  2.2049 0.3780
ES FFF        0.0529 0.2775 1.1539  1.5926  3.8552 1.1231
ES SXS        0.0323 0.1303 0.4491  0.6013  2.2170 0.4581
</pre>
<p><em><br />
I have manually moved the specific ES model pool flavours into a separate table because there is no point in comparing their computational time with that of the others (they select from different pools of models and thus are not really comparable with the rest).</em></p>
<p>What we can see from this is that ES with backcasting is faster than the other models in this setting (in terms of mean and median computational time). CES is very fast in terms of mean computational time, which is probably because of the very short pool of models to choose from (only four). SSARIMA is pretty slow, which is due to the nature of its order selection algorithm (I don't plan to update it any time soon, but if someone wants to contribute, let me know). But the interesting thing is that Auto ARIMA, while being relatively fine in terms of median time, has the highest maximum one, meaning that for some time series it took an extremely long time for some unknown reason. The series that caused the biggest issue for Auto ARIMA is N389 from the M1 competition. I'm not sure what the issue was, and I don't have time to investigate it.</p>
<div id="attachment_4002" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1"><img fetchpriority="high" decoding="async" aria-describedby="caption-attachment-4002" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1" alt="Mean computational time vs mean RMSSE" width="300" height="180" class="size-medium wp-image-4002" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-768x461.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1 1000w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-4002" class="wp-caption-text">Mean computational time vs mean RMSSE</p></div>
<p>Comparing the mean computational time with mean RMSSE value (image above), it looks like the overall tendency in the <code>smooth</code> + <code>forecast</code> functions for the M1, M3 and Tourism datasets is that additional computational time does not improve the accuracy. But it also looks like a simpler pool of pure additive models (ETS(X,X,X)) harms the accuracy in comparison with the branch-and-bound based one of the default <code>model="ZXZ"</code>. There seems to be a sweet spot in terms of the pool of models to choose from (no multiplicative trend, allow mixed models). This aligns well with the papers of <a href="https://doi.org/10.1080/01605682.2024.2421339">Petropoulos et al. (2025)</a>, who investigated the accuracy of arbitrary short pools of models and <a href="https://doi.org/10.1016/j.ijpe.2018.05.019">Kourentzes et al. (2019)</a>, who showed how pooling (if done correctly) can improve the accuracy on average.</p>
<h3 id="whatsNext">What's next?</h3>
<p>For R, the main task now is to rewrite the <code>oes()</code> function and substitute it with <code>om()</code>, the "Occurrence Model". This should be equivalent to <code>adam()</code> in functionality, allowing one to introduce ETS, ARIMA and explanatory variables for the occurrence part of the model. This is a huge piece of work, which I hope to progress slowly throughout 2026 and finish by the end of the year. Doing that will also allow me to remove the last bits of the old C++ code and switch to the ADAM core completely, introducing more functionality for capturing patterns in intermittent demand. A minor task is to test <code>smoother="global"</code> more for the ETS initialisation and roll it out as the default in the next release for both R and Python.</p>
<p>For Python,... What Python? Ah! You'll see soon :)</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/02/09/smooth-v4-4-0/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</title>
		<link>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/</link>
					<comments>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Fri, 04 Jul 2025 10:02:17 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3898</guid>

					<description><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post. New default initialisation mechanism Since the beginning of the package, the smooth functions supported three ways for initialising the state vector (the vector that includes level, trend, seasonal indices): optimisation, backcasting and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post.</p>
<h3>New default initialisation mechanism</h3>
<p>Since the beginning of the package, the <code>smooth</code> functions have supported three ways of initialising the state vector (the vector that includes level, trend and seasonal indices): optimisation, backcasting and values provided by the user. The first has been considered the standard way of estimating ETS, while backcasting was originally proposed by Box &#038; Jenkins (1970) and, as far as I know, is only implemented in <code>smooth</code>. The main advantage of backcasting is computational time, because you do not need to estimate every single value of the state vector. The new ADAM core, which I developed during the COVID lockdown, had some improvements to the backcasting, and I noticed that <code>adam()</code> produced more accurate forecasts with it than with the optimisation. But I needed more testing, so I did not change anything back then.</p>
<p>However, my recent work with Kandrika Pritularga on capturing uncertainty in ETS has demonstrated that backcasting solves some fundamental problems with the variance of states &#8211; the optimisation cannot handle so many parameters, and the asymptotic properties of ETS do not make sense in that case (we&#8217;ll release the paper as soon as we finish the experiments). So, with this evidence in hand and after additional tests, I have decided to switch from optimisation to backcasting as the default initialisation mechanism for all the <strong>smooth</strong> functions.</p>
<p>End users should not feel much difference, but the functions should now work faster and (hopefully) more accurately. If this is not the case, please get in touch or <a href="https://github.com/config-i1/smooth/issues">file an issue on github</a>.</p>
<p>Also, rest assured that <code>initial="optimal"</code> is and will stay available as an option in all the <code>smooth</code> functions, so you can always switch back to it if you don&#8217;t like backcasting.</p>
<p>Finally, I have introduced a new initialisation mechanism called &#8220;two-stage&#8221;, the idea of which is to apply backcasting first and then optimise the obtained state values. It is slower, but is supposed to be better than the standard optimisation.</p>
<h3>ADAM core</h3>
<p>Every single function in the <code>smooth</code> package now uses the ADAM C++ core, and the old core will be discontinued starting from v4.5.0 of the package. This applies to the functions <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. There are now legacy versions of these functions in the package with the suffix &#8220;_old&#8221; (e.g. <code>es_old()</code>), which will be removed in smooth v4.5.0. The new engine also helped <code>ssarima()</code>, which is now slightly more accurate than before. Unfortunately, there are still some issues with the initialisation of the seasonal <code>ssarima()</code>, which I have not managed to solve completely. But I hope that over time this will be resolved as well.</p>
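<p>If you want to compare the new core with the legacy one before v4.5.0 removes it, you can do something like the following (a sketch; it assumes the legacy function keeps the same signature as the new one):</p>
<pre class="decode">library(smooth)
# New ADAM-based core
esNew <- es(AirPassengers, "MAM")
# Legacy implementation, to be removed in smooth v4.5.0
esLegacy <- es_old(AirPassengers, "MAM")
</pre>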
<h3>smooth performance update</h3>
<p>I have applied all the smooth functions together with <code>ets()</code> and <code>auto.arima()</code> from the <code>forecast</code> package to the M1, M3 and Tourism competition data and measured their performance in terms of RMSSE, scaled Cumulative Error (sCE) and computational time. I used the following R code for that:</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Long and boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)

library(forecast)
library(smooth)

# I work on Linux and use doMC. Substitute this with doParallel if you use Windows
library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
	holdout <- as.vector(holdout);
	insample <- as.vector(insample);
	return(c(measures(holdout, object$mean, insample),
			 mean(holdout < object$upper &#038; holdout > object$lower),
			 mean(object$upper-object$lower)/mean(insample),
			 pinball(holdout, object$upper, 0.975)/mean(insample),
			 pinball(holdout, object$lower, 0.025)/mean(insample),
			 sMIS(holdout, object$lower, object$upper, mean(insample),0.95),
			 object$timeElapsed))
}

# Datasets to use
datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)
# Types of models to try
methodsNames <- c("ETS", "Auto ARIMA",
				  "ADAM ETS Back", "ADAM ETS Opt", "ADAM ETS Two",
				  "ES Back", "ES Opt", "ES Two",
				  "ADAM ARIMA Back", "ADAM ARIMA Opt", "ADAM ARIMA Two",
				  "MSARIMA Back", "MSARIMA Opt", "MSARIMA Two",
				  "SSARIMA Back", "SSARIMA Opt", "SSARIMA Two",
				  "CES Back", "CES Opt", "CES Two",
				  "GUM Back", "GUM Opt", "GUM Two");
methodsNumber <- length(methodsNames);
test <- adam(datasets[[125]]);

testResults20250603 <- array(NA,c(methodsNumber,datasetLength,length(test$accuracy)+6),
                             dimnames=list(methodsNames, NULL,
                                           c(names(test$accuracy),
                                             "Coverage","Range",
                                             "pinballUpper","pinballLower","sMIS",
                                             "Time")));

#### ETS from forecast package ####
j <- 1;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
  startTime <- Sys.time()
  test <- ets(datasets[[i]]$x);
  testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### AUTOARIMA ####
j <- 2;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- auto.arima(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Backcasting ####
j <- 3;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Optimal ####
j <- 4;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Two-stage ####
j <- 5;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Backcasting ####
j <- 6;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Optimal ####
j <- 7;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Two-stage ####
j <- 8;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Backcasting ####
j <- 9;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="back", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Optimal ####
j <- 10;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="opt", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Two-stage ####
j <- 11;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="two", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Backcasting ####
j <- 12;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Optimal ####
j <- 13;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Two-stage ####
j <- 14;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Backcasting ####
j <- 15;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ssarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Optimal ####
j <- 16;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ssarima(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Two-stage ####
j <- 17;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ssarima(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Backcasting ####
j <- 18;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Optimal ####
j <- 19;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Two-stage ####
j <- 20;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Backcasting ####
j <- 21;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Optimal ####
j <- 22;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Two-stage ####
j <- 23;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);</pre>
<pre class="decode"># Summary of results
cbind(t(apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,quantile)),
      mean=apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,mean),
      sCE=apply(testResults20250603[c(1:8,12:23),,"sCE"],1,mean),
      Time=apply(testResults20250603[c(1:8,12:23),,"Time"],1,mean)) |> round(3)</pre>
</div></div>
<p>The table below shows the distribution of RMSSE, together with the mean sCE and the mean computational time. Boldface marks the best-performing model in each column.</p>
<pre>                    min   Q1  median   Q3     max  mean    sCE  Time
ETS                0.024 0.677 1.181 2.376  51.616 1.970  0.299 0.385
Auto ARIMA         0.025 0.680 1.179 2.358  51.616 1.986  0.124 1.467

ADAM ETS Back      <strong>0.015</strong> <strong>0.666</strong> 1.175 <strong>2.276</strong>  51.616 <strong>1.921</strong>  0.470 0.218
ADAM ETS Opt       0.020 <strong>0.666</strong> 1.190 2.311  51.616 1.937  0.299 0.432
ADAM ETS Two       0.025 <strong>0.666</strong> 1.179 2.330  51.616 1.951  0.330 0.579

ES Back            <strong>0.015</strong> 0.672 <strong>1.174</strong> 2.284  51.616 <strong>1.921</strong>  0.464 0.219
ES Opt             0.020 0.672 1.186 2.316  51.616 1.943  0.302 0.497
ES Two             0.024 0.668 1.181 2.346  51.616 1.952  0.346 0.562

MSARIMA Back       0.025 0.710 1.188 2.383  51.616 2.028  <strong>0.067</strong> 0.780
MSARIMA Opt        0.025 0.724 1.242 2.489  51.616 2.083  0.088 1.905
MSARIMA Two        0.025 0.718 1.250 2.485  51.906 2.075  0.083 2.431

SSARIMA Back       0.045 0.738 1.248 2.383  51.616 2.063  0.167 1.747
SSARIMA Opt        0.025 0.774 1.292 2.413  51.616 2.040  0.178 7.324
SSARIMA Two        0.025 0.742 1.241 2.414  51.616 2.027  0.183 8.096

CES Back           0.046 0.695 1.189 2.355  51.342 1.981  0.125 <strong>0.185</strong>
CES Opt            0.030 0.698 1.218 2.327  <strong>49.480</strong> 2.001 -0.135 0.834
CES Two            0.025 0.696 1.207 2.343  51.242 1.993 -0.078 1.006

GUM Back           0.046 0.707 1.215 2.399  51.134 2.049 -0.285 3.575
GUM Opt            0.026 0.795 1.381 2.717 240.143 2.932 -0.549 4.668
GUM Two            0.026 0.803 1.406 2.826 240.143 3.041 -0.593 4.703</pre>
<p>Several notes:</p>
<ul>
<li>ES is a wrapper of ADAM ETS. The main difference between them is that the latter uses the Gamma distribution for the multiplicative error models, while the former relies on the Normal one.</li>
<li>MSARIMA is a wrapper for ADAM ARIMA, which is why I don't report the latter in the results.</li>
</ul>
<p>One thing you can notice from the output above is that the models with backcasting consistently produce more accurate forecasts across all measures. My explanation is that they tend not to overfit the data as much as models with the optimal initialisation do.</p>
<p>To assess the stochastic dominance of the forecasting models, I conducted a modified MCB/Nemenyi test, explained in <a href="/2020/08/17/accuracy-of-forecasting-methods-can-you-tell-the-difference/">this post</a>:</p>
<pre class="decode">par(mar=c(10,3,4,1))
greybox::rmcb(t(testResults20250603[c(1:8,12:23),,"RMSSE"]), outplot="mcb")</pre>
<div id="attachment_3908" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3908" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1" alt="Nemenyi test for the smooth functions" width="300" height="175" class="size-medium wp-image-3908" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1 1200w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3908" class="wp-caption-text">Nemenyi test for the smooth functions</p></div>
<p>The image shows the mean rank of each model and whether the differences between them are significant at the 5% level. It is apparent that ADAM ETS has the lowest rank, no matter which initialisation is used, but its performance does not differ significantly from <code>es()</code>, <code>ets()</code> and <code>auto.arima()</code>. Also, <code>auto.arima()</code> significantly outperforms <code>msarima()</code> and <code>ssarima()</code> on this data, which could be due to their initialisation. Still, backcasting seems to help all the functions in terms of accuracy in comparison with the "optimal" and "two-stage" initials.</p>
<h3>What's next?</h3>
<p>I am now working on a modified formulation of ETS, which should fix some issues with the multiplicative trend and make ETS safer. This is based on <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6</a> of the online version of the ADAM monograph (it is not in the printed version). I am not sure whether this will improve the accuracy further, but I hope that it will make some of the ETS models more resilient than they are right now. I specifically need it for the multiplicative trend model, which sometimes behaves erratically due to its formulation.</p>
<p>I also plan to translate all the simulation functions to the ADAM core. This applies to <code>sim.es()</code>, <code>sim.ssarima()</code>, <code>sim.gum()</code> and <code>sim.ces()</code>. Currently, they rely on the older core, and I want to get rid of it. Having said that, the <code>simulate()</code> method applied to the new <code>smooth</code> functions already uses the new core; it just lacks the flexibility that the other functions have.</p>
<p>Furthermore, I want to rewrite the <code>oes()</code> function and substitute it with <code>oadam()</code>, which would use a better engine supporting more features, such as multiple frequencies and ARIMA for the occurrence part. This is a lot of work, and I will probably need help with it.</p>
<p>Finally, Filotas Theodosiou, Leonidas Tsaprounis, and I are working on translating the R code of <code>smooth</code> to Python. You can read a bit more about this project <a href="/2025/06/30/iif-open-source-forecasting-software-workshop-and-smooth/">here</a>. Several other people have decided to help us, but the progress so far has been a bit slow because of the complexity of the code translation. If you want to help, please get in touch.</p>
<p>Message <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What does &#8220;lower error measure&#8221; really mean?</title>
		<link>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/</link>
					<comments>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 27 Mar 2024 18:29:03 +0000</pubDate>
				<category><![CDATA[Forecast evaluation]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[error measures]]></category>
		<category><![CDATA[ETS]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3380</guid>

					<description><![CDATA[<p>&#8220;My amazing forecasting method has a lower MASE than any other method!&#8221; You&#8217;ve probably seen claims like this on social media or in papers. But have you ever thought about what it really means? Many forecasting experiments come to applying several approaches to a dataset, calculating error measures for each method per time series and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">What does &#8220;lower error measure&#8221; really mean?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>&#8220;My amazing forecasting method has a lower MASE than any other method!&#8221; You&#8217;ve probably seen claims like this on social media or in papers. But have you ever thought about what it really means?</p>
<p>Many forecasting experiments come down to applying several approaches to a dataset, calculating error measures for each method per time series and aggregating them to get a neat table like this one (based on the M and tourism competitions, with R code similar to the one from <a href="/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/">this post</a>):</p>
<pre>           RMSSE    sCE
ADAM ETS   <strong>1.947</strong>  0.319
ETS        1.970  0.299
ARIMA      1.986  0.125
CES        1.960 <strong>-0.011</strong></pre>
<p>Typically, the conclusion drawn from such tables is that the approach with the measure closest to zero performs the best, on average. I&#8217;ve done this myself many times because it&#8217;s a simple way to present results. So, what&#8217;s the issue?</p>
<p>Well, almost any error measure has a skewed distribution, because it is bounded below by zero but unbounded above (this doesn&#8217;t apply to bias measures). Let me show you:</p>
<div id="attachment_3381" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3381" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-300x150.png&amp;nocache=1" alt="Distribution of RMSSE for ADAM ETS on M and tourism competitions data" width="300" height="150" class="size-medium wp-image-3381" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-300x150.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-768x384.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot.png&amp;nocache=1 800w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3381" class="wp-caption-text">Distribution of RMSSE for ADAM ETS on M and tourism competitions data</p></div>
<p>This figure shows the distribution of 5,315 RMSSE values for ADAM ETS. As seen from the violin plot, the distribution has a peak close to zero and a very long tail. This suggests that the model performed well on many series but generated inaccurate forecasts for just a few, or perhaps only one of them. The mean RMSSE is 1.947 (vertical red line on the plot). However, this single value alone does not provide full information about the model&#8217;s performance. Firstly, it tries to represent the entire distribution with just one number. Secondly, as we know from statistics, the mean is influenced by outliers: if your method performed exceptionally well on 99% of the cases but failed badly on the remaining 1%, its mean can end up higher than that of a method that performed worse overall but avoided such extreme failures.</p>
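<p>To make the outlier effect concrete, here is a small base R illustration with made-up numbers (they are hypothetical and not taken from any of the experiments above):</p>

```r
# Hypothetical per-series RMSSE values for two methods:
goodMethod <- c(rep(0.5, 99), 80)  # excellent on 99 series, one disaster
okMethod <- rep(1.2, 100)          # mediocre everywhere, never terrible

mean(goodMethod)    # 1.295 - dragged up by the single outlier
mean(okMethod)      # 1.2
median(goodMethod)  # 0.5 - reveals the typical performance
median(okMethod)    # 1.2
```

<p>Judging by the mean alone, the second method looks better, even though the first produced more accurate forecasts on 99 out of 100 series.</p>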
<p>So, what should we do?</p>
<p>At the very least, provide both the mean and the median of the error measures: the distance between them shows how skewed the distribution is. An even better approach is to report the mean together with several quantiles (not just the median). For instance, we could present the 1st, 2nd (median) and 3rd quartiles together with the mean, minimum and maximum to offer a clearer understanding of the spread and variability of the error measure:</p>
<pre>             min    1Q  median    3Q     max  mean
ADAM ETS   <strong>0.024</strong> <strong>0.670</strong>   1.180 2.340  51.616 <strong>1.947</strong>
ETS        <strong>0.024</strong> 0.677   1.181 2.376  51.616 1.970
ARIMA      0.025 0.681   1.179 2.358  51.616 1.986
CES        0.045 0.675   <strong>1.171</strong> <strong>2.330</strong>  <strong>51.201</strong> 1.960</pre>
<p>This table provides better insights: the ETS models consistently perform well in terms of the mean, minimum and first quartile of RMSSE, while ARIMA outperformed them in terms of the median. CES did better than the others in terms of the median, third quartile and maximum. This means that there were some time series where ETS struggled a bit more than CES, but in the majority of cases it performed well.</p>
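<p>A summary like the one above takes only a couple of lines of base R. Here is a minimal sketch, where <code>rmsseValues</code> is a hypothetical vector of per-series error measures; a skewed lognormal sample stands in for the real thing:</p>

```r
# Sketch: summarise a skewed error-measure distribution with quantiles and mean.
set.seed(41)
rmsseValues <- rlnorm(5315, meanlog = 0, sdlog = 1)  # stand-in for real RMSSE
round(c(quantile(rmsseValues, c(0, 0.25, 0.5, 0.75, 1)),
        mean = mean(rmsseValues)), 3)
```

<p>For a lognormal-like sample the mean lands well above the median, which is exactly the skewness signal discussed above.</p>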
<p>So, next time you see a table with error measures, keep in mind that the best method on average might not be the best consistently. Having more details helps in understanding the situation better.</p>
<p>You can read more about error measures for forecasting in the &#8220;<a href="/category/forecasting-theory/forecast-evaluation/">forecast evaluation</a>&#8221; category.</p>
<p>Message <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">What does &#8220;lower error measure&#8221; really mean?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The Long and Winding Road: The Story of Complex Exponential Smoothing</title>
		<link>https://openforecast.org/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/</link>
					<comments>https://openforecast.org/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 02 Aug 2022 12:26:53 +0000</pubDate>
				<category><![CDATA[CES]]></category>
		<category><![CDATA[Complex-valued models]]></category>
		<category><![CDATA[Stories]]></category>
		<category><![CDATA[complex variables]]></category>
		<category><![CDATA[stories]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=2902</guid>

					<description><![CDATA[<p>About the paper. Disclaimer The idea of using complex variables in modelling and forecasting was originally proposed by my father, Sergey Svetunkov. Based on that, we developed several models, which were then used in some of our research. We worked together in this direction and published several articles in Russian. My father even published a [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/">The Long and Winding Road: The Story of Complex Exponential Smoothing</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><a href="/en/2022/08/02/complex-exponential-smoothing/">About the paper</a>.</p>
<h2>Disclaimer</h2>
<p><em>The idea of using complex variables in modelling and forecasting was originally proposed by my father, <a href="https://www.researchgate.net/profile/Sergey-Svetunkov">Sergey Svetunkov</a>. Based on that, we developed several models, which were then used in some of our research. We worked together in this direction and published several articles in Russian. My father even published a monograph &#8220;<a href="https://link.springer.com/book/10.1007/978-1-4614-5876-0">Complex-Valued Modeling in Economics and Finance</a>&#8221; based on that research.</em></p>
<h2>Pre-PhD period</h2>
<p>This story started in 2010 when I worked as an Associate Professor at the Higher School of Economics (HSE) in Saint Petersburg, Russia. By then, I had defended my candidate thesis (in Russia, this is considered equivalent to a PhD) on the topic of &#8220;Complex Variables Production Functions&#8221;, and I was teaching Microeconomics, Econometrics and Forecasting to undergraduate students. On my way to work (which would typically take an hour), I would read or write something. On one of those days, I came up with the basic formula for Complex Exponential Smoothing, assigning the error term to the imaginary part of the number and using Brown&#8217;s Simple Exponential Smoothing as a basis for the new forecasting method. Just for comparison, here is Simple Exponential Smoothing:<br />
\begin{equation*}<br />
 \hat{y}_{t+1} = \alpha y_t + (1-\alpha) \hat{y}_{t} .<br />
\end{equation*}<br />
And here is what I came up with:<br />
\begin{equation*}<br />
 \hat{y}_{t+1} + i \hat{\varsigma}_{t+1} = (\alpha_0 + i \alpha_1) (y_t + i \varsigma_t) + (1 - \alpha_0 + i - i \alpha_1) (\hat{y}_{t} + i \hat{\varsigma}_{t}) .<br />
\end{equation*}<br />
I&#8217;m not explaining this formula in this post (you can read about it <a href="/en/2022/08/02/complex-exponential-smoothing/">here</a>). It is here just for demonstration. It was and still is a complicated forecasting method to understand, but the idea itself excited me. When I returned home, I continued the derivations and did some basic experiments in Excel. I developed the method further in 2010 and presented it in April 2011 at a conference on Business Informatics in Kharkiv, Ukraine (this is <a href="https://www.google.com/search?q=kharkiv+bombing&#038;source=lnms&#038;tbm=isch&#038;sa=X&#038;ved=2ahUKEwjLyITi3oT5AhWySkEAHXYXBpsQ_AUoAnoECAIQBA&#038;biw=1920&#038;bih=970">one of the cities that the Russian army has been bombing</a> in the war that Putin started with Ukraine on 24th February 2022). The idea was well received, and I got encouraging feedback. The first paper on CES was then published in the proceedings of the conference (it is available in Russian <a href="https://www.researchgate.net/profile/Kyzym-O/publication/337943538_Modeli_ocenki_analiza_i_prognozirovania_socialno-ekonomiceskih_sistem_Models_of_assessment_analysis_and_forecasting_of_socio-economic_systems/links/5df68448299bf10bc35eff88/Modeli-ocenki-analiza-i-prognozirovania-socialno-ekonomiceskih-sistem-Models-of-assessment-analysis-and-forecasting-of-socio-economic-systems.pdf">here</a> and <a href="https://www.business-inform.net/annotated-catalogue/?year=2011&#038;abstract=2011_05_1&#038;lang=ru&#038;stqa=34">here</a>, p.11 &#8211; I used to call the method &#8220;Complex Exponentially Weighted Moving Average&#8221;, CEWMA, back then).</p>
<p>After that, I started thinking of preparing a paper in English and submitting it to an international peer-reviewed journal. HSE had an excellent service, where people outside your department would read your paper and provide feedback. So I used that service after preparing the first draft in English in 2012 and got a review with several comments. One of them was helpful: it said that my paper lacked proper motivation and that, in its current state, it could not be published in a peer-reviewed international journal. The other comment, however, was that my research area was uninteresting, that nobody in the academic world did anything like that, and that I should therefore find a different area of research.</p>
<p>I disagreed with the latter point and, after minor modifications, submitted the paper to the International Journal of Forecasting (IJF). As expected, Rob Hyndman (back then, editor-in-chief of the journal) replied that the paper could not be published because it lacked motivation and because I failed to show that the approach worked. At that time, I did not know how to motivate the paper or how to modify it to make it publishable, so that was a dead end for that version of the paper. But I did not want to give up, so in 2012, I applied for a PhD in Management Science at Lancaster University, writing a proposal about my model.</p>
<h2>PhD period</h2>
<p>I was admitted as a PhD student in 2013 with a scholarship from the Lancaster University Management School, and I started my work under the supervision of <a href="http://kourentzes.com/forecasting/">Nikolaos Kourentzes</a> and <a href="https://scholar.google.co.uk/citations?user=B3o74TgAAAAJ">Robert</a> <a href="https://www.lancaster.ac.uk/lums/people/robert-fildes">Fildes</a> on the topic &#8220;Complex Exponential Smoothing&#8221;. After preparing a proper experiment, I received good results and wrote the first version of the R function <code>ces()</code>. The results of this work were presented in my first International Symposium on Forecasting (ISF) in Rotterdam in 2014. Nobody noticed my presentation, and nobody seemed to care.</p>
<p>I then focused on rewriting the paper, and Nikos helped me write up the motivation. After collecting feedback about the paper from our colleagues, we decided to submit it to a statistical journal. That was very arrogant of us &#8211; we did not understand how to write papers for such journals, and nobody in our group had ever published there. As a result, we got a desk rejection from the Journal of the American Statistical Association in 2015, saying that they do not publish forecasting papers.</p>
<p>In parallel, I started working on an extension of CES for seasonal time series, which I then presented at <a href="/en/2015/06/25/international-symposium-on-forecasting-2015-2/">ISF2015</a> in Riverside, US. There I managed to discuss my research with <a href="https://scholar.google.com/citations?user=-0p44ukAAAAJ">Keith Ord</a>, who expressed interest in it and provided support and guidance for some parts of it. He even helped me with some derivations, which I included in the first paper.</p>
<p>To make things even more complicated, I continued work on my PhD and wrote a second paper, extending CES for seasonal time series. At the end of 2015, I resubmitted the first paper to the Operations Research journal, where it got desk-rejected, and then to EJOR (European Journal of Operational Research). After a short discussion with Nikos, we decided to submit the second paper to IJF, hoping that the first would progress fast and that the two of them could be handled in parallel. That was a fatal mistake, which impacted my academic career and mental well-being for the next several years.</p>
<p>Unfortunately, the first paper got rejected from EJOR after the second round of revision, with the second reviewer saying that it could not be published because we did not use the Diebold-Mariano test (yes, that was the reason; note that we used the Nemenyi test instead). As for the second one, it got stuck in IJF. In the first round, the second reviewer said that the model had a fatal flaw and could not be used in practice (he concluded that because he misunderstood how the model worked). In the second round, when we explained the model in more detail, the reviewer looked more carefully at CES and started criticising the first paper, which by then had been published as a working paper. We placed ourselves in a challenging situation: we had to defend the first paper in the revision of the second one. This process led us to the third and then to the fourth round without significant progress. Instead of discussing the seasonal extension of CES, we were discussing the meaning of complex variables in the model and whether its imaginary part makes sense. It was apparent that the model worked (it performed better than ETS and ARIMA on the M competition data), but the reviewers had questions about the interpretation of the original model. In the fourth round, an Associate Editor of IJF wrote that &#8220;<em>I still maintain view and so does reviewer 2 that there is an interesting paper lurking under this paper but we are yet to see it and evaluate it on its own merits</em>&#8221;. It became clear that we were not moving forward and that the only way out of this dead end would be to merge the two papers and restart the submission process &#8211; by then, we were discussing a completely different paper than the one initially submitted to IJF. I was not ready for this serious step, and I decided not to continue the revision process in IJF and to put the paper on hold.</p>
<p>By then, my publishing experience had been very disappointing and demotivating, and I struggled to continue doing anything in that research direction. Whenever I opened the paper, it would spoil my mood for the rest of the day, as I would think that it was unpublishable and that nobody needed my work (as I had been told repeatedly by many different people since 2010).</p>
<p>Nonetheless, somewhere in the middle of the IJF revision, at the end of 2016, I had my viva and got my PhD in Management Science, defending a thesis on the topic &#8220;Complex Exponential Smoothing&#8221;.</p>
<h2>Post-PhD period</h2>
<p>At the end of 2017, Fotios Petropoulos suggested that I participate in the <a href="https://en.wikipedia.org/wiki/Makridakis_Competitions">M4 competition</a>. His idea was to submit a combination of forecasts from several models: ETS, ARIMA, Theta and CES. After trying out several options, we used the median for the combination (I must confess that we weren&#8217;t the first to do that; this was investigated, for example, by <a href="http://doi.org/10.1016/j.ijforecast.2007.06.001">Jose &#038; Winkler, 2008</a>). This approach took 6th place in the competition. We were invited to submit a paper explaining our approach, which was then published in IJF (<a href="https://doi.org/10.1016/j.ijforecast.2019.01.006">Petropoulos &#038; Svetunkov, 2020</a>). That paper is the first one published in a peer-reviewed journal discussing CES.</p>
<p>In 2018, during the ISF in Boulder, Nikos and I invited Keith Ord to join our paper &#8211; he had supported me during my PhD and made a substantial contribution to the paper. We decided to clean the paper up, rewrite some parts, and submit it to a peer-reviewed journal as a paper from three co-authors. It took us some time to return to the original text, revive the R code and update the paper. In the middle of 2019, Nikos, Keith and I submitted the CES paper to the Journal of Time Series Analysis. The result was a desk rejection, with a comment that the Associate Editor &#8220;<em>&#8230;argues that your paper is a relatively straightforward extension of smoothing via a state space model</em>&#8221; and thus the paper &#8220;<em>is not appropriate for publication in this journal in terms of substantive content</em>&#8221;. We rewrote the motivation to align the paper with an OR-related journal and submitted it to Omega, only to get another desk rejection saying that it was too mathematical for them and that the paper &#8220;<em>is quite technical and would likely be best served by targeting a journal in the time series or forecasting field instead</em>&#8221;.</p>
<p>Finally, at the end of 2019, we submitted the paper to Naval Research Logistics (NRL). By then, I did not have any expectations about the paper and was sure that it would either be a desk rejection or a rejection from reviewers &#8211; I had seen this outcome so many times that it would have been naive to expect anything else. However, this time we got an Associate Editor who liked the idea and supported us from the first revision. In fact, they pointed out that CES had already been used in the M4 competition and had shown that it brought value. On 24th February 2021, we got our first round of revision, after which I decided to move some parts of paper 2 (seasonal CES) to the first one, merging the two. It made sense because the paper would now look complete. While one of the reviewers was sceptical about the paper, the Associate Editor provided colossal support and guided us in what to change so that the paper could be accepted in NRL. After two rounds and some additional rewrites, on 18th June 2022, the paper was accepted for publication in Naval Research Logistics, and it was published online on 2nd August 2022.</p>
<h2>Conclusions</h2>
<p>Complex Exponential Smoothing is a complex idea, something that people are not used to. It stands out and does things differently, not the way researchers typically do. This is what makes it interesting, and this is what made it extremely difficult to publish. Over the years, I questioned the correctness and usefulness of my idea many times. Some days I would be dancing around, singing &#8220;it works, it works&#8221; after a successful experiment; on others, I would throw it away, saying &#8220;never again&#8221; when the experiments failed. This is all part of academic life. However, the most challenging experience for me was the publication of the paper. Over the years, I have met a lot of resistance from the academic world.</p>
<p>I have not included here comments from my former Higher School of Economics colleagues or comments from some journal reviewers. They were rarely pleasant or supportive. Some people did not understand the idea; others did not want to understand it. But there were always several people around me who helped and guided me. I would not have been able to publish the paper in the end had it not been for the support of Nikos Kourentzes, Keith Ord, Sergey Svetunkov (my father) and Anna Sroginis (my wife). They believed in the idea and supported me even when it looked like it would not work. So, I am immensely grateful for their support. It has been a long and winding road&#8230; and I&#8217;m glad that it&#8217;s finally over.</p>
<p>As for the <strong>lessons to learn</strong> from this, I have several for you:</p>
<ul>
<li>Do not try publishing dependent papers in parallel: if your second paper depends on the first one, do not submit it before the first one is at least accepted.</li>
<li>If you want to publish in a journal in which your group does not typically publish, find a person who does and work with them. That became apparent to me when I worked on a different paper with a colleague from a statistics department. Statistical journals have a completely different style than the OR ones, and we had no chance of publishing the CES paper there.</li>
<li><strong>As a reviewer</strong>, you might not understand the paper you are reviewing. This is okay. We cannot know and understand everything instantaneously. But that does not mean that the paper is not good. It only means that you need to invest more time in understanding the paper and then help to improve it (yes, paper revision is a serious job, not a box-ticking process). I had many comments of the style &#8220;I did not understand it, so reject&#8221;. This is not how revisions should be done.</li>
</ul>
<p>Last but not least, be critical of your ideas, but if you believe in something, stick with it and be patient. It might take a lot of time for other people to start appreciating what you have been trying to show them.</p>
<p>Message <a href="https://openforecast.org/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/">The Long and Winding Road: The Story of Complex Exponential Smoothing</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Complex Exponential Smoothing</title>
		<link>https://openforecast.org/2022/08/02/complex-exponential-smoothing/</link>
					<comments>https://openforecast.org/2022/08/02/complex-exponential-smoothing/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 02 Aug 2022 12:23:39 +0000</pubDate>
				<category><![CDATA[CES]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[papers]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=2901</guid>

					<description><![CDATA[<p>Authors: Ivan Svetunkov, Nikolaos Kourentzes, Keith Ord. Journal: Naval Research Logistics Abstract: Exponential smoothing has been one of the most popular forecasting methods used to support various decisions in organisations, in activities such as inventory management, scheduling, revenue management and other areas. Although its relative simplicity and transparency have made it very attractive for research [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2022/08/02/complex-exponential-smoothing/">Complex Exponential Smoothing</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>Authors</strong>: Ivan Svetunkov, Nikolaos Kourentzes, Keith Ord.</p>
<p><strong>Journal</strong>: Naval Research Logistics</p>
<p><strong>Abstract</strong>: Exponential smoothing has been one of the most popular forecasting methods used to support various decisions in organisations, in activities such as inventory management, scheduling, revenue management and other areas. Although its relative simplicity and transparency have made it very attractive for research and practice, identifying the underlying trend remains challenging with significant impact on the resulting accuracy. This has resulted in the development of various modifications of trend models, introducing a model selection problem. With the aim of addressing this problem, we propose the Complex Exponential Smoothing (CES), based on the theory of functions of complex variables. The basic CES approach involves only two parameters and does not require a model selection procedure. Despite these simplifications, CES proves to be competitive with, or even superior to existing methods. We show that CES has several advantages over conventional exponential smoothing models: it can model and forecast both stationary and non-stationary processes, and CES can capture both level and trend cases, as defined in the conventional exponential smoothing classification. CES is evaluated on several forecasting competition datasets, demonstrating better performance than established benchmarks. We conclude that CES has desirable features for time series modelling and opens new promising avenues for research.</p>
<p><a href="/wp-content/uploads/2022/07/Svetunkov-et-al.-2022-Complex-Exponential-Smoothing.pdf">Working paper</a></p>
<p>DOI: <a href="http://doi.org/10.1002/nav.22074" target="_blank">10.1002/nav.22074</a></p>
<p><a href="/en/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/">The story of the paper</a>.</p>
<h2>The idea of Complex Exponential Smoothing</h2>
<p>One of the most fundamental ideas in forecasting is the decomposition of time series into several unobservable components (see, for example, <a href="https://openforecast.org/adam/tsComponents.html">Section 3.1 of ADAM monograph</a>), typically: level, trend, seasonality, error. <a href="https://openforecast.org/adam/ETSConventional.html">ETS</a> relies on this idea of decomposition and implements the <a href="https://openforecast.org/adam/ETSSelection.html">selection of components via information criteria</a>. However, not all time series have these components, and the split itself is arbitrary because, for example, in practice a time series with a slow trend might be indistinguishable from a series with a rapidly changing level. Furthermore, in reality, the data can be more complicated &#8211; it might not have a distinct level and trend, and instead can represent a non-linear mixture of unobservable components.</p>
<p>Complex Exponential Smoothing models non-linearity in time series and captures their structure in a different way. Here is how the conventional CES method is formulated:</p>
<p>\begin{equation} \label{eq:cesalgebraic}<br />
	\hat{y}_{t} + i \hat{e}_{t} = (\alpha_0 + i\alpha_1)(y_{t-1} + i e_{t-1}) + (1 - \alpha_0 + i - i\alpha_1)(\hat{y}_{t-1} + i \hat{e}_{t-1}) ,<br />
\end{equation}<br />
where \(y_t\) is the actual value, \(e_t\) is the forecast error, \(\hat{y}_t\) is the predicted value, \(\hat{e}_t\) is a proxy for the error term, \(\alpha_0\) and \(\alpha_1\) are the smoothing parameters and \(i\) is the imaginary unit, satisfying the equation \(i^2=-1\). Due to the usage of complex variables, the method distributes weights between the observations over time in a non-linear way. This becomes more apparent if we insert formula \eqref{eq:cesalgebraic} into its own right-hand side several times to obtain a recursion (similar to how it is typically done for Simple Exponential Smoothing; see, for example, <a href="https://openforecast.org/adam/SES.html#whyExponential">Subsection 3.4.2 of ADAM monograph</a>):<br />
\begin{equation} \label{eq:cesalgebraicExpanded}<br />
	\begin{aligned}<br />
		\hat{y}_{t} + i \hat{e}_{t} = &#038; (\alpha_0 + i\alpha_1)(y_{t-1} + i e_{t-1}) + \\<br />
					      &#038; (\alpha_0 + i\alpha_1) (1 - \alpha_0 + i - i\alpha_1) (y_{t-2} + i e_{t-2}) + \\<br />
					      &#038; (\alpha_0 + i\alpha_1) (1 - \alpha_0 + i - i\alpha_1)^2 (y_{t-3} + i e_{t-3}) + \\<br />
					      &#038; \dots + \\<br />
					      &#038; (\alpha_0 + i\alpha_1) (1 - \alpha_0 + i - i\alpha_1)^{t-2} (y_{1} + i e_{1}) + \\<br />
					      &#038; (1 - \alpha_0 + i - i\alpha_1)^{t-1} (\hat{y}_{1} + i \hat{e}_{1}) .<br />
	\end{aligned}<br />
\end{equation}<br />
This exponentiation of \((1 - \alpha_0 + i - i\alpha_1)\) in the formula above is what distributes the weights over time in a non-linear fashion. All of this is difficult to grasp in the abstract, so here is a beautiful figure showing how the weights can be distributed over time (blue line &#8211; weights for the actual values, green line &#8211; weights for the forecast errors):</p>
<div id="attachment_2978" style="width: 650px" class="wp-caption aligncenter"><a href="/wp-content/uploads/2022/07/cspweights.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2978" src="/wp-content/uploads/2022/07/cspweights-1024x410.png" alt="" width="640" height="256" class="size-large wp-image-2978" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cspweights-1024x410.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cspweights-300x120.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cspweights-768x307.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cspweights-1536x614.png&amp;nocache=1 1536w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cspweights-2048x819.png&amp;nocache=1 2048w" sizes="auto, (max-width: 640px) 100vw, 640px" /></a><p id="caption-attachment-2978" class="wp-caption-text">Distribution of weights between observations on complex and real plains. Blue line &#8211; weight for actual values, green line &#8211; weights for the errors.</p></div>
<p>Depending on the value of the complex smoothing parameter \(\alpha_0 + i\alpha_1\), the distribution of weights will have a different shape. It does not need to be harmonic as on the plot above; it can also be classical exponential (as in Simple Exponential Smoothing), which is achieved when \(\alpha_1\) is close to one. This is what gives CES its flexibility and allows it to deal with both stationary and non-stationary time series, without the need to switch between time series components.</p>
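<p>To make the weight distribution more tangible, here is a small illustrative sketch (in Python rather than R, and not the <code>smooth</code> implementation) that computes the complex weights from the expanded recursion above; the parameter values are assumptions chosen for demonstration only:</p>

```python
# Illustrative sketch of the CES weights implied by the expanded recursion:
# the weight on observation (y_{t-k-1} + i*e_{t-k-1}) equals
# (a0 + i*a1) * (1 - a0 + i*(1 - a1))**k.
# The values of a0 and a1 below are assumptions for demonstration only.

def ces_weights(a0, a1, n=20):
    """Return the first n complex weights of the CES recursion."""
    alpha = complex(a0, a1)         # complex smoothing parameter a0 + i*a1
    base = complex(1 - a0, 1 - a1)  # 1 - a0 + i - i*a1
    return [alpha * base**k for k in range(n)]

w = ces_weights(1.5, 1.1)
# Real parts weight the actual values, imaginary parts weight the errors.
real_w = [z.real for z in w]
imag_w = [z.imag for z in w]
```

<p>With \(\alpha_1\) equal to one the base \(1 - \alpha_0 + i(1 - \alpha_1)\) becomes purely real, so the weight magnitudes decay geometrically, mimicking Simple Exponential Smoothing; other values produce oscillating, harmonic-like shapes as on the figure above.</p>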
<p>The published paper also discusses a seasonal modification of the CES model, which introduces a seasonal component that can act either as additive, or multiplicative, or something in between the two. I do not provide the formula here, because it is cumbersome.</p>
<h2>Examples in R</h2>
<p>In R, CES is implemented in the <code>ces()</code> function of the <code>smooth</code> package. There is also the <code>auto.ces()</code> function, which selects between seasonal and non-seasonal models using information criteria. The syntax of the function is similar to that of <code>es()</code> and <code>adam()</code>. Here is an example of its application:</p>
<pre class="decode">cesModel <- smooth::auto.ces(BJsales, holdout=TRUE, h=12)
cesModel</pre>
<pre>Time elapsed: 0.05 seconds
Model estimated: CES(n)
a0 + ia1: 1.9981+1.0034i
Initial values were produced using backcasting.

Loss function type: likelihood; Loss function value: 249.4613
Error standard deviation: 1.4914
Sample size: 138
Number of estimated parameters: 3
Number of degrees of freedom: 135
Information criteria:
     AIC     AICc      BIC     BICc 
504.9227 505.1018 513.7045 514.1457 

Forecast errors:
MPE: 0%; sCE: 0.7%; Asymmetry: -5%; MAPE: 0.4%
MASE: 0.857; sMAE: 0.4%; sMSE: 0%; rMAE: 0.329; rRMSE: 0.338</pre>
<p>The output above has been discussed on this website in the context of <code>es()</code> in <a href="/en/2016/11/02/smooth-package-for-r-es-function-part-ii-pure-additive-models/">this post</a>. The main difference is in the reported parameter. We see that \(\alpha_0 + i\alpha_1 = 1.9981 + i 1.0034\). The estimated model can then be used in forecasting, for example, using the command:</p>
<pre class="decode">cesModel |> forecast(h=12, interval="p") |> plot()</pre>
<p>to get:</p>
<div id="attachment_3007" style="width: 310px" class="wp-caption aligncenter"><a href="/wp-content/uploads/2022/07/cesForecast.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3007" src="/wp-content/uploads/2022/07/cesForecast-300x210.png" alt="" width="300" height="210" class="size-medium wp-image-3007" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cesForecast-300x210.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cesForecast-768x538.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/cesForecast.png&amp;nocache=1 1000w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3007" class="wp-caption-text">CES forecast on the Box-Jenkins sales data</p></div>
<p>This function hasn't changed since I finished my PhD in 2016, so the results in terms of its accuracy discussed in <a href="/en/2018/01/01/smooth-functions-in-2017/">this post</a> still hold. Its performance is not stellar, but as <a href="https://doi.org/10.1016/j.ijforecast.2019.01.006">Petropoulos & Svetunkov (2020)</a> showed, it brings value in a combination of models. This is because CES captures the long-term tendencies in time series well.</p>
<p>Last but not least, I plan to update the code of <code>ces()</code> as a part of the move to a more efficient C++ routine in the <code>smooth</code> package in v3.2.0. So, its results might change slightly, but probably not by much.</p>
<h2>Acknowledgments</h2>
<p>As a final word, I am immensely grateful to <a href="http://kourentzes.com/forecasting/">Nikolaos Kourentzes</a>, who believed in CES back in 2012 and supported me without hesitation throughout these years, during my PhD and after it. I am also grateful to <a href="https://scholar.google.com/citations?user=-0p44ukAAAAJ">Keith Ord</a>, who helped improve the paper and made it happen in the end. Finally, I am grateful to my father, <a href="https://www.researchgate.net/profile/Sergey-Svetunkov">Sergey Svetunkov</a>, who guided my first steps in academia and believed in my research when it wasn't even fashionable.</p>
<p>If you want to know more about CES, <a href="/wp-content/uploads/2022/07/Svetunkov-et-al.-2022-Complex-Exponential-Smoothing.pdf">read the paper</a> (you can do it <a href="https://doi.org/10.1002/nav.22074" target="_blank">here</a> as well) or <a href="/en/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/">read the story of the paper</a>.</p>
<p>Message <a href="https://openforecast.org/2022/08/02/complex-exponential-smoothing/">Complex Exponential Smoothing</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2022/08/02/complex-exponential-smoothing/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>A simple combination of univariate models</title>
		<link>https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/</link>
					<comments>https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 18 Apr 2019 08:39:17 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[papers]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=1949</guid>

					<description><![CDATA[<p>Fotios Petropoulos and I have participated last year in M4 competition. Our approach performed well, finishing as 6th in the competition. This paper in International Journal of Forecasting explains what we used in our approach and why. Here&#8217;s the abstract: This paper describes the approach that we implemented for producing the point forecasts and prediction [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/">A simple combination of univariate models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><a href="https://researchportal.bath.ac.uk/en/persons/fotios-petropoulos" rel="noopener noreferrer" target="_blank">Fotios Petropoulos</a> and I have participated last year in M4 competition. Our approach performed well, finishing as 6th in the competition. <a href="https://doi.org/10.1016/j.ijforecast.2019.01.006" rel="noopener noreferrer" target="_blank">This paper in International Journal of Forecasting</a> explains what we used in our approach and why. Here&#8217;s the abstract:</p>
<p>This paper describes the approach that we implemented for producing the point forecasts and prediction intervals for our M4-competition submission. The proposed simple combination of univariate models (SCUM) is a median combination of the point forecasts and prediction intervals of four models, namely exponential smoothing, complex exponential smoothing, automatic autoregressive integrated moving average and dynamic optimised theta. Our submission performed very well in the M4-competition, being ranked 6th for the point forecasts (with a small difference compared to the 2nd submission) and prediction intervals and 2nd and 3rd for the point forecasts of the weekly and quarterly data respectively.</p>
<p><a href="https://doi.org/10.1016/j.ijforecast.2019.01.006" rel="noopener noreferrer" target="_blank">Paper in IJF</a>.<br />
<a href="https://openforecast.org/wp-content/uploads/2019/04/IJF-2019-SCUM-post-print.pdf">Postprint of the paper</a>.</p>
<p>Message <a href="https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/">A simple combination of univariate models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>&#8220;smooth&#8221; package for R. Common ground. Part I. Prediction intervals</title>
		<link>https://openforecast.org/2017/06/11/smooth-package-for-r-prediction-intervals/</link>
					<comments>https://openforecast.org/2017/06/11/smooth-package-for-r-prediction-intervals/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Sun, 11 Jun 2017 13:23:40 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[Common parameters]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=984</guid>

					<description><![CDATA[<p>UPDATE: Starting from v2.5.1 the parameter intervals has been renamed into interval for the consistency purposes with the other R functions. We have spent previous six posts discussing basics of es() function (underlying models and their implementation). Now it is time to move forward. Starting from this post we will discuss common parameters, shared by [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2017/06/11/smooth-package-for-r-prediction-intervals/">&#8220;smooth&#8221; package for R. Common ground. Part I. Prediction intervals</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>UPDATE</strong>: Starting from v2.5.1 the parameter <code>intervals</code> has been renamed into <code>interval</code> for the consistency purposes with the other R functions.</p>
<p>We have spent the previous six posts discussing the basics of the <code>es()</code> function (the underlying models and their implementation). Now it is time to move forward. Starting from this post, we will discuss common parameters shared by all the forecasting functions implemented in <code>smooth</code>. This means that the topics we discuss are applicable not only to <code>es()</code>, but also to <code>ssarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. However, given that we have only discussed ETS so far, we will use <code>es()</code> in our examples for now.</p>
<p>And I would like to start this series of general posts from the topic of prediction intervals.</p>
<h3>Prediction intervals for smooth functions</h3>
<p>One of the features of <code>smooth</code> functions is their ability to produce different types of prediction intervals. Parametric prediction intervals (triggered by <code>interval="p"</code>, <code>interval="parametric"</code> or <code>interval=TRUE</code>) are derived analytically only for <a href="/en/2016/11/02/smooth-package-for-r-es-function-part-ii-pure-additive-models/">pure additive</a> and <a href="/en/2016/11/18/smooth-package-for-r-es-function-part-iii-multiplicative-models/">pure multiplicative</a> models and are based on the state-space model discussed in previous posts. In the current <code>smooth</code> version (v2.0.0) only the <code>es()</code> function has multiplicative components; all the other functions are based on additive models. This makes <code>es()</code> &#8220;special&#8221;. While constructing intervals for pure models (either additive or multiplicative) is relatively easy to do, the mixed models cause pain in the arse (one of the reasons why I don&#8217;t like them). So in the case of <a href="/en/2017/01/24/smooth-package-for-r-es-function-part-iv-model-selection-and-combination-of-forecasts/">mixed ETS models</a>, we have to use several tricks.</p>
<p>If the model has a multiplicative error, other non-multiplicative components (trend, seasonality) and a low error variance (smaller than 0.1), then the intervals can be approximated by similar models with an additive error term. For example, the intervals for ETS(M,A,N) can be approximated with the intervals of ETS(A,A,N) when the variance is low, because the distributions of errors in the two models will be similar. In all the other cases we use simulations for prediction interval construction (via the <code>sim.es()</code> function). In this case the data is generated with preset parameters (including the variance) and contains \(h\) observations. This process is repeated 10,000 times, resulting in 10,000 possible trajectories. After that, the necessary quantiles of these trajectories for each step ahead are taken using the <code>quantile()</code> function from the <code>stats</code> package and returned as prediction intervals. This cannot be considered a purely parametric approach, but it is the closest thing we have.</p>
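<p>The simulation idea can be sketched outside of <code>smooth</code> as follows (a minimal Python sketch, with a toy Gaussian random-walk generator standing in for <code>sim.es()</code>; all parameter values here are assumptions):</p>

```python
import numpy as np

# Minimal sketch of simulation-based prediction intervals: generate many
# h-step-ahead trajectories from a preset model, then take per-horizon
# quantiles. A random walk is used here purely as a stand-in for a fitted
# ETS model; sigma and the starting value are assumed for illustration.
rng = np.random.default_rng(42)

def simulated_intervals(last_value, sigma, h=8, nsim=10000, level=0.95):
    steps = rng.normal(0.0, sigma, size=(nsim, h))
    trajectories = last_value + np.cumsum(steps, axis=1)  # nsim x h paths
    lower = np.quantile(trajectories, (1 - level) / 2, axis=0)
    upper = np.quantile(trajectories, (1 + level) / 2, axis=0)
    return lower, upper

lower, upper = simulated_intervals(100.0, 1.5)
# For this random walk the bounds widen with the horizon, roughly
# proportionally to sigma * sqrt(h).
```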
<p><code>smooth</code> functions also introduce semiparametric and nonparametric prediction intervals. Both of them are based on multiple-steps-ahead (sometimes called &#8220;trace&#8221;) forecast errors. These are obtained by producing forecasts for horizons 1 to \(h\) from each observation of the time series. As a result, a matrix with \(h\) columns and \(T-h\) rows is produced. In the case of semiparametric intervals (called using <code>interval="sp"</code> or <code>interval="semiparametric"</code>), the variances of forecast errors for each horizon are calculated and then used to extract quantiles of either the normal or the log-normal distribution (depending on the error type). This way we cover possible violations of the assumptions of homoscedasticity and no autocorrelation in the residuals, but we still assume that each separate observation follows some parametric distribution.</p>
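<p>For intuition, the semiparametric calculation can be sketched like this (Python; the trace error matrix and the point forecasts are simulated and assumed here instead of coming from a fitted model):</p>

```python
import numpy as np
from statistics import NormalDist

# Sketch of the semiparametric idea: estimate the variance of trace
# (multi-step) forecast errors per horizon, then apply normal quantiles
# around the point forecasts. The error matrix below is toy data.
rng = np.random.default_rng(1)

h = 6
# Toy (T-h) x h matrix of trace errors whose spread grows with the horizon.
errors = rng.normal(0.0, np.sqrt(np.arange(1, h + 1)), size=(100, h))

z = NormalDist().inv_cdf(0.975)          # two-sided 95% normal quantile
sigmas = errors.std(axis=0, ddof=1)      # per-horizon standard deviations
point = np.full(h, 50.0)                 # hypothetical point forecasts
lower, upper = point - z * sigmas, point + z * sigmas
```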
<p>In the case of nonparametric prediction intervals (defined in R via <code>interval="np"</code> or <code>interval="nonparametric"</code>), we loosen the assumptions further, dropping the part about the distribution of the residuals. In this case quantile regressions are used, as proposed by <a href="https://www.jstor.org/stable/2634872?seq=1#page_scan_tab_contents" target="_blank" rel="noopener noreferrer">Taylor and Bunn, 1999</a>. However, we use a different form of the regression model than the authors do:<br />
\begin{equation} \label{eq:ssTaylorPIs}<br />
	\hat{e}_{j} = a_0 j ^ {a_{1}},<br />
\end{equation}<br />
where \(j = 1, \dots, h\) is the forecast horizon. This function has an important advantage over the second-order polynomial proposed by the authors: it does not have an extremum (turning point) for \(j>0\), which means that the intervals won&#8217;t behave strangely several observations ahead. Using polynomials for intervals sometimes leads to weird bounds (for example, expanding and then shrinking). The power function, on the other hand, allows producing a wide variety of interval trajectories, corresponding to differently increasing or decreasing bounds of prediction intervals (depending on the values of \(a_0\) and \(a_1\)), without producing any ridiculous ones.</p>
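<p>To illustrate the shape of this function, here is a sketch (Python) fitting \(a_0\) and \(a_1\) to toy per-horizon error magnitudes. Note that <code>smooth</code> fits this via quantile regression; the sketch below uses ordinary least squares on logarithms purely as a simplified stand-in:</p>

```python
import numpy as np

# Fit e_hat_j = a0 * j**a1 in log-log space: log e_j = log a0 + a1 * log j.
# The error magnitudes below are toy data (assumed), not real trace errors.
rng = np.random.default_rng(0)

h = 10
horizons = np.arange(1, h + 1)
abs_errors = np.abs(rng.normal(0.0, np.sqrt(horizons), size=(50, h)))
mean_abs = abs_errors.mean(axis=0)       # per-horizon mean absolute error

a1, log_a0 = np.polyfit(np.log(horizons), np.log(mean_abs), 1)
a0 = np.exp(log_a0)
fitted = a0 * horizons ** a1
# With a0 > 0 the curve a0 * j**a1 is monotone for j > 0: no turning point,
# unlike a second-order polynomial, so the fitted interval widths cannot
# expand and then shrink.
```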
<p>The main problem with the nonparametric intervals produced by <code>smooth</code> is caused by the quantile regressions, which do not behave well on small samples. In order to produce a correct 0.95 quantile, we need at least 20 observations, and if we want the 0.99 quantile, then the sample must contain at least 100. In cases when there are not enough observations, the produced intervals can be inaccurate and may not correspond to the nominal level.</p>
<p>As a small note, if a user produces only a one-step-ahead forecast, then the semiparametric interval will correspond to the parametric one (because only the variance of the one-step-ahead error is used), and the nonparametric interval is constructed using the <code>quantile()</code> function from the <code>stats</code> package.</p>
<p>Finally, the width of prediction intervals is regulated by the parameter <code>level</code>, which can be written either as a fraction (<code>level=0.95</code>) or as an integer less than 100 (<code>level=95</code>). I personally prefer the former, but the latter is needed for consistency with the <code>forecast</code> package functions. By default, all the <code>smooth</code> functions produce 95% prediction intervals.</p>
<p>There are some other features of prediction interval construction for specific intermittent models and cumulative forecasts, but they will be covered in upcoming posts.</p>
<h3>Examples in R</h3>
<p>We will use time series N1241 from the M3 dataset as an example and estimate the ETS(A,Ad,N) model. Here&#8217;s how we do that:</p>
<pre class="decode" title="Examples of usage">
ourModel1 <- es(M3$N1241$x, "AAdN", h=8, holdout=TRUE, interval="p")
ourModel2 <- es(M3$N1241$x, "AAdN", h=8, holdout=TRUE, interval="sp")
ourModel3 <- es(M3$N1241$x, "AAdN", h=8, holdout=TRUE, interval="np")</pre>
<p>The resulting graphs demonstrate some differences in prediction intervals widths and shapes:</p>
<div id="attachment_960" style="width: 310px" class="wp-caption alignnone"><a href="/wp-content/uploads/2016/10/N1241-AAdN-PI-parametric.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-960" src="/wp-content/uploads/2016/10/N1241-AAdN-PI-parametric-300x175.png" alt="Series N1241 from M3, es() forecast, parametric prediction intervals" width="300" height="175" class="size-medium wp-image-960" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-parametric-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-parametric-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-parametric-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-parametric.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-960" class="wp-caption-text">Series N1241 from M3, es() forecast, parametric prediction intervals</p></div>
<div id="attachment_959" style="width: 310px" class="wp-caption alignnone"><a href="/wp-content/uploads/2016/10/N1241-AAdN-PI-semiparametric.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-959" src="/wp-content/uploads/2016/10/N1241-AAdN-PI-semiparametric-300x175.png" alt="Series N1241 from M3, es() forecast, semi-parametric prediction intervals" width="300" height="175" class="size-medium wp-image-959" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-semiparametric-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-semiparametric-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-semiparametric-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-semiparametric.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-959" class="wp-caption-text">Series N1241 from M3, es() forecast, semiparametric prediction intervals</p></div>
<div id="attachment_958" style="width: 310px" class="wp-caption alignnone"><a href="/wp-content/uploads/2016/10/N1241-AAdN-PI-nonparametric.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-958" src="/wp-content/uploads/2016/10/N1241-AAdN-PI-nonparametric-300x175.png" alt="Series N1241 from M3, es() forecast, non-parametric prediction intervals" width="300" height="175" class="size-medium wp-image-958" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-nonparametric-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-nonparametric-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-nonparametric-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2016/10/N1241-AAdN-PI-nonparametric.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-958" class="wp-caption-text">Series N1241 from M3, es() forecast, nonparametric prediction intervals</p></div>
<p>All of them cover the actual values in the holdout, because the intervals are very wide. It is not obvious which of them is the most appropriate for this task, so we can calculate the spread of the intervals and see which of them is wider on average:</p>
<pre class="decode" title="Spread of intervals">
mean(ourModel1$upper-ourModel1$lower)
mean(ourModel2$upper-ourModel2$lower)
mean(ourModel3$upper-ourModel3$lower)
</pre>
<p>Which results in:</p>
<pre>950.4171
955.0831
850.614</pre>
<p>In this specific example, the nonparametric interval appeared to be the narrowest, which is good, taking into account that it adequately covered the values in the holdout sample. However, this doesn't mean that it is superior to the other methods in general. The appropriate interval should be selected based on an understanding of which assumptions are violated. If we didn't know the actual values in the holdout sample, then we could make a decision based on the analysis of the in-sample residuals, in order to get a clue about the violation of any assumptions. This can be done, for example, this way:</p>
<pre class="decode" title="Graphs of residuals">
forecast::tsdisplay(ourModel1$residuals)

hist(ourModel1$residuals)

qqnorm(ourModel3$residuals)
qqline(ourModel3$residuals)</pre>
<div id="attachment_1254" style="width: 310px" class="wp-caption alignnone"><a href="/wp-content/uploads/2017/05/N1241-residuals-plot.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-1254" src="/wp-content/uploads/2017/05/N1241-residuals-plot-300x175.png" alt="" width="300" height="175" class="size-medium wp-image-1254" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-plot-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-plot-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-plot-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-plot.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-1254" class="wp-caption-text">Linear plot and correlation functions of the residuals of the ETS(A,Ad,N) model</p></div>
<div id="attachment_1248" style="width: 310px" class="wp-caption alignnone"><a href="/wp-content/uploads/2017/05/N1241-residuals-histogram.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-1248" src="/wp-content/uploads/2017/05/N1241-residuals-histogram-300x175.png" alt="" width="300" height="175" class="size-medium wp-image-1248" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-histogram-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-histogram-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-histogram-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-histogram.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-1248" class="wp-caption-text">Histogram of the residuals of the ETS(A,Ad,N) model</p></div>
<div id="attachment_1250" style="width: 310px" class="wp-caption alignnone"><a href="/wp-content/uploads/2017/05/N1241-residuals-qqplot.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-1250" src="/wp-content/uploads/2017/05/N1241-residuals-qqplot-300x175.png" alt="" width="300" height="175" class="size-medium wp-image-1250" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-qqplot-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-qqplot-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-qqplot-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2017/05/N1241-residuals-qqplot.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-1250" class="wp-caption-text">Q-Q plot of the residuals of the ETS(A,Ad,N) model</p></div>
<p>The first plot shows how the residuals change over time and what the autocorrelation and partial autocorrelation functions look like for this time series. There is no obvious autocorrelation and no obvious heteroscedasticity in the residuals, so we can assume that these conditions are not violated in the model and there is no need to use semiparametric prediction intervals. However, the second and the third graphs demonstrate that the residuals are not normally distributed (as assumed by the ETS(A,Ad,N) model), which means that the parametric prediction intervals may be wrong for this time series. All of this motivates the usage of nonparametric prediction intervals for the series N1241.</p>
<p>That's it for today.</p>
<p>Message <a href="https://openforecast.org/2017/06/11/smooth-package-for-r-prediction-intervals/">&#8220;smooth&#8221; package for R. Common ground. Part I. Prediction intervals</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2017/06/11/smooth-package-for-r-prediction-intervals/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Complex Exponential Smoothing (Working paper)</title>
		<link>https://openforecast.org/2016/02/01/complex-exponential-smoothing-working-paper-2/</link>
					<comments>https://openforecast.org/2016/02/01/complex-exponential-smoothing-working-paper-2/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 01 Feb 2016 14:21:29 +0000</pubDate>
				<category><![CDATA[CES]]></category>
		<category><![CDATA[Complex-valued models]]></category>
		<category><![CDATA[complex variables]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[PhD]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=905</guid>

					<description><![CDATA[<p>Some time ago I published the working paper on Complex Exponential Smoothing on the ResearchGate website. This is a paper written by Nikolaos Kourentzes and me in 2015. It explains a new approach to time series modelling and forecasting, based on the notion of &#8220;information potential&#8221;. The resulting model allows [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2016/02/01/complex-exponential-smoothing-working-paper-2/">Complex Exponential Smoothing (Working paper)</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
					<content:encoded><![CDATA[<p>Some time ago I published the working paper on Complex Exponential Smoothing on the <a href="https://www.researchgate.net/publication/283488877_Complex_Exponential_Smoothing" target="_blank">ResearchGate</a> website. This is a paper written by <a href="http://kourentzes.com/forecasting" target="_blank">Nikolaos Kourentzes</a> and me in 2015. It explains a new approach to time series modelling and forecasting, based on the notion of &#8220;information potential&#8221;. The resulting model allows forecasting both trend and level time series effectively (and it does this better than the conventional ETS). The paper is currently under review, but it has already been read by 43 scientists on <a href="https://www.researchgate.net/publication/283488877_Complex_Exponential_Smoothing" target="_blank">ResearchGate</a>.</p>
<p>Message <a href="https://openforecast.org/2016/02/01/complex-exponential-smoothing-working-paper-2/">Complex Exponential Smoothing (Working paper)</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2016/02/01/complex-exponential-smoothing-working-paper-2/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Presentation on Management Science seminar at Lancaster University</title>
		<link>https://openforecast.org/2015/03/18/presentation-on-management-science-seminar-at-lancaster-university/</link>
					<comments>https://openforecast.org/2015/03/18/presentation-on-management-science-seminar-at-lancaster-university/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 18 Mar 2015 15:10:30 +0000</pubDate>
				<category><![CDATA[CES]]></category>
		<category><![CDATA[Complex-valued models]]></category>
		<category><![CDATA[complex variables]]></category>
		<category><![CDATA[PhD]]></category>
		<category><![CDATA[presentations]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=911</guid>

					<description><![CDATA[<p>Today I gave a presentation on the topic of Complex Exponential Smoothing at the Management Science seminar at Lancaster University. Lecturers and PhD students of the department attended the presentation and seemed to like it. However, only on the morning of that day did I realise that I had prepared the wrong presentation (it should have [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2015/03/18/presentation-on-management-science-seminar-at-lancaster-university/">Presentation on Management Science seminar at Lancaster University</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
					<content:encoded><![CDATA[<p>Today I gave a presentation on the topic of Complex Exponential Smoothing at the Management Science seminar at Lancaster University. Lecturers and PhD students of the department attended the presentation and seemed to like it. However, only on the morning of that day did I realise that I had prepared the wrong presentation (it should have been on the topic of the seasonal model) and had forgotten to update the presentation file in Dropbox. As a result, I used a whiteboard to explain some aspects of the proposed forecasting approach. Still, it seems that no one noticed that something was wrong&#8230; and only one person was sleeping during the presentation, which I consider a personal achievement.</p>
<p>Here are the <a href="/wp-content/uploads/2015/03/2015-03-18_Svetunkov_CES-full.pdf">slides of the presentation</a>.</p>
<p>Message <a href="https://openforecast.org/2015/03/18/presentation-on-management-science-seminar-at-lancaster-university/">Presentation on Management Science seminar at Lancaster University</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2015/03/18/presentation-on-management-science-seminar-at-lancaster-university/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
