<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archives ARIMA - Open Forecasting</title>
	<atom:link href="https://openforecast.org/tag/arima-en/feed/" rel="self" type="application/rss+xml" />
	<link>https://openforecast.org/tag/arima-en/</link>
	<description>How to look into the future</description>
	<lastBuildDate>Wed, 11 Feb 2026 19:59:54 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2015/08/cropped-usd-05-32x32.png&amp;nocache=1</url>
	<title>Archives ARIMA - Open Forecasting</title>
	<link>https://openforecast.org/tag/arima-en/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>smooth v4.4.0</title>
		<link>https://openforecast.org/2026/02/09/smooth-v4-4-0/</link>
					<comments>https://openforecast.org/2026/02/09/smooth-v4-4-0/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 09 Feb 2026 09:02:21 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3959</guid>

					<description><![CDATA[<p>Great news, everyone! The smooth package for R, version 4.4.0, is now on CRAN. Why is this great news? Let me explain! On this page: What&#8217;s new? Evaluation Setup Results What&#8217;s next? Here is what&#8217;s new since 4.3.0: First, I have worked on tuning the initialisation in adam() in case of backcasting, and improved the [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Great news, everyone! The smooth package for R, version 4.4.0, is now on CRAN. Why is this great news? Let me explain!</p>
<p>On this page:</p>
<ul>
<li><a href="#whatsNew">What&#8217;s new?</a></li>
<li><a href="#evaluation">Evaluation</a></li>
<ul>
<li><a href="#evaluationSetup">Setup</a></li>
<li><a href="#evaluationResults">Results</a></li>
</ul>
<li><a href="#whatsNext">What&#8217;s next?</a></li>
</ul>
<h3 id="whatsNew">Here is what&#8217;s new since 4.3.0:</h3>
<p>First, I have worked on tuning the initialisation in <code>adam()</code> in the case of backcasting, and improved the <code>msdecompose()</code> function a bit to get more robust results. This was necessary to make sure that, when the smoothing parameters are close to zero, the initial values still make sense. This is already in <code>adam</code> (use <code>smoother="global"</code> to test it), but it will become the default behaviour in the next version of the package, once we iron everything out. This is all part of a larger piece of work with Kandrika Pritularga on a paper about the initialisation of dynamic models.</p>
<p>Second, I have fixed a long-standing issue with the eigenvalue calculation inside the dynamic models, which applies only in the case of <code>bounds="admissible"</code> and might impact ARIMA, CES and GUM. The parameter restrictions are now imposed consistently across all functions, guaranteeing that they will not fail and will produce stable/invertible estimates of parameters.</p>
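<p>To illustrate what this sort of check involves, here is a minimal base-R sketch (not the package&#8217;s internal code) that verifies the stationarity of an AR(p) polynomial via the eigenvalues of its companion matrix; the process is stationary when all eigenvalues lie strictly inside the unit circle:</p>
<pre class="decode"># Check the stationarity of an AR(p) model via its companion matrix:
# stationary if all eigenvalues lie inside the unit circle
arStationary <- function(phi){
    p <- length(phi)
    if(p == 1){
        companion <- matrix(phi, 1, 1)
    }else{
        # First row holds the AR coefficients, the rest shift the states
        companion <- rbind(phi, cbind(diag(p-1), rep(0, p-1)))
    }
    all(abs(eigen(companion, only.values=TRUE)$values) < 1)
}

arStationary(c(0.5, 0.3))  # TRUE: a stationary AR(2)
arStationary(c(0.9, 0.3))  # FALSE: one root lies outside the unit circle
</pre>
<p>The same logic, applied to the MA polynomial, gives the invertibility check.</p>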
<p>Third, I have added the Sparse ARMA function, which constructs an ARMA(p,q) of the specified orders only, setting all the lower-order elements to zero. For example, SpARMA(2,3) would have the following form:<br />
\begin{equation*}<br />
y_t = \phi_2 y_{t-2} + \theta_3 \epsilon_{t-3} + \epsilon_{t}<br />
\end{equation*}<br />
This weird model is needed for a project I am working on together with Devon Barrow, Nikos Kourentzes and Yves Sagaert. I&#8217;ll explain more when we get the final draft of the paper.</p>
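<p>As a toy illustration of the model above, the following base-R snippet simulates a SpARMA(2,3) process with hypothetical values \(\phi_2=0.5\) and \(\theta_3=0.3\) (this is just a data-generating sketch, not the package&#8217;s function):</p>
<pre class="decode"># Simulate y_t = phi_2 * y_{t-2} + theta_3 * epsilon_{t-3} + epsilon_t
set.seed(41)
obs <- 120
phi2 <- 0.5
theta3 <- 0.3
epsilon <- rnorm(obs)
y <- numeric(obs)
for(t in 4:obs){
    y[t] <- phi2 * y[t-2] + theta3 * epsilon[t-3] + epsilon[t]
}
# Lags 1 and 2 of AR and 1, 2 of MA are dropped from the model
acf(y, lag.max=10, plot=FALSE)
</pre>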
<p>And something very important, which you will not notice: I refactored the C++ code in the package so that it is available not only for R, but also for Python&#8230; Why? I&#8217;ll explain in the next post :). But this also means that the old functions that relied on the previous generation of the C++ code are now discontinued, and all the smooth functions use the new core. This applies to <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. You will not notice any change, except that some of them should become a bit faster and probably more robust. And this also means that all of them will now be able to use methods for the <code>adam()</code> function. For example, the <code>summary()</code> will produce the proper output with standard errors and confidence intervals for all estimated parameters.</p>
<h2 id="evaluation">Evaluation</h2>
<p><strong>DISCLAIMER</strong>: The previous evaluation was for smooth v4.3.0; you can find it <a href="/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">here</a>. I have changed one of the error measures (sCE to SAME), but the rest is the same, so the results are broadly comparable between the versions.</p>
<h3 id="evaluationSetup">The setup</h3>
<p>As usual in situations like this, I have run the evaluation on the M1, M3 and Tourism competition data. This time, I have added more flavours of the ETS model selection so that you can see how the model pool impacts forecasting accuracy. A short description:</p>
<ol>
<li>XXX &#8211; select between pure additive ETS models only;</li>
<li>ZZZ &#8211; select from the pool of all 30 models, but use branch-and-bound to kick out the less suitable models;</li>
<li>ZXZ &#8211; same as (2), but without the multiplicative trend models. This is used in the <code>smooth</code> functions <strong>by default</strong>;</li>
<li>FFF &#8211; select from the pool of all 30 models (exhaustive search);</li>
<li>SXS &#8211; the pool of models that is used by default in <code>ets()</code> from the <code>forecast</code> package in R.</li>
</ol>
<p>I also tested three types of the ETS initialisation:</p>
<ol>
<li>Back &#8211; <code>initial="backcasting"</code></li>
<li>Opt &#8211; <code>initial="optimal"</code></li>
<li>Two &#8211; <code>initial="two-stage"</code></li>
</ol>
<p>Backcasting is now the default method of initialisation and does well in many cases, but I found that optimal initials (if done correctly) help in some difficult situations, as long as you have enough computational time.</p>
<p>I used two error measures and computational time to check how functions work. The first error measure is called RMSSE (Root Mean Squared Scaled Error) from <a href="http://dx.doi.org/10.1016/j.ijforecast.2021.11.013">M5 competition</a>, motivated by <a href="http://dx.doi.org/10.1016/j.ijforecast.2022.08.003">Athanasopoulos &#038; Kourentzes (2023)</a>:</p>
<p>\begin{equation*}<br />
\mathrm{RMSSE} = \frac{1}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} \Delta_t^2}} \mathrm{RMSE},<br />
\end{equation*}<br />
where \(\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h e^2_{t+j}}\) is the Root Mean Squared Error of the point forecasts, and \(\Delta_t\) is the first differences of the in-sample actual values.</p>
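<p>The RMSSE above can be computed in a few lines of base R (a sketch following the formula; the evaluation itself uses the <code>greybox</code> implementation):</p>
<pre class="decode"># RMSSE: RMSE of the forecast errors, scaled by the root mean
# square of the first differences of the in-sample actuals
rmsse <- function(holdout, forecasts, insample){
    rmse <- sqrt(mean((holdout - forecasts)^2))
    scale <- sqrt(mean(diff(insample)^2))
    rmse / scale
}

rmsse(holdout=c(11, 12, 13), forecasts=c(10, 10, 10), insample=1:10)
# about 2.16
</pre>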
<p>The second measure does not have a standard name in the literature, but its idea is to measure the bias of forecasts while getting rid of the sign, to make sure that positively biased forecasts on some time series are not cancelled out by negatively biased ones on others. I call this measure &#8220;Scaled Absolute Mean Error&#8221; (SAME):</p>
<p>\begin{equation*}<br />
\mathrm{SAME} = \frac{1}{\frac{1}{T-1} \sum_{t=1}^{T-1} |\Delta_t|} \mathrm{AME},<br />
\end{equation*}<br />
where \(\mathrm{AME}= \left| \frac{1}{h} \sum_{j=1}^h e_{t+j} \right|\).</p>
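<p>SAME can be sketched in base R in the same way (again, the evaluation uses the <code>greybox</code> implementation):</p>
<pre class="decode"># SAME: absolute mean error, scaled by the mean absolute
# first difference of the in-sample actuals
same <- function(holdout, forecasts, insample){
    ame <- abs(mean(holdout - forecasts))
    scale <- mean(abs(diff(insample)))
    ame / scale
}

# Errors of +2 and -2 cancel out within one series: unbiased on average
same(holdout=c(12, 8), forecasts=c(10, 10), insample=1:10)  # 0
# A consistent over-forecast of 2 units does not cancel out
same(holdout=c(8, 8), forecasts=c(10, 10), insample=1:10)  # 2
</pre>
<p>Note that the sign is removed per series (inside the absolute value), which is exactly what stops opposite biases on different series from cancelling when SAME is averaged across a dataset.</p>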
<p>For both of these measures, lower values are better. As for the computational time, I have measured it for each model on each series, and this time I provide the distribution of times to show better how the methods perform.</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)
library(forecast)
library(smooth)

library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
	holdout <- as.vector(holdout);
	insample <- as.vector(insample);
	# RMSSE and SAME are defined in greybox v2.0.7
	return(c(RMSSE(holdout, object$mean, mean(diff(insample)^2)),
	         SAME(holdout, object$mean, mean(abs(diff(insample)))),
	         object$timeElapsed));
}

datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)

# Method configuration list
# Each method specifies: fn (function name), pkg (package), and optionally model and initial
methodsConfig <- list(
	# ETS and Auto ARIMA from the forecast package in R
	"ETS" = list(fn = "ets", pkg = "forecast", use_x_only = TRUE),
	"Auto ARIMA" = list(fn = "auto.arima", pkg = "forecast", use_x_only = TRUE),
	# ADAM with different initialisation schemes
	"ADAM ETS Back" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ADAM ETS Opt" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ADAM ETS Two" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "two"),
	# ES, which is a wrapper of ADAM. Should give very similar results to ADAM on regular data
	"ES Back" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ES Opt" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ES Two" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "two"),
	# Several flavours for model selection in ES
	"ES XXX" = list(fn = "es", pkg = "smooth", model = "XXX", initial = "back"),
	"ES ZZZ" = list(fn = "es", pkg = "smooth", model = "ZZZ", initial = "back"),
	"ES FFF" = list(fn = "es", pkg = "smooth", model = "FFF", initial = "back"),
	"ES SXS" = list(fn = "es", pkg = "smooth", model = "SXS", initial = "back"),
	# ARIMA implementations in smooth
	"MSARIMA" = list(fn = "auto.msarima", pkg = "smooth", initial = "back"),
	"SSARIMA" = list(fn = "auto.ssarima", pkg = "smooth", initial = "back"),
	# Complex Exponential Smoothing
	"CES" = list(fn = "auto.ces", pkg = "smooth", initial = "back"),
	# Generalised Univariate Model (experimental)
	"GUM" = list(fn = "auto.gum", pkg = "smooth", initial = "back")
)

methodsNames <- names(methodsConfig)
methodsNumber <- length(methodsNames)

measuresNames <- c("RMSSE","SAME","Time")
measuresNumber <- length(measuresNames)

testResults <- array(NA, c(methodsNumber, datasetLength, measuresNumber),
                     dimnames = list(methodsNames, NULL, measuresNames))

# Unified loop over all methods
for(j in seq_along(methodsConfig)){
	cfg <- methodsConfig[[j]]
	cat("Running method:", methodsNames[j], "\n")

	result <- foreach(i = 1:datasetLength, .combine = "cbind",
	                  .packages = c("smooth", "forecast")) %dopar% {
		startTime <- Sys.time()

		# Build model call based on method type
		if(isTRUE(cfg$use_x_only)){
			# forecast package methods: ets, auto.arima
			test <- do.call(cfg$fn, list(datasets[[i]]$x))
		}else if(cfg$fn %in% c("adam", "es")) {
			# adam and es take dataset and model
			test <- do.call(cfg$fn, list(datasets[[i]], model=cfg$model, initial = cfg$initial))
		}else{
			# auto.msarima, auto.ssarima, auto.ces, auto.gum
			test <- do.call(cfg$fn, list(datasets[[i]], initial = cfg$initial))
		}

		# Build forecast call
		forecast_args <- list(test, h = datasets[[i]]$h)
		testForecast <- do.call(forecast, forecast_args)
		testForecast$timeElapsed <- Sys.time() - startTime

		return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x))
	}
	testResults[j,,] <- t(result)
}

</pre>
</div></div>
<h3 id="evaluationResults">Results</h3>
<p>And here are the results for the smooth functions in v4.4.0 for R. First, we summarise the RMSSEs: I report the quartiles of the RMSSE distribution together with its mean.</p>
<pre class="decode">cbind(t(apply(testResults[,,"RMSSE"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"RMSSE"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%    75%      100%   mean
ETS           0.0245 0.6772 1.1806 2.3765   51.6160 1.9697
Auto ARIMA    0.0246 0.6802 1.1790 2.3583   51.6160 1.9864
ADAM ETS Back 0.0183 <strong>0.6647</strong> <strong>1.1620</strong> <strong>2.3023</strong>   <strong>50.2585</strong> <strong>1.9283</strong>
ADAM ETS Opt  0.0242 0.6714 1.1868 2.3623   51.6160 1.9432
ADAM ETS Two  0.0246 0.6690 1.1875 2.3374   51.6160 1.9480
ES Back       0.0183 0.6674 1.1647 2.3164   <strong>50.2585</strong> 1.9292
ES Opt        0.0242 0.6740 1.1858 2.3644   51.6160 1.9469
ES Two        0.0245 0.6717 1.1874 2.3463   51.6160 1.9538
ES XXX        0.0183 0.6777 1.1708 2.3062   <strong>50.2585</strong> 1.9613
ES ZZZ        <strong>0.0108</strong> 0.6682 1.1816 2.3611  201.4959 2.0841
ES FFF        0.0145 0.6795 1.2170 2.4575 5946.1858 3.3033
ES SXS        0.0183 0.6754 1.1709 2.3539   <strong>50.2585</strong> 1.9448
MSARIMA       0.0278 0.6988 1.1898 2.4208   51.6160 2.0750
SSARIMA       0.0277 0.7371 1.2544 2.4425   51.6160 2.0625
CES Back      0.0450 0.6761 1.1741 2.3205   51.0571 1.9650
GUM Back      0.0333 0.7077 1.2073 2.4533   51.6184 2.0461
</pre>
<p>The worst performing models are the ETS flavours that include the multiplicative trend (ES ZZZ and ES FFF). This is because there are outliers in some time series, and the multiplicative trend reacts to them by amending the trend value to something large (e.g. 2, i.e. a doubling of the level at each step), after which it can never return to a reasonable level (see the explanation of this phenomenon in <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6 of the ADAM book</a>). As expected, ADAM ETS performs very similarly to ES, and we can see that the default initialisation (backcasting) is pretty good in terms of RMSSE values. To be fair, if the models were tested on a different dataset, it might be the case that the optimal initialisation would do better.</p>
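<p>The explosive behaviour of the multiplicative trend is easy to reproduce with a toy base-R calculation (an illustration of the phenomenon, not the package code):</p>
<pre class="decode"># Multiplicative trend point forecasts: level and trend multiply each step.
# If an outlier inflates the trend component to, say, 1.5, the trajectory
# grows geometrically and never returns to a reasonable level.
level <- 100
trend <- 1.5  # inflated by an outlier
h <- 10
forecasts <- level * trend^(1:h)
round(forecasts)
</pre>
<p>After ten steps the forecast is already more than fifty times the last level, which is why the pools without the multiplicative trend are safer on noisy data.</p>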
<p>Here is a table with the SAME results:</p>
<pre class="decode">cbind(t(apply(testResults[,,"SAME"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"SAME"],1,mean)) |> round(4)</pre>
<pre>                 0%    25%    50%    75%      100%   mean
ETS           8e-04 0.3757 1.0203 2.5097   54.6872 1.9983
Auto ARIMA    <strong>0e+00</strong> 0.3992 1.0429 2.4565   53.2710 2.0446
ADAM ETS Back 1e-04 0.3752 0.9965 <strong>2.4047</strong>   <strong>52.3418</strong> 1.9518
ADAM ETS Opt  5e-04 0.3733 1.0212 2.4848   55.1018 1.9618
ADAM ETS Two  8e-04 0.3780 1.0316 2.4511   55.1019 1.9712
ES Back       <strong>0e+00</strong> 0.3733 <strong>0.9945</strong> 2.4122   53.4504 <strong>1.9485</strong>
ES Opt        2e-04 <strong>0.3727</strong> 1.0255 2.4756   54.6860 1.9673
ES Two        1e-04 0.3855 1.0323 2.4535   54.6856 1.9799
ES XXX        1e-04 0.3733 1.0050 2.4257   53.1697 1.9927
ES ZZZ        3e-04 0.3824 1.0135 2.4885  229.7626 2.1376
ES FFF        3e-04 0.3972 1.0489 2.6042 3748.4268 2.9501
ES SXS        6e-04 0.3750 1.0125 2.4627   53.4504 1.9725
MSARIMA       1e-04 0.3960 1.0094 2.5409   54.7916 2.1227
SSARIMA       1e-04 0.4401 1.1222 2.5673   52.5023 2.1248
CES Back      6e-04 0.3767 1.0079 2.4085   54.9026 2.0052
GUM Back      0e+00 0.3803 1.0575 2.6259   63.0637 2.0858
</pre>
<p>In terms of bias, the smooth implementations of ETS do well again, and we can see the same issue with the multiplicative trend as before. Another thing to note is that MSARIMA and SSARIMA are not as good as Auto ARIMA from the forecast package on these datasets in terms of both RMSSE and SAME (at least when judged by the mean error measures). And, in fact, GUM and CES are now better than both of them on both error measures.</p>
<p>Finally, here is a table with the computational time:</p>
<pre class="decode">cbind(t(apply(testResults[,,"Time"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"Time"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%     75%    100%   mean
ETS           <strong>0.0032</strong> <strong>0.0117</strong> 0.1660  0.6728  1.6400 0.3631
Auto ARIMA    0.0100 0.1184 0.3618  1.0548 54.3652 1.4760
ADAM ETS Back 0.0162 0.1062 0.1854  0.4022  2.5109 0.2950
ADAM ETS Opt  0.0319 0.1920 0.3103  0.6792  3.8933 0.5368
ADAM ETS Two  0.0427 0.2548 0.4035  0.8567  3.7178 0.6331
ES Back       0.0153 0.0896 <strong>0.1521</strong>  0.3335  2.1128 0.2476
ES Opt        0.0303 0.1667 0.2565  0.5910  3.5887 0.4522
ES Two        0.0483 0.2561 0.4016  0.8626  3.5892 0.6309
MSARIMA Back  0.0614 0.3418 0.6947  0.9868  3.9677 0.7534
SSARIMA Back  0.0292 0.2963 0.8988  2.1729 13.7635 1.6581
CES Back      0.0146 0.0400 0.1834  <strong>0.2298</strong>  <strong>1.2099</strong> <strong>0.1713</strong>
GUM Back      0.0165 0.2101 1.5221  3.0543  9.5380 1.9506

# Separate table for special pools of ETS.
# The time is proportional to the number of models here
=========================================================
                  0%    25%    50%     75%    100%   mean
ES XXX        0.0114 0.0539 0.0782  0.1110  0.8163 0.0859
ES ZZZ        0.0147 0.1371 0.2690  0.4947  2.2049 0.3780
ES FFF        0.0529 0.2775 1.1539  1.5926  3.8552 1.1231
ES SXS        0.0323 0.1303 0.4491  0.6013  2.2170 0.4581
</pre>
<p><em><br />
I have manually moved the ES flavours with special model pools into a separate table because there is no point in comparing their computational time with that of the others (they work with different pools of models and are thus not directly comparable with the rest).</em></p>
<p>What we can see from this is that ES with backcasting is faster than the other models in this setting (in terms of both mean and median computational time). CES is very fast in terms of mean computational time, probably because of its very small pool of models to choose from (only four). SSARIMA is pretty slow, which is due to the nature of its order selection algorithm (I don't plan to update it any time soon, but if someone wants to contribute - let me know). The interesting thing is that Auto ARIMA, while being relatively fine in terms of median time, has the highest maximum, meaning that on some time series it got stuck for a very long time. The series that caused the biggest issue for Auto ARIMA is N389 from the M1 competition. I'm not sure what the problem was, and I don't have time to investigate it.</p>
<div id="attachment_4002" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1"><img fetchpriority="high" decoding="async" aria-describedby="caption-attachment-4002" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1" alt="Mean computational time vs mean RMSSE" width="300" height="180" class="size-medium wp-image-4002" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-768x461.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1 1000w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-4002" class="wp-caption-text">Mean computational time vs mean RMSSE</p></div>
<p>Comparing the mean computational time with the mean RMSSE (image above), the overall tendency of the <code>smooth</code> + <code>forecast</code> functions on the M1, M3 and Tourism datasets is that additional computational time does not improve accuracy. At the same time, the simpler pool of pure additive models (ETS(X,X,X)) harms accuracy in comparison with the branch-and-bound based pool of the default <code>model="ZXZ"</code>. There seems to be a sweet spot in terms of the pool of models to choose from (no multiplicative trend, but mixed models allowed). This aligns well with the papers of <a href="https://doi.org/10.1080/01605682.2024.2421339">Petropoulos et al. (2025)</a>, who investigated the accuracy of arbitrarily composed short pools of models, and <a href="https://doi.org/10.1016/j.ijpe.2018.05.019">Kourentzes et al. (2019)</a>, who showed how pooling (if done correctly) can improve accuracy on average.</p>
<h3 id="whatsNext">What's next?</h3>
<p>For R, the main task now is to rewrite the <code>oes()</code> function and substitute it with <code>om()</code> - the "Occurrence Model". It should be equivalent to <code>adam()</code> in functionality, allowing the user to introduce ETS, ARIMA and explanatory variables in the occurrence part of the model. This is a huge piece of work, which I hope to progress slowly throughout 2026 and finish by the end of the year. Doing that will also allow me to remove the last bits of the old C++ code and switch to the ADAM core completely, introducing more functionality for capturing patterns in intermittent demand. The minor task is to test <code>smoother="global"</code> more for the ETS initialisation and roll it out as the default in the next release for both R and Python.</p>
<p>For Python,... What Python? Ah! You'll see soon :)</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/02/09/smooth-v4-4-0/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</title>
		<link>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/</link>
					<comments>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Fri, 04 Jul 2025 10:02:17 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3898</guid>

					<description><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post. New default initialisation mechanism Since the beginning of the package, the smooth functions supported three ways for initialising the state vector (the vector that includes level, trend, seasonal indices): optimisation, backcasting and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post.</p>
<h3>New default initialisation mechanism</h3>
<p>Since the beginning of the package, the <code>smooth</code> functions have supported three ways of initialising the state vector (the vector that includes level, trend and seasonal indices): optimisation, backcasting and values provided by the user. Optimisation has been considered the standard way of estimating ETS, while backcasting was originally proposed by Box &#038; Jenkins (1970) and, as far as I know, is implemented only in <code>smooth</code>. The main advantage of backcasting is computational time, because you do not need to estimate every single value of the state vector. The new ADAM core that I developed during the COVID lockdown had some improvements to the backcasting, and I noticed that <code>adam()</code> produced more accurate forecasts with it than with optimisation. But I needed more testing, so I did not change anything back then.</p>
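<p>As a toy illustration of the idea (a sketch under simplifying assumptions, not the package&#8217;s implementation): for simple exponential smoothing, backcasting runs the same smoothing recursion over the series in reverse and uses the resulting state as the initial level for the forward pass:</p>
<pre class="decode"># Toy backcasting for simple exponential smoothing: run the recursion
# backwards through the data and take the final state as the initial level
sesBackcastInitial <- function(y, alpha=0.3){
    level <- y[length(y)]  # start the backward pass from the last observation
    for(t in rev(seq_along(y))){
        level <- alpha * y[t] + (1 - alpha) * level
    }
    level  # this becomes the initial level of the forward recursion
}

sesBackcastInitial(c(100, 102, 101, 103, 105))
</pre>
<p>Instead of treating the initial level as an extra parameter to optimise, it is produced by the data itself, which is what makes backcasting cheap.</p>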
<p>However, my recent work with Kandrika Pritularga on capturing uncertainty in ETS has demonstrated that backcasting solves some fundamental problems with the variance of states &#8211; optimisation cannot handle so many parameters, and the asymptotic properties of ETS do not make sense in that case (we&#8217;ll release the paper as soon as we finish the experiments). So, with this evidence in hand and additional tests, I decided to switch from optimisation to backcasting as the default initialisation mechanism for all the <strong>smooth</strong> functions.</p>
<p>End users should not feel much difference, except that the functions should now work faster and (hopefully) more accurately. If this is not the case, please get in touch or <a href="https://github.com/config-i1/smooth/issues">file an issue on github</a>.</p>
<p>Also, rest assured the <code>initial="optimal"</code> is available and will stay available as an option in all the <code>smooth</code> functions, so, you can always switch back to it if you don&#8217;t like backcasting.</p>
<p>Finally, I have introduced a new initialisation mechanism called &#8220;two-stage&#8221;, the idea of which is to apply backcasting first and then optimise the obtained state values. It is slower, but is supposed to be better than the standard optimisation.</p>
<h3>ADAM core</h3>
<p>Every single function in the <code>smooth</code> package now uses the ADAM C++ core, and the old core will be discontinued starting from v4.5.0 of the package. This applies to the functions <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. The package now contains legacy versions of these functions with the suffix &#8220;_old&#8221; (e.g. <code>es_old()</code>), which will be removed in smooth v4.5.0. The new engine also helped <code>ssarima()</code>, which has become slightly more accurate than before. Unfortunately, there are still some issues with the initialisation of the seasonal <code>ssarima()</code>, which I have not managed to solve completely, but I hope that over time this will be resolved as well.</p>
<h3>smooth performance update</h3>
<p>I have applied all the smooth functions together with <code>ets()</code> and <code>auto.arima()</code> from the <code>forecast</code> package to the M1, M3 and Tourism competition data and measured their performance in terms of RMSSE, scaled Cumulative Error (sCE) and computational time. I used the following R code for that:</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Long and boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)

library(forecast)
library(smooth)

# I work on Linux and use doMC. Substitute this with doParallel if you use Windows
library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
	holdout <- as.vector(holdout);
	insample <- as.vector(insample);
	return(c(measures(holdout, object$mean, insample),
			 mean(holdout < object$upper &#038; holdout > object$lower),
			 mean(object$upper-object$lower)/mean(insample),
			 pinball(holdout, object$upper, 0.975)/mean(insample),
			 pinball(holdout, object$lower, 0.025)/mean(insample),
			 sMIS(holdout, object$lower, object$upper, mean(insample),0.95),
			 object$timeElapsed))
}

# Datasets to use
datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)
# Types of models to try
methodsNames <- c("ETS", "Auto ARIMA",
				  "ADAM ETS Back", "ADAM ETS Opt", "ADAM ETS Two",
				  "ES Back", "ES Opt", "ES Two",
				  "ADAM ARIMA Back", "ADAM ARIMA Opt", "ADAM ARIMA Two",
				  "MSARIMA Back", "MSARIMA Opt", "MSARIMA Two",
				  "SSARIMA Back", "SSARIMA Opt", "SSARIMA Two",
				  "CES Back", "CES Opt", "CES Two",
				  "GUM Back", "GUM Opt", "GUM Two");
methodsNumber <- length(methodsNames);
test <- adam(datasets[[125]]);

testResults20250603 <- array(NA,c(methodsNumber,datasetLength,length(test$accuracy)+6),
                             dimnames=list(methodsNames, NULL,
                                           c(names(test$accuracy),
                                             "Coverage","Range",
                                             "pinballUpper","pinballLower","sMIS",
                                             "Time")));

#### ETS from forecast package ####
j <- 1;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
  startTime <- Sys.time()
  test <- ets(datasets[[i]]$x);
  testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### AUTOARIMA ####
j <- 2;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- auto.arima(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Backcasting ####
j <- 3;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Optimal ####
j <- 4;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Two-stage ####
j <- 5;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Backcasting ####
j <- 6;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Optimal ####
j <- 7;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Two-stage ####
j <- 8;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Backcasting ####
j <- 9;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="back", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Optimal ####
j <- 10;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="opt", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Two-stage ####
j <- 11;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="two", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Backcasting ####
j <- 12;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Optimal ####
j <- 13;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Two-stage ####
j <- 14;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Backcasting ####
j <- 15;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ssarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Optimal ####
j <- 16;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]], initial="opt");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Two-stage ####
j <- 17;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]], initial="two");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Backcasting ####
j <- 18;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Optimal ####
j <- 19;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Two-stage ####
j <- 20;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Backcasting ####
j <- 21;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Optimal ####
j <- 22;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Two-stage ####
j <- 23;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);</pre>
<pre class="decode"># Summary of results
cbind(t(apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,quantile)),
	  mean=apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,mean),
	  sCE=apply(testResults20250603[c(1:8,12:23),,"sCE"],1,mean),
	  Time=apply(testResults20250603[c(1:8,12:23),,"Time"],1,mean)) |> round(3)</pre>
</div></div>
<p>The table below shows the distribution of RMSSE, the mean sCE and the mean computational time. Boldface marks the best-performing model for each measure.</p>
<pre>                    min   Q1  median   Q3     max  mean    sCE  Time
ETS                0.024 0.677 1.181 2.376  51.616 1.970  0.299 0.385
Auto ARIMA         0.025 0.680 1.179 2.358  51.616 1.986  0.124 1.467

ADAM ETS Back      <strong>0.015</strong> <strong>0.666</strong> 1.175 <strong>2.276</strong>  51.616 <strong>1.921</strong>  0.470 0.218
ADAM ETS Opt       0.020 <strong>0.666</strong> 1.190 2.311  51.616 1.937  0.299 0.432
ADAM ETS Two       0.025 <strong>0.666</strong> 1.179 2.330  51.616 1.951  0.330 0.579

ES Back            <strong>0.015</strong> 0.672 <strong>1.174</strong> 2.284  51.616 <strong>1.921</strong>  0.464 0.219
ES Opt             0.020 0.672 1.186 2.316  51.616 1.943  0.302 0.497
ES Two             0.024 0.668 1.181 2.346  51.616 1.952  0.346 0.562

MSARIMA Back       0.025 0.710 1.188 2.383  51.616 2.028  <strong>0.067</strong> 0.780
MSARIMA Opt        0.025 0.724 1.242 2.489  51.616 2.083  0.088 1.905
MSARIMA Two        0.025 0.718 1.250 2.485  51.906 2.075  0.083 2.431

SSARIMA Back       0.045 0.738 1.248 2.383  51.616 2.063  0.167 1.747
SSARIMA Opt        0.025 0.774 1.292 2.413  51.616 2.040  0.178 7.324
SSARIMA Two        0.025 0.742 1.241 2.414  51.616 2.027  0.183 8.096

CES Back           0.046 0.695 1.189 2.355  51.342 1.981  0.125 <strong>0.185</strong>
CES Opt            0.030 0.698 1.218 2.327  <strong>49.480</strong> 2.001 -0.135 0.834
CES Two            0.025 0.696 1.207 2.343  51.242 1.993 -0.078 1.006

GUM Back           0.046 0.707 1.215 2.399  51.134 2.049 -0.285 3.575
GUM Opt            0.026 0.795 1.381 2.717 240.143 2.932 -0.549 4.668
GUM Two            0.026 0.803 1.406 2.826 240.143 3.041 -0.593 4.703</pre>
<p>Several notes:</p>
<ul>
<li>ES is a wrapper of ADAM ETS. The main difference between them is that the latter uses the Gamma distribution for the multiplicative error models, while the former relies on the Normal one.</li>
<li>MSARIMA is a wrapper for ADAM ARIMA, which is why I don't report the latter in the results.</li>
</ul>
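<p>To illustrate the first note above, here is a small sketch (my own, not part of the evaluation): fitting the same multiplicative-error model with both functions should give near-identical results once <code>adam()</code> is forced to use the Normal distribution.</p>
<pre class="decode"># Illustrative sketch: es() vs adam() with the Normal distribution
library(smooth)
set.seed(33)
y <- ts(rlnorm(120, 5, 0.1), frequency=12)
# es() relies on the Normal distribution by default
esFit <- es(y, "MNM")
# Assumption: forcing dnorm should make adam() mimic es()
adamFit <- adam(y, "MNM", distribution="dnorm")</pre>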
<p>One thing you can notice from the output above is that the models with backcasting consistently produce more accurate forecasts across all measures. My explanation is that they tend not to overfit the data as much as the models with the optimal initialisation do.</p>
<p>To see the stochastic dominance of the forecasting models, I ran a modification of the MCB/Nemenyi test, explained in <a href="/2020/08/17/accuracy-of-forecasting-methods-can-you-tell-the-difference/">this post</a>:</p>
<pre class="decode">par(mar=c(10,3,4,1))
greybox::rmcb(t(testResults20250603[c(1:8,12:23),,"RMSSE"]), outplot="mcb")</pre>
<div id="attachment_3908" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3908" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1" alt="Nemenyi test for the smooth functions" width="300" height="175" class="size-medium wp-image-3908" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1 1200w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3908" class="wp-caption-text">Nemenyi test for the smooth functions</p></div>
<p>The image shows the mean rank of each model and whether the differences between them are statistically significant at the 5% level. It is apparent that ADAM ETS has the lowest rank, no matter which initialisation is used, but its performance does not differ significantly from that of <code>es()</code>, <code>ets()</code> and <code>auto.arima()</code>. Also, <code>auto.arima()</code> significantly outperforms <code>msarima()</code> and <code>ssarima()</code> on this data, which could be due to their initialisation. Still, backcasting seems to help all the functions in terms of accuracy in comparison with the "optimal" and "two-stage" initials.</p>
<h3>What's next?</h3>
<p>I am now working on a modified formulation for ETS, which should fix some issues with the multiplicative trend and make ETS safer. This is based on <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6</a> of the online version of the ADAM monograph (it is not in the printed version). I am not sure whether this will improve the accuracy further, but I hope that it will make some of the ETS models more resilient than they are right now. I specifically have in mind the multiplicative trend model, which sometimes behaves erratically because of its formulation.</p>
<p>I also plan to translate all the simulation functions to the ADAM core. This applies to <code>sim.es()</code>, <code>sim.ssarima()</code>, <code>sim.gum()</code> and <code>sim.ces()</code>. Currently, they rely on the older core, and I want to get rid of it. Having said that, the <code>simulate()</code> method applied to the new <code>smooth</code> functions already uses the new core; it just lacks the flexibility that the other functions have.</p>
<p>Furthermore, I want to rewrite the <code>oes()</code> function and substitute it with <code>oadam()</code>, which would use a better engine, supporting more features, such as multiple frequencies and ARIMA for the occurrence. This is a lot of work, and I probably will need help with that.</p>
<p>Finally, Filotas Theodosiou, Leonidas Tsaprounis, and I are working on translating the R code of <code>smooth</code> into Python. You can read a bit more about this project <a href="/2025/06/30/iif-open-source-forecasting-software-workshop-and-smooth/">here</a>. Several other people have decided to help us, but progress so far has been a bit slow, because the code translation is not straightforward. If you want to help, please get in touch.</p>
<p>Message <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Fundamental Flaw of the Box-Jenkins Methodology</title>
		<link>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/</link>
					<comments>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 13 May 2025 11:57:07 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[Information criteria]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3838</guid>

					<description><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, this methodology has a fundamental flaw and should not be used in practice. Here&#8217;s why. When Box and Jenkins wrote their book back in the 1960s, it was a very different [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, <strong>this methodology has a fundamental flaw</strong> and should not be used in practice. Here&#8217;s why.</p>
<p>When Box and Jenkins wrote their book back in the 1960s, it was a very different era: computers were massive, and people worked with punch cards. To make their approach viable, Box and Jenkins developed a methodology for selecting the appropriate orders of AR and MA based on the values of the autocorrelation and partial autocorrelation functions (ACF and PACF, respectively). Their idea was that if an ARMA process generates a specific ACF/PACF pattern, then it could be identified by analysing those functions in the data. At the time, it wasn’t feasible to do cross-validation or rolling origin evaluation, and even using information criteria for model selection was a challenge. So, the Box–Jenkins approach was a sensible option, producing adequate results with limited computational resources, and was considered state of the art.</p>
<p>Unfortunately, as the M1 competition later showed (see my <a href="/2024/03/14/the-role-of-m-competitions-in-forecasting/">earlier post</a>), the methodology didn’t work well in practice. Simpler methods that didn’t rely on rigorous model selection actually performed better. In fact, the winning model in the competition was <a href="https://doi.org/10.1002/for.3980010108">ARARMA by Emanuel Parzen</a>. His idea was to make the series stationary by applying a low-order, non-stationary AR to the data, then extract the residuals and select appropriate ARMA orders using AIC. Parzen ignored the Box–Jenkins methodology entirely &#8211; he didn’t analyse ACF or PACF and instead relied fully on automated selection. And it worked!</p>
<p>So why didn’t the Box–Jenkins methodology perform as expected? In my monograph <a href="/adam">Forecasting and Analytics with ADAM</a>, I use the following example to explain the main issue: “All birds have wings. Sarah has wings. Thus, Sarah is a bird.” But Sarah, as shown in the image attached to this post, is a butterfly.</p>
<p>The fundamental issue with the Box–Jenkins methodology lies in its logic: if a process generates a specific ACF/PACF, that doesn’t mean that an observed ACF/PACF must come from that process. Many ARMA and even non-ARMA processes can generate exactly the same autocorrelation structure.</p>
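<p>A classic illustration of this ambiguity (my example, not from the original argument) is the MA(1) process: the parameters theta and 1/theta produce exactly the same autocorrelation function, so the ACF alone cannot distinguish the two processes:</p>
<pre class="decode"># Two different MA(1) processes, one and the same theoretical ACF:
# rho_1 = theta/(1+theta^2) is identical for theta and 1/theta
ARMAacf(ma=0.5, lag.max=3)  # rho_1 = 0.5/1.25 = 0.4
ARMAacf(ma=2, lag.max=3)    # rho_1 = 2/5 = 0.4</pre>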
<p>Further developments in ARIMA modelling have shown that ACF and PACF can only be used as general guidelines for order selection. To assess model performance properly, we need other tools. All modern approaches rely on information criteria for ARIMA order selection, and they consistently perform well in forecasting competitions. For example, <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> use AIC for ARMA order selection, while <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan (2020)</a> apply AIC after reformulating ARIMA in a state space form. The former is implemented in the forecast package in R and the StatsForecast library in Python (thanks to Nixtla and Azul Garza); the latter is available in the smooth package in R. I also discuss another ARIMA order selection approach in <a href="/adam/ARIMASelection.html">Section 15.2 of my book</a>.</p>
<p>Long story short: don’t use the Box–Jenkins methodology for order selection. Use more modern tools, such as information criteria.</p>
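<p>In R, a minimal version of this information-criteria-based workflow looks as follows (a sketch of the general idea, not of any specific paper's procedure):</p>
<pre class="decode"># Simulate an ARMA(1,1) and let the software select the orders
library(forecast)
set.seed(42)
y <- arima.sim(list(ar=0.6, ma=0.3), n=200)
# auto.arima() searches over candidate orders and returns
# the model with the lowest AICc - no ACF/PACF reading needed
auto.arima(y)</pre>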
<p>P.S. <a href="/2024/03/21/what-s-wrong-with-arima/">See also my early post on ARIMA</a>, discussing what is wrong with it.</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Multistep loss functions: Geometric Trace MSE</title>
		<link>https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/</link>
					<comments>https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 04 Jun 2024 09:05:56 +0000</pubDate>
				<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[estimators]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3594</guid>

					<description><![CDATA[<p>While there is a lot to say about multistep losses, I&#8217;ve decided to write the final post on one of them and leave the topic alone for a while. Here it goes. Last time, we discussed MSEh and TMSE, and I mentioned that both of them impose shrinkage and have some advantages and disadvantages. One [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/">Multistep loss functions: Geometric Trace MSE</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>While there is a lot to say about multistep losses, I&#8217;ve decided to write the final post on one of them and leave the topic alone for a while. Here it goes.</p>
<p>Last time, we discussed <a href="/2024/05/25/recursive-vs-direct-forecasting-strategy/">MSEh</a> and <a href="/2024/06/01/multistep-loss-functions-trace-mse/">TMSE</a>, and I mentioned that both of them impose shrinkage and have some advantages and disadvantages. One of the main advantages of TMSE was the reduced computational time in comparison with MSEh: you fit the model once instead of doing it h times. However, the downside of TMSE is that it averages things out, and we end up with model parameters that minimize the h-steps-ahead forecast error to a much greater extent than the one-step-ahead one. For example, if the one-step-ahead MSE was 500, while the six-steps-ahead MSE was 3000, the impact of the latter in TMSE would be six times higher than that of the former, and the estimator would prioritize the minimization of the longer-horizon error.</p>
<p>A more balanced version of this was introduced in <a href="/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/">our paper</a> and was called &#8220;Geometric Trace MSE&#8221; (GTMSE). The main idea of GTMSE is to take the geometric mean or, equivalently, the sum of logarithms of MSEh instead of taking the arithmetic mean. Because of that, the impact of MSEh on the loss becomes comparable with the effect of MSE1, and the model performs well throughout the whole horizon from 1 to h. For the same example of MSEs as above, the logarithm of 500 is approximately 2.7, while the logarithm of 3000 is 3.5. The difference between the two is much smaller, reducing the impact of the long-term forecast uncertainty. As a result, GTMSE has the following features:</p>
<ul>
<li>It imposes shrinkage on model parameters.</li>
<li>The strength of shrinkage is proportional to the forecast horizon.</li>
<li>But it is much milder than in the case of MSEh or TMSE.</li>
<li>It leads to more balanced forecasts, performing well on average across the whole horizon.</li>
</ul>
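<p>The arithmetic behind this can be sketched in a couple of lines of R (toy numbers, matching the example above):</p>
<pre class="decode"># Toy multistep MSE values for h = 1, ..., 6
mseh <- c(500, 1000, 1500, 2000, 2500, 3000)
# TMSE sums the raw values, so the longest horizon dominates
sum(mseh)
# GTMSE sums the logarithms, making the contributions comparable
sum(log(mseh))
# The ordering is the same for any base of the logarithm
sum(log10(mseh))</pre>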
<p>In that paper, we did extensive simulations to see how different estimators behave, and we found that:</p>
<ol>
<li>If an analyst is interested in parameters of models, they should stick with the conventional loss functions (based on one-step-ahead forecast error) because the multistep ones tend to produce biased estimates of parameters.</li>
<li>On the other hand, multistep losses drop redundant parameters faster than the conventional one does, so there might be a benefit in the case of overparameterized models.</li>
<li>At the same time, if forecasting is of the main interest, then multistep losses might bring benefits, especially on larger samples.</li>
</ol>
<div id="attachment_3595" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3595" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-300x200.png&amp;nocache=1" alt="ETS(A,A,A) estimated using different loss functions applied to the data with multiplicative seasonality" width="300" height="200" class="size-medium wp-image-3595" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-300x200.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-1024x681.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example-768x511.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/06/2024-06-04-Multistep-Example.png&amp;nocache=1 1049w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3595" class="wp-caption-text">ETS(A,A,A) estimated using different loss functions applied to the data with multiplicative seasonality</p></div>
<p>The image above shows an example from our paper, where we applied the additive model to the data, which exhibits apparent multiplicative seasonality. Despite that, we can see that multistep losses did a much better job than the conventional MSE, compensating for the misspecification.</p>
<p>Message <a href="https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/">Multistep loss functions: Geometric Trace MSE</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/06/04/multistep-loss-functions-geometric-trace-mse/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Detecting patterns in white noise</title>
		<link>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/</link>
					<comments>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 10 Apr 2024 08:16:58 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3413</guid>

					<description><![CDATA[<p>Back in 2015, when I was working on my paper on Complex Exponential Smoothing, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230; One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Back in 2015, when I was working on my paper on <a href="/2022/08/02/complex-exponential-smoothing/">Complex Exponential Smoothing</a>, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230;</p>
<p>One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of ETS, it comes down to selecting trend/seasonal components, while for ARIMA, it&#8217;s about order selection. In R, several functions automatically handle this based on information criteria (<a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar, 2008</a>; <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan, 2020</a>; <a href="https://openforecast.org/adam/ADAMSelection.html">Chapter 15 of ADAM</a>). I decided to investigate how this mechanism works.</p>
<p>I generated data from the Normal distribution with a fixed mean of 5000 and a standard deviation of 50. Then, I asked ETS and ARIMA (from the forecast package in R) to automatically select the appropriate model for each of 1000 time series. Here is the R code for this simple experiment:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Some R code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode"># Set random seed for reproducibility
set.seed(41, kind="L'Ecuyer-CMRG")
# Number of iterations
nsim <- 1000
# Number of observations
obsAll <- 120
# Generate data from N(5000, 50)
rnorm(nsim*obsAll, 5000, 50) |>
  matrix(obsAll, nsim) |>
  ts(frequency=12) -> x

# Load forecast package
library(forecast)
# Load doMC for parallel calculations
# doMC is only available on Linux and Mac
# Use library(doParallel) on Windows
library(doMC)
registerDoMC(detectCores())

# A loop for ARIMA, recording the orders
matArima <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- auto.arima(x&#091;,i&#093;)
    # The element number 5 is just m, period of seasonality
    return(c(testModel$arma&#091;-5&#093;,(!is.na(testModel$coef&#091;"drift"&#093;))*1))
}
rownames(matArima) <- c("AR","MA","SAR","SMA","I","SI","Drift")

# A loop for ETS, recording the model types
matEts <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- ets(x&#091;,i&#093;, allow.multiplicative.trend=TRUE)
    return(testModel$method)
}
</pre>
</div></div></div>
<p>The findings of this experiment are summarised using the following chunk of the R code:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the analysis of the results</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">
#### Auto ARIMA ####
# Non-seasonal ARIMA elements
mean(apply(matArima[c("AR","MA","I","Drift"),]!=0, 2, any))
# Seasonal ARIMA elements
mean(apply(matArima[c("SAR","SMA","SI"),]!=0, 2, any))

#### ETS ####
# Trend in ETS
mean(substr(matEts,7,7)!="N")
# Seasonality in ETS
mean(substr(matEts,nchar(matEts)-1,nchar(matEts)-1)!="N")</pre>
</div></div></div>
<p>I summarised them in the following table:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
</tr>
</table>
<p>So, ARIMA detected some structure (had non-zero orders) in almost 40% of all time series, even though the data was designed to have no structure (just white noise). It also captured non-seasonal orders in a quarter of the series and identified seasonality in 18% of them. ETS performed better (only 0.2% of seasonal models identified on the white noise), but still captured trends in 2.3% of cases.</p>
<p>Does this simple experiment suggest that ARIMA is a bad model and ETS is a good one? No, it does not. It simply demonstrates that ARIMA tends to overfit the data if allowed to select whatever it wants. How can we fix that?</p>
<p>My solution: restrict the pool of ARIMA models to check, preventing it from going crazy. My personal pool includes ARIMA(0,1,1), (1,1,2) and (0,2,2), together with ARIMA(0,0,0) and ARIMA(0,1,1) with constant, the seasonal orders (0,1,1), (1,1,2) and (0,2,2), and combinations between them. This approach is motivated by the connection between <a href="https://openforecast.org/adam/ARIMAandETS.html">ARIMA and ETS</a>.</p>
<p>This algorithm can be written as the following simple function, which relies on the <code>msarima()</code> function from the smooth package in R (this function is used because all ARIMA models implemented in it are directly comparable via information criteria):</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the compact ARIMA function</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">arimaCompact <- function(y, lags=c(1,frequency(y)), ic=c("AICc","AIC","BIC","BICc"), ...){

    # Start measuring the time of calculations
    startTime <- Sys.time();

    # If there are no lags for the basic components, correct this.
    if(sum(lags==1)==0){
        lags <- c(1,lags);
    }

    orderLength <- length(lags);
    ic <- match.arg(ic);
    IC <- switch(ic,
                 "AIC"=AIC,
                 "AICc"=AICc,
                 "BIC"=BIC,
                 "BICc"=BICc);

    # We consider the following list of models:
    # ARIMA(0,1,1), (1,1,2), (0,2,2),
    # ARIMA(0,0,0)+c, ARIMA(0,1,1)+c,
    # seasonal orders (0,1,1), (1,1,2), (0,2,2)
    # And all combinations between seasonal and non-seasonal parts
    # 
    # Encode all non-seasonal parts
    nNonSeasonal <- 5
    arimaNonSeasonal <- matrix(c(0,1,1,0, 1,1,2,0, 0,2,2,0, 0,0,0,1, 0,1,1,1), nNonSeasonal,4,
                               dimnames=list(NULL, c("ar","i","ma","const")), byrow=TRUE)
    # Encode all seasonal parts
    nSeasonal <- 4
    arimaSeasonal <- matrix(c(0,0,0, 0,1,1, 1,1,2, 0,2,2), nSeasonal,3,
                               dimnames=list(NULL, c("sar","si","sma")), byrow=TRUE)

    # Check all the models in the pool
    testModels <- vector("list", nSeasonal*nNonSeasonal);
    m <- 1;
    for(i in 1:nSeasonal){
        for(j in 1:nNonSeasonal){
            testModels&#091;&#091;m&#093;&#093; <- msarima(y, orders=list(ar=c(arimaNonSeasonal&#091;j,1&#093;,arimaSeasonal&#091;i,1&#093;),
                                                      i=c(arimaNonSeasonal&#091;j,2&#093;,arimaSeasonal&#091;i,2&#093;),
                                                      ma=c(arimaNonSeasonal&#091;j,3&#093;,arimaSeasonal&#091;i,3&#093;)),
                                       constant=arimaNonSeasonal&#091;j,4&#093;==1, lags=lags, ...);
            m <- m+1;
        }
    }

    # Find the best one
    m <- which.min(sapply(testModels, IC));
    # Amend computational time
    testModels&#091;&#091;m&#093;&#093;$timeElapsed <- Sys.time()-startTime;

    return(testModels&#091;&#091;m&#093;&#093;);
}</pre>
</div></div></div>
<p>Additionally, we can check whether the addition of AR/MA orders detected by ACF/PACF analysis of the best model reduces the AICc. If not, they shouldn't be included. I have not added that part in the code above. Still, this algorithm brings some improvements:</p>
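<p>To give an idea of what that omitted check could look like, here is a minimal base-R sketch. It uses <code>stats::arima()</code> for illustration rather than <code>msarima()</code>, and the <code>AICc</code> helper below is a hand-rolled stand-in (the smooth/greybox packages provide their own): fit the current best model, fit a candidate with one extra AR order, and keep the extra order only if the AICc goes down.</p>

```r
# Sketch of the AICc check for one extra AR order.
# Uses stats::arima() for illustration; the actual algorithm
# would use msarima() from the smooth package.

# Hand-rolled AICc stand-in for stats::arima objects
AICc <- function(model){
    k <- length(coef(model)) + 1  # +1 for the error variance
    n <- model$nobs
    AIC(model) + 2 * k * (k + 1) / (n - k - 1)
}

set.seed(42)
y <- arima.sim(n = 120, list(ma = 0.6))

modelBest <- arima(y, order = c(0, 1, 1))
modelExtra <- arima(y, order = c(1, 1, 1))  # one extra AR order

# Keep the extra AR order only if it reduces the AICc
if(AICc(modelExtra) < AICc(modelBest)){
    modelBest <- modelExtra
}
coef(modelBest)
```

<p>The same check can then be repeated for an extra MA order, stopping as soon as the AICc no longer improves.</p>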
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the application of compact ARIMA to the data</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">#### Load the smooth package
library(smooth)

# A loop for the compact ARIMA, recording the orders
matArimaCompact <- foreach(i=1:nsim, .packages=c("smooth")) %dopar% {
    testModel <- arimaCompact(x&#091;,i&#093;)
    return(orders(testModel))
}

#### Auto MSARIMA from smooth ####
# Non-seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){x&#091;1&#093;!=0}))

# Seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}))
</pre>
</div></div></div>
<p>In my case, it resulted in the following:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
<td style="text-align: center"><strong>Compact ARIMA</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
<td style="text-align: center">2.4%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
<td style="text-align: center">0.0%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
<td style="text-align: center">2.4%</td>
</tr>
</table>
<p>As we see, when we impose restrictions on order selection in ARIMA, it avoids fitting seasonal models to non-seasonal data. While it still makes minor mistakes in terms of non-seasonal structure, it's nothing compared to the conventional approach. What about accuracy? I don't know. I'll have to write another post on this :).</p>
<p>Note that the models were applied to samples of 120 observations, which is considered "small" in statistics, while in real life it is sometimes a luxury to have...</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What does &#8220;lower error measure&#8221; really mean?</title>
		<link>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/</link>
					<comments>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 27 Mar 2024 18:29:03 +0000</pubDate>
				<category><![CDATA[Forecast evaluation]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[error measures]]></category>
		<category><![CDATA[ETS]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3380</guid>

					<description><![CDATA[<p>&#8220;My amazing forecasting method has a lower MASE than any other method!&#8221; You&#8217;ve probably seen claims like this on social media or in papers. But have you ever thought about what it really means? Many forecasting experiments come to applying several approaches to a dataset, calculating error measures for each method per time series and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">What does &#8220;lower error measure&#8221; really mean?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>&#8220;My amazing forecasting method has a lower MASE than any other method!&#8221; You&#8217;ve probably seen claims like this on social media or in papers. But have you ever thought about what it really means?</p>
<p>Many forecasting experiments come to applying several approaches to a dataset, calculating error measures for each method per time series and aggregating them to get a neat table like this one (based on the M and tourism competitions, with R code similar to the one from <a href="/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/">this post</a>):</p>
<pre>           RMSSE    sCE
ADAM ETS   <strong>1.947</strong>  0.319
ETS        1.970  0.299
ARIMA      1.986  0.125
CES        1.960 <strong>-0.011</strong></pre>
<p>Typically, the conclusion drawn from such tables is that the approach with the measure closest to zero performs the best, on average. I&#8217;ve done this myself many times because it&#8217;s a simple way to present results. So, what&#8217;s the issue?</p>
<p>Well, almost any error measure has a skewed distribution because it cannot be lower than zero, and its highest value is infinity (this doesn&#8217;t apply to bias). Let me show you:</p>
<div id="attachment_3381" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3381" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-300x150.png&amp;nocache=1" alt="Distribution of RMSSE for ADAM ETS on M and tourism competitions data" width="300" height="150" class="size-medium wp-image-3381" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-300x150.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot-768x384.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-03-vioplot.png&amp;nocache=1 800w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3381" class="wp-caption-text">Distribution of RMSSE for ADAM ETS on M and tourism competitions data</p></div>
<p>This figure shows the distribution of 5,315 RMSSE values for ADAM ETS. As seen from the violin plot, the distribution has a peak close to zero and a very long tail. This suggests that the model performed well on many series but generated inaccurate forecasts for a few of them, or perhaps only one. The mean RMSSE is 1.947 (the vertical red line on the plot). However, this single value alone does not provide full information about the model&#8217;s performance. Firstly, it tries to represent the entire distribution with just one number. Secondly, as we know from statistics, the mean is influenced by outliers: if your method performed exceptionally well on 99% of cases but very poorly on the remaining 1%, its mean can end up higher than that of a method that never excelled but also never failed as badly.</p>
<p>So, what should we do?</p>
<p>At the very least, provide both the mean and the median error measures: the distance between them shows how skewed the distribution is. An even better approach is to report the mean together with several quantiles of the error measures (not just the median). For instance, we could present the 1st, 2nd (median) and 3rd quartiles together with the mean, minimum and maximum to offer a clearer picture of the spread and variability of the error measure:</p>
<pre>             min    1Q  median    3Q     max  mean
ADAM ETS   <strong>0.024</strong> <strong>0.670</strong>   1.180 2.340  51.616 <strong>1.947</strong>
ETS        <strong>0.024</strong> 0.677   1.181 2.376  51.616 1.970
ARIMA      0.025 0.681   1.179 2.358  51.616 1.986
CES        0.045 0.675   <strong>1.171</strong> <strong>2.330</strong>  <strong>51.201</strong> 1.960</pre>
<p>This table provides better insights: the ETS models consistently perform well in terms of the mean, minimum and first quartile of RMSSE, while ARIMA has a slightly lower median than either of them. CES does better than all the others in terms of the median, third quartile and maximum. This means that there were some time series where ETS struggled a bit more than CES, but in the majority of cases it performed well.</p>
<p>So, next time you see a table with error measures, keep in mind that the best method on average might not be the best consistently. Having more details helps in understanding the situation better.</p>
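<p>Producing such a summary takes only a few lines of R. The sketch below uses simulated lognormal values as a stand-in for real per-series RMSSEs (and made-up method names), just to illustrate the shape of the table:</p>

```r
# Summarise a skewed error measure with quantiles and the mean,
# instead of the mean alone. The values are simulated here as a
# stand-in for real per-series error measures.
set.seed(41)
# RMSSE-like values: positive and heavily right-skewed
errors <- list(MethodA = rlnorm(1000, 0, 1),
               MethodB = rlnorm(1000, 0.1, 0.9))

errorSummary <- t(sapply(errors, function(e){
    c(quantile(e, c(0, 0.25, 0.5, 0.75, 1)), mean = mean(e))
}))
colnames(errorSummary) <- c("min", "1Q", "median", "3Q", "max", "mean")
round(errorSummary, 3)
```

<p>Because the simulated distributions are right-skewed, the mean in each row ends up noticeably above the median, which is exactly the gap the extra columns are meant to expose.</p>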
<p>You can read more about error measures for forecasting in the &#8220;<a href="/category/forecasting-theory/forecast-evaluation/">forecast evaluation</a>&#8221; category.</p>
<p>Message <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">What does &#8220;lower error measure&#8221; really mean?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What&#8217;s wrong with ARIMA?</title>
		<link>https://openforecast.org/2024/03/21/what-s-wrong-with-arima/</link>
					<comments>https://openforecast.org/2024/03/21/what-s-wrong-with-arima/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 21 Mar 2024 10:10:52 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3366</guid>

					<description><![CDATA[<p>Have you heard of ARIMA? It is one of the benchmark forecasting models used in different academic experiments, although it is not always popular among practitioners. But why? What&#8217;s wrong with ARIMA? ARIMA has been a standard forecasting model in statistics for ages. It gained popularity with the famous Box &#038; Jenkins (1970) book and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/21/what-s-wrong-with-arima/">What&#8217;s wrong with ARIMA?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Have you heard of ARIMA? It is one of the benchmark forecasting models used in different academic experiments, although it is not always popular among practitioners. But why? What&#8217;s wrong with ARIMA?</p>
<p>ARIMA has been a standard forecasting model in statistics for ages. It gained popularity with the famous <a href="https://archive.org/details/timeseriesanalys0000boxg">Box &#038; Jenkins (1970) book</a> and was considered the best forecasting model by statisticians for a couple of decades without any strong evidence to support this.</p>
<p>It represents one of the two fundamental approaches to time series modelling (the second being the state space approach): <a href="https://openforecast.org/adam/ARIMA.html">it captures the relation between the variable and itself in the past</a>. This has a great rationale in technical areas. For example, the quantity of CO2 in a furnace at this moment in time will depend on the quantity of CO2 five minutes ago. Such processes can be efficiently modelled and then forecast using ARIMA. In demand forecasting, making sense of ARIMA is more challenging: it is hard to argue that the demand for shoes on Monday can impact the demand on Tuesday. So, when we apply ARIMA to such data, we sort of rely on a spurious relation. Still, demand data often exhibits autocorrelations, and ARIMA has been used efficiently in that context.</p>
<p>Over the years, ARIMA did not perform well in different competitions (see <a href="/2024/03/14/the-role-of-m-competitions-in-forecasting/">my post</a> about that), but this was mainly due to the wrong assumptions in the <a href="https://openforecast.org/adam/BJApproach.html#BJApproachSummary">Box-Jenkins methodology</a>, not because the model itself is fundamentally bad. After <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> implemented their version with automatic order selection based on information criteria, ARIMA started producing much more accurate forecasts.</p>
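<p>To see what selection by information criteria means in practice, here is a toy base-R sketch: fit every order in a small grid and keep the one with the lowest AIC. This is only the bare idea; <code>auto.arima()</code> from the forecast package does it far more cleverly (stepwise search, unit-root tests for differencing, AICc by default).</p>

```r
# Toy IC-based ARMA order selection on simulated data:
# try a small grid of orders and keep the lowest AIC.
set.seed(33)
y <- arima.sim(n = 200, list(ar = 0.7, ma = -0.3))

ordersToTry <- expand.grid(p = 0:2, q = 0:2)
aicValues <- apply(ordersToTry, 1, function(pq){
    # Some orders may fail to converge; treat them as infinitely bad
    tryCatch(AIC(arima(y, order = c(pq["p"], 0, pq["q"]))),
             error = function(e) Inf)
})
bestOrder <- ordersToTry[which.min(aicValues), ]
bestOrder
```

<p>Note that this brute-force loop fits every candidate, which is exactly why heuristics matter once seasonal orders enter the picture and the grid explodes.</p>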
<p>But if I were to summarize what the problem with the model is, I would outline these points:</p>
<ol>
<li>It is hard to explain ARIMA to people who are not comfortable with statistics. Here is an example of how seasonal ARIMA(1,0,1)(1,0,1)_4 is written mathematically:<br />
\begin{equation*}<br />
  y_t (1 -\phi_{4,1} B^4)(1 -\phi_{1} B) = \epsilon_t (1 + \theta_{4,1} B^4) (1 + \theta_{1} B).<br />
\end{equation*}<br />
Good luck explaining this to a demand planner who does not know mathematics.</li>
<li>It is hard to estimate, especially for models with seasonality. It is typically estimated using some numeric optimisation, and reaching the maximum likelihood (or a global minimum of a loss function) is not guaranteed.</li>
<li>It is hard to select the appropriate order of the model, as there can be thousands of potential models to choose from. Yes, there are heuristic approaches that simplify the problem and select a reasonable model (e.g. <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar, 2008</a>; or <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan, 2020</a>), but they do not guarantee that you will get the best possible model.</li>
</ol>
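<p>Ironically, while the model in point (1) is hard to explain, it is a one-liner to fit. The sketch below fits the seasonal ARIMA(1,0,1)(1,0,1)_4 from above to simulated quarterly data in base R; the numeric optimisation mentioned in point (2) happens invisibly inside <code>arima()</code>:</p>

```r
# Fit the seasonal ARIMA(1,0,1)(1,0,1)_4 from point (1)
# to simulated quarterly data, using base R only.
set.seed(7)
y <- ts(arima.sim(n = 160, list(ar = 0.5, ma = 0.3)), frequency = 4)

model <- arima(y, order = c(1, 0, 1),
               seasonal = list(order = c(1, 0, 1), period = 4))
# Four ARMA parameters plus the intercept, all estimated
# by numeric likelihood maximisation
coef(model)
```

<p>All five parameters come out of a numeric optimiser, so, as point (2) warns, there is no guarantee the reported likelihood is the global maximum.</p>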
<p>Nonetheless, ARIMA is a strong contender that can outperform other models if implemented well. Furthermore, it has become one of the standard forecasting benchmarks in forecasting-related experiments. So, if you are a data scientist comfortable with mathematics and want to see how your machine learning approach performs, you should consider including ARIMA as a benchmark.</p>
<p>P.S. Check out <a href="https://www.linkedin.com/posts/vandeputnicolas_i-am-not-aware-of-any-successful-use-at-activity-7148632910441377793-27s5">a post by Nicolas Vandeput on LinkedIn</a> &#8211; he had a discussion about ARIMA and raised good points as well.</p>
<p>Message <a href="https://openforecast.org/2024/03/21/what-s-wrong-with-arima/">What&#8217;s wrong with ARIMA?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/21/what-s-wrong-with-arima/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The role of M competitions in forecasting</title>
		<link>https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/</link>
					<comments>https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 14 Mar 2024 14:53:03 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[stories]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3358</guid>

					<description><![CDATA[<p>If you are interested in forecasting, you might have heard of M-competitions. They played a pivotal role in developing forecasting principles, yet also sparked controversy. In this short post, I&#8217;ll briefly explain their historical significance and discuss their main findings. Before M-competitions, only few papers properly evaluated forecasting approaches. Statisticians assumed that if a model [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/">The role of M competitions in forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you are interested in forecasting, you might have heard of M-competitions. They played a pivotal role in developing forecasting principles, yet also sparked controversy. In this short post, I&#8217;ll briefly explain their historical significance and discuss their main findings.</p>
<p>Before M-competitions, only a few papers properly evaluated forecasting approaches. Statisticians assumed that if a model had solid theoretical backing, it should perform well. One of the first papers to conduct a proper evaluation was <a href="https://doi.org/10.2307/2344546">Newbold &#038; Granger (1974)</a>, who compared exponential smoothing (ES), ARIMA, and stepwise AR on 106 economic time series. Their conclusions were:</p>
<p>1. ES performed well on short time series;<br />
2. Stepwise AR did well on the series with more than 30 observations;<br />
3. Box-Jenkins methodology was recommended for series longer than 50 observations.</p>
<p>The statistical community received the results favourably, as they aligned with its expectations.</p>
<p>In 1979, <a href="https://doi.org/10.2307/2345077">Makridakis &#038; Hibon</a> conducted a similar analysis on 111 time series, including various ES methods and ARIMA. However, they found that &#8220;simpler methods perform well in comparison to the more complex and statistically sophisticated ARMA models&#8221;. This is because ARIMA performed slightly worse than ES, which contradicted the findings of Newbold &#038; Granger. Furthermore, their paper faced heavy criticism, with some claiming that Makridakis did not correctly utilize Box-Jenkins methodology.</p>
<p>So, in 1982, <a href="https://doi.org/10.1002/for.3980010202">Makridakis et al.</a> organized a competition on 1001 time series, inviting external participants to submit their forecasts. It was won by&#8230; the <a href="https://doi.org/10.1002/for.3980010108">ARARMA model by Emmanuel Parzen</a>. This model used information criteria for ARMA order selection instead of Box-Jenkins methodology. The main conclusion drawn from this competition was that &#8220;<strong>Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones</strong>.&#8221; Note that this does not mean that simple methods are always better, because that was not even the case in the first competition: it was won by a quite complicated statistical model based on ARMA. This only means that the complexity does not necessarily translate into accuracy.</p>
<p>The M2 competition focused on judgmental forecasting, and is not discussed here.</p>
<p>We then arrive at the <a href="https://doi.org/10.1016/S0169-2070(00)00057-1">M3 competition</a> with 3003 time series and, once again, open submission for anyone. The results largely confirmed the previous findings, with <a href="https://doi.org/10.1016/S0169-2070(00)00066-2">Theta</a> by Vassilis Assimakopoulos and Konstantinos Nikolopoulos outperforming all the other methods. Note that ARIMA with order selection based on the Box-Jenkins methodology performed fine, but could not beat its competitors.</p>
<p>Finally, we arrive at the <a href="https://doi.org/10.1016/j.ijforecast.2019.04.014">M4 competition</a>, which had 100,000 time series and was open to an even wider audience. While I have <a href="https://openforecast.org/2020/03/01/m-competitions-from-m4-to-m5-reservations-and-expectations/">my reservations about the competition itself</a>, there were several curious findings, including the fact that ARIMA implemented by <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> performed on average better than ETS (Theta outperformed both of them), and that the more complex methods won the competition.</p>
<p>It was also the first paper to show that accuracy tends to increase on average with the computational time spent on training. This means that if you want more accurate forecasts, you need to spend more resources. The only catch is that this comes with diminishing returns: the improvements become smaller and smaller the more time you spend on training.</p>
<p>The competition was followed by M5 and M6, and now they plan to have another one. I don&#8217;t want to discuss all of them &#8211; they are beyond the scope of this short post (see details on the <a href="https://mofc.unic.ac.cy/history-of-competitions/">website of the competitions</a>). But I personally find the first competitions very impactful and useful.</p>
<p>And here are my personal takeaways from these competitions:</p>
<p>1. Simple forecasting methods perform well and should be included as benchmarks in experiments;<br />
2. Complex methods can outperform simple ones, especially if used intelligently, but you might need to spend more resources to gain in accuracy;<br />
3. ARIMA is effective, but the Box-Jenkins methodology may not be practical. Using information criteria for order selection is a better approach (as evidenced by the ARARMA example and the Hyndman &#038; Khandakar implementation).</p>
<p>Finally, I like the following <a href="https://robjhyndman.com/hyndsight/m4comp/">quote from Rob J. Hyndman about the competitions</a> that gives some additional perspective: &#8220;The &#8220;M&#8221; competitions organized by Spyros Makridakis have had an enormous influence on the field of forecasting. They focused attention on what models produced good forecasts, rather than on the mathematical properties of those models&#8221;.</p>
<div id="attachment_3360" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3360" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition-300x182.png&amp;nocache=1" alt="Table with the results of the M3 competition" width="300" height="182" class="size-medium wp-image-3360" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition-300x182.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition-768x467.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/03/2024-03-13-M3-competition.png&amp;nocache=1 923w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3360" class="wp-caption-text">Table with the results of the M3 competition</p></div>
<p>Message <a href="https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/">The role of M competitions in forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/14/the-role-of-m-competitions-in-forecasting/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Multi-step Estimators and Shrinkage Effect in Time Series Models</title>
		<link>https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/</link>
					<comments>https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 09 Aug 2023 10:14:59 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3142</guid>

					<description><![CDATA[<p>Authors: Ivan Svetunkov, Nikos Kourentzes, Rebecca Killick Journal: Computational Statistics Abstract: Many modern statistical models are used for both insight and prediction when applied to data. When models are used for prediction one should optimise parameters through a prediction error loss function. Estimation methods based on multiple steps ahead forecast errors have been shown to [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/">Multi-step Estimators and Shrinkage Effect in Time Series Models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>Authors</strong>: Ivan Svetunkov, <a href="http://kourentzes.com/forecasting/">Nikos Kourentzes</a>, <a href="https://www.lancaster.ac.uk/maths/people/rebecca-killick">Rebecca Killick</a></p>
<p><strong>Journal</strong>: <a href="https://www.springer.com/journal/180">Computational Statistics</a></p>
<p><strong>Abstract</strong>: Many modern statistical models are used for both insight and prediction when applied to data. When models are used for prediction one should optimise parameters through a prediction error loss function. Estimation methods based on multiple steps ahead forecast errors have been shown to lead to more robust and less biased estimates of parameters. However, a plausible explanation of why this is the case is lacking. In this paper, we provide this explanation, showing that the main benefit of these estimators is in a shrinkage effect, happening in univariate models naturally. However, this can introduce a series of limitations, due to overly aggressive shrinkage. We discuss the predictive likelihoods related to the multistep estimators and demonstrate what their usage implies to time series models. To overcome the limitations of the existing multiple steps estimators, we propose the Geometric Trace Mean Squared Error, demonstrating its advantages. We conduct a simulation experiment showing how the estimators behave with different sample sizes and forecast horizons. Finally, we carry out an empirical evaluation on real data, demonstrating the performance and advantages of the estimators. Given that the underlying process to be modelled is often unknown, we conclude that the shrinkage achieved by the GTMSE is a competitive alternative to conventional ones.</p>
<p><strong>DOI</strong>: <a href="https://doi.org/10.1007/s00180-023-01377-x">10.1007/s00180-023-01377-x</a>.</p>
<p><a href="http://dx.doi.org/10.13140/RG.2.2.17854.31043">Working paper</a>.</p>
<h2>About the paper</h2>
<p><b>DISCLAIMER 1</b>: To better understand what I am talking about in this section, I would recommend having a look at the <a href="https://openforecast.org/adam/">ADAM monograph</a>, specifically at <a href="https://openforecast.org/adam/ADAMETSEstimation.html">Chapter 11</a>. In fact, <a href="https://openforecast.org/adam/multistepLosses.html">Section 11.3</a> is based on this paper.</p>
<p><b>DISCLAIMER 2</b>: All the discussions in the paper only apply to pure additive models. If you are interested in multiplicative or mixed ETS models, you&#8217;ll have to wait another seven years for another paper on this topic to get written and published.</p>
<h3>Introduction</h3>
<p>There are many ways in which dynamic models can be estimated. Some analysts prefer likelihood, some would stick with Least Squares (i.e. minimising MSE), while others would use advanced estimators like Huber&#8217;s loss or M-estimators. And sometimes, statisticians or machine learning experts would use multiple steps ahead estimators. For example, they would use a so-called &#8220;direct forecast&#8221; by fitting a model to the data, producing h-steps ahead in-sample point forecasts from the very first to the very last observation, then calculating the respective h-steps ahead forecast errors and, based on them, the Mean Squared Error. Mathematically, this can be written as:</p>
<p>\begin{equation} \label{eq:hstepsMSE}<br />
	\mathrm{MSE}_h = \frac{1}{T-h} \sum_{t=1}^{T-h} e_{t+h|t}^2 ,<br />
\end{equation}<br />
where \(e_{t+h|t}\) is the h-steps ahead error for the point forecast produced from the observation \(t\), and \(T\) is the sample size.</p>
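<p>To make the definition concrete, here is a minimal base R sketch of this estimator for simple exponential smoothing, whose point forecasts are flat and thus equal to the last level (the function name and the initialisation here are illustrative, not taken from the <code>smooth</code> package):</p>
<pre class="decode"># h-steps ahead in-sample MSE for simple exponential smoothing
mseH <- function(alpha, y, h){
    obsInSample <- length(y)
    level <- numeric(obsInSample)
    levelPrevious <- y[1]                  # illustrative initialisation
    for(t in 1:obsInSample){
        level[t] <- levelPrevious + alpha * (y[t] - levelPrevious)
        levelPrevious <- level[t]
    }
    # e_{t+h|t} = y_{t+h} - level[t], because SES forecasts are flat
    errors <- y[(1+h):obsInSample] - level[1:(obsInSample-h)]
    mean(errors^2)
}</pre>
<p>For instance, <code>mseH(0.3, as.numeric(BJsales), h=5)</code> would return the five-steps-ahead in-sample MSE of SES with \( \hat{\alpha}=0.3 \) on the Box-Jenkins sales data.</p>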
<p>In the final year of my PhD, I decided to analyse how different multistep loss functions work, to understand what happens with dynamic models when these losses are minimised, and how this can help in efficient model estimation. While doing the literature review, I noticed that the claims about the multistep estimators are sometimes contradictory: some authors say that they are more efficient (i.e. estimates of parameters have lower variances) than the conventional estimators, some say that they are less efficient; some claim that they improve accuracy, while others do not find any substantial improvements. Finally, I could not find a proper explanation of what happens with dynamic models when these estimators are used. So I started my own investigation, together with Nikos Kourentzes and Rebecca Killick (who was my internal examiner and joined our team after my graduation).</p>
<p>Our investigation started with the single source of error model, then led us to predictive likelihoods and, after that, to the development of a couple of non-conventional estimators. As a result, the paper grew and became less focused than initially intended. In the end, it became 42 pages long and discussed several aspects of model estimation (making it a bit of a hodgepodge):</p>
<ol>
<li>How multistep estimators regularise parameters of dynamic models;</li>
<li>That multistep forecast errors are always correlated when the models&#8217; parameters are not zero;</li>
<li>What predictive likelihoods align with the multistep estimators (this is useful for a discussion of their statistical properties);</li>
<li>How General Predictive Likelihood encompasses all popular multistep estimators;</li>
<li>And that there is another estimator (namely GTMSE &#8211; Geometric Trace Mean Squared Error), which has good properties and has not been discussed in the literature before.</li>
</ol>
<p>Because of the size of the paper and the spread of topics throughout it, many reviewers ignored (1) &#8211; (4), focusing on (5), and rejected the paper on the grounds that we proposed a new estimator but spent too much time discussing seemingly irrelevant topics. These types of comments were given to us by the editor of the Journal of the Royal Statistical Society: B and the reviewers of Computational Statistics and Data Analysis. While we tried addressing this issue several times, given the size of the paper, we failed to fix it fully. The paper was rejected from both of these journals and ended up in Computational Statistics, where the editor gave us a chance to respond to the comments. We explained what the paper was really about and changed its focus to satisfy the reviewers, after which the paper was accepted.</p>
<p>So, what are the main findings of this paper?</p>
<h3>How multistep estimators regularise parameters of dynamic models</h3>
<p>Given that any dynamic model (such as ETS or ARIMA) can be represented in the Single Source of Error state space form, we showed that the application of multistep estimators leads to the inclusion of the models&#8217; parameters in the loss function, which acts as regularisation. In ETS, this means that the smoothing parameters are shrunk towards zero, with the shrinkage becoming stronger as the forecasting horizon grows relative to the sample size. This makes the models less stochastic and more conservative. Mathematically, this becomes apparent if we express the conditional multistep variance in terms of smoothing parameters and the one-step-ahead error variance. For example, for ETS(A,N,N) we have:</p>
<p>\begin{equation} \label{eq:hstepsMSEVariance}<br />
	\mathrm{MSE}_h \propto \hat{\sigma}_1^2 \left(1 +(h-1) \hat{\alpha}^2 \right),<br />
\end{equation}<br />
where \( \hat{\alpha} \) is the smoothing parameter and \(\hat{\sigma}_1^2 \) is the one-step-ahead error variance. From the formula \eqref{eq:hstepsMSEVariance}, it becomes apparent that when we minimise MSE\(_h\), the estimated variance and the smoothing parameter will be minimised as well. This is how the shrinkage effect appears: we force \( \hat{\alpha} \) to become as close to zero as possible, and the strength of the shrinkage is regulated by the forecasting horizon \( h \).</p>
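<p>The shrinkage effect can be seen even without the smooth package. In the base R sketch below (all names are ad hoc, for illustration only), we simulate an ETS(A,N,N) series and estimate \( \hat{\alpha} \) of SES by minimising the one-step-ahead and the ten-steps-ahead MSE; the latter will typically push the estimate closer to zero:</p>
<pre class="decode"># One- vs ten-steps-ahead MSE minimisation for SES on simulated ETS(A,N,N) data
sesMSEh <- function(alpha, y, h){
    obsInSample <- length(y)
    level <- numeric(obsInSample)
    levelPrevious <- y[1]
    for(t in 1:obsInSample){
        level[t] <- levelPrevious + alpha * (y[t] - levelPrevious)
        levelPrevious <- level[t]
    }
    mean((y[(1+h):obsInSample] - level[1:(obsInSample-h)])^2)
}

set.seed(42)
alphaTrue <- 0.3
epsilon <- rnorm(300)
# DGP: l_t = l_{t-1} + alpha * epsilon_t; y_t = l_{t-1} + epsilon_t
levelTrue <- 100 + cumsum(alphaTrue * epsilon)
y <- c(100, levelTrue[-300]) + epsilon

alphaMSE1  <- optimise(sesMSEh, c(0, 1), y=y, h=1)$minimum
alphaMSE10 <- optimise(sesMSEh, c(0, 1), y=y, h=10)$minimum</pre>
<p>Comparing <code>alphaMSE1</code> with <code>alphaMSE10</code> on such simulated series will typically show the latter shrunk towards zero.</p>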
<p>In the paper itself, we discuss this effect for several multistep estimators (the specific effect would be different between them) and several ETS and ARIMA models. While for ETS, it is easy to show how shrinkage works, for ARIMA, the situation is more complicated because the direction of shrinkage would change with the ARIMA orders. Still, what can be said clearly for any dynamic model is that the multistep estimators make them less stochastic and more conservative.</p>
<h3>Multistep forecast errors are always correlated</h3>
<p>This is a small finding, made in passing. It means that, for example, the two-steps-ahead forecast error is always correlated with the three-steps-ahead one. This does not depend on the autocorrelation of residuals or any violation of the assumptions of the model, but only on whether the parameters of the model are zero or not. This effect arises from the model rather than from the data. The only situation in which the forecast errors will not be correlated is when the model is deterministic (e.g. a linear trend). This has important practical implications, because some forecasting techniques make explicit and unrealistic assumptions that these correlations are zero, which would impact the final forecasts.</p>
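<p>This correlation is easy to verify with a short base R simulation (illustrative, not taken from the paper). For ETS(A,N,N) with a known smoothing parameter, \( e_{t+2|t} = \alpha \epsilon_{t+1} + \epsilon_{t+2} \) and \( e_{t+3|t} = \alpha \epsilon_{t+1} + \alpha \epsilon_{t+2} + \epsilon_{t+3} \), so the two errors share terms and are correlated whenever \( \alpha \neq 0 \):</p>
<pre class="decode">set.seed(7)
alpha <- 0.4
nSim <- 5000
epsilon <- rnorm(nSim)
# Two- and three-steps-ahead forecast errors of ETS(A,N,N) with known alpha
e2 <- alpha * epsilon[1:(nSim-2)] + epsilon[2:(nSim-1)]
e3 <- alpha * epsilon[1:(nSim-2)] + alpha * epsilon[2:(nSim-1)] + epsilon[3:nSim]
cor(e2, e3)
# Theoretical correlation: alpha*(1+alpha) / sqrt((1+alpha^2)*(1+2*alpha^2)),
# which is roughly 0.45 for alpha = 0.4</pre>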
<h3>Predictive likelihoods aligning with the multistep estimators</h3>
<p>We showed that if a model assumes the Normal distribution, then in the case of MSE\(_h\) and MSCE (Mean Squared Cumulative Error) the distribution of the future values is Normal as well. This means that there are predictive likelihood functions for these models, the maximum of which is achieved with the same set of parameters as the minimum of the multistep estimators. This has two implications:</p>
<ol>
<li>These multistep estimators should be consistent and efficient, especially when the smoothing parameters are close to zero;</li>
<li>The predictive likelihoods can be used in the model selection via information criteria.</li>
</ol>
<p>The first point also explains the contradiction in the literature: if the smoothing parameter in the population is close to zero, then the multistep estimators will give more efficient estimates than the conventional estimators; otherwise, they might be less efficient. We have not used the second point above, but it would be useful when the best model needs to be selected for the data and an analyst wants to use information criteria. This is one potential direction for future research.</p>
<h3>How General Predictive Likelihood (GPL) encompasses all popular multistep estimators</h3>
<p>GPL arises when the joint distribution of 1 to h steps ahead forecast errors is considered. It will be Multivariate Normal if the model assumes normality. In the paper, we showed that the maximum of GPL coincides with the minimum of the so-called &#8220;Generalised Variance&#8221; &#8211; the determinant of the covariance matrix of forecast errors. This minimisation reduces variances for all the forecast errors (from 1 to h) and increases the covariances between them, making the multistep forecast errors look more similar. In the perfect case, when the model is correctly specified (no omitted or redundant variables, homoscedastic residuals etc), the maximum of GPL will coincide with the maximum of the conventional likelihood of the Normal distribution (see <a href="https://openforecast.org/adam/ADAMETSEstimationLikelihood.html">Section 11.1 of the ADAM monograph</a>).</p>
<p>Incidentally, it can be shown that the existing estimators are just special cases of the GPL, with some restrictions imposed on the covariance matrix. I will not show this here; the reader is encouraged to either read the paper or see the brief discussion <a href="https://openforecast.org/adam/multistepLosses.html#multistepLossesGPL">in Subsection 11.3.5</a> of the ADAM monograph.</p>
<h3>GTMSE &#8211; Geometric Trace Mean Squared Error</h3>
<p>Finally, looking at the special cases of GPL, we have noticed that there is one which has not been discussed in the literature. We called it Geometric Trace Mean Squared Error (GTMSE) because of the logarithms in the formula:<br />
\begin{equation} \label{eq:GTMSE}<br />
	\mathrm{GTMSE} = \sum_{j=1}^h \log \frac{1}{T-j} \sum_{t=1}^{T-j} e_{t+j|t}^2 .<br />
\end{equation}<br />
GTMSE imposes shrinkage on parameters similar to the other estimators but does it more mildly because of the logarithms in the formula. In fact, what the logarithms do is make the variances of all forecast errors comparable in scale. As a result, GTMSE does not focus on the larger variances, as the other estimators do, but minimises all of them simultaneously.</p>
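<p>As a small illustration, GTMSE from formula \eqref{eq:GTMSE} can be computed in a couple of lines of base R, given the in-sample multistep forecast errors (the function below is a sketch, not the implementation used in the smooth package):</p>
<pre class="decode"># GTMSE given a list of numeric vectors, where element j contains
# the j-steps-ahead in-sample errors e_{t+j|t}
gtmse <- function(errors){
    sum(sapply(errors, function(e) log(mean(e^2))))
}</pre>
<p>Because each horizon enters the sum through a logarithm, a horizon with a large error variance cannot dominate the loss in the way it would in an additive loss such as TMSE.</p>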
<h2>Examples in R</h2>
<p>The estimators discussed in the paper are all implemented in the functions of the smooth package in R, including <code>adam()</code>, <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code> and <code>ces()</code>. In the example below, we will see how the shrinkage works for ETS using the Box-Jenkins sales data (this example is taken from ADAM, <a href="https://openforecast.org/adam/multistepLosses.html#an-example-in-r-2">Subsection 11.3.7</a>):</p>
<pre class="decode">library(smooth)

adamETSAANBJ <- vector("list",6)
names(adamETSAANBJ) <- c("MSE","MSEh","TMSE","GTMSE","MSCE","GPL")
for(i in 1:length(adamETSAANBJ)){
    adamETSAANBJ[[i]] <- adam(BJsales, "AAN", h=10, holdout=TRUE,
                              loss=names(adamETSAANBJ)[i])
}</pre>
<p>The ETS(A,A,N) model, applied to this data, has different estimates of smoothing parameters:</p>
<pre class="decode">sapply(adamETSAANBJ,"[[","persistence") |>
	round(5)</pre>
<pre>          MSE MSEh TMSE   GTMSE MSCE GPL
alpha 1.00000    1    1 1.00000    1   1
beta  0.23915    0    0 0.14617    0   0</pre>
<p>We can see how shrinkage shows itself in the case of the smoothing parameter \(\beta\), which is shrunk to zero by MSEh, TMSE, MSCE and GPL but left intact by MSE and shrunk a little bit in the case of GTMSE. These different estimates of parameters lead to different forecasting trajectories and prediction intervals, as can be shown visually:</p>
<pre class="decode">par(mfcol=c(3,2), mar=c(2,2,4,1))
# Produce forecasts
lapply(adamETSAANBJ, forecast, h=10, interval="prediction") |>
# Plot forecasts
    lapply(function(x, ...) plot(x, ylim=c(200,280), main=x$model$loss))</pre>
<p>This should result in the following plots:</p>
<div id="attachment_3162" style="width: 1210px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3162" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses.png&amp;nocache=1" alt="ADAM ETS on Box-Jenkins data with several estimators" width="1200" height="700" class="size-full wp-image-3162" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses.png&amp;nocache=1 1200w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses-768x448.png&amp;nocache=1 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></a><p id="caption-attachment-3162" class="wp-caption-text">ADAM ETS on Box-Jenkins data with several estimators</p></div>
<p>Analysing the figure, it looks like the shrinkage of the smoothing parameter \(\beta\) is useful for this time series: the forecasts from ETS(A,A,N) estimated using MSEh, TMSE, MSCE and GPL look closer to the actual values than the ones from MSE and GTMSE. To assess their performance more precisely, we can extract error measures from the models:</p>
<pre class="decode">sapply(adamETSAANBJ,"[[","accuracy")[c("ME","MSE"),] |>
	round(5)</pre>
<pre>         MSE    MSEh    TMSE    GTMSE    MSCE     GPL
ME   3.22900 1.06479 1.05233  3.44962 1.04604 0.95515
MSE 14.41862 2.89067 2.85880 16.26344 2.84288 2.62394</pre>
<p>Alternatively, we can calculate error measures based on the produced forecasts and the <code>measures()</code> function from the <code>greybox</code> package:</p>
<pre class="decode">lapply(adamETSAANBJ, forecast, h=10) |>
    sapply(function(x, ...) measures(holdout=x$model$holdout,
                                     forecast=x$mean,
                                     actual=actuals(x$model)))</pre>
<p>A thing to note about the multistep estimators is that they are slower than the conventional ones because they require producing 1 to \( h \) steps ahead forecasts from every observation in-sample. In the case of the <code>smooth</code> functions, the time elapsed can be extracted from the models in the following way:</p>
<pre class="decode">sapply(adamETSAANBJ, "[[", "timeElapsed")</pre>
<p>In summary, the multistep estimators are potentially useful in forecasting and can produce models with more accurate forecasts. This happens because they impose shrinkage on the estimates of parameters, making models less stochastic and more inert. But their performance depends on each specific situation and the available data, so I would not recommend using them universally.</p>
<p>Message <a href="https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/">Multi-step Estimators and Shrinkage Effect in Time Series Models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v3.2.0: what&#8217;s new?</title>
		<link>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/</link>
					<comments>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 30 Jan 2023 13:06:47 +0000</pubDate>
				<category><![CDATA[About es() function]]></category>
		<category><![CDATA[adam()]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Regression]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3063</guid>

					<description><![CDATA[<p>smooth package has reached version 3.2.0 and is now on CRAN. While the version change from 3.1.7 to 3.2.0 looks small, this has introduced several substantial changes and represents a first step in moving to the new C++ code in the core of the functions. In this short post, I will outline the main new [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/">smooth v3.2.0: what&#8217;s new?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>smooth package has reached version 3.2.0 and is now <a href="https://cran.r-project.org/package=smooth">on CRAN</a>. While the version change from 3.1.7 to 3.2.0 looks small, this has introduced several substantial changes and represents a first step in moving to the new C++ code in the core of the functions. In this short post, I will outline the main new features of smooth 3.2.0.</p>
<p><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2.png&amp;nocache=1"><img loading="lazy" decoding="async" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-300x218.png&amp;nocache=1" alt="" width="300" height="218" class="aligncenter size-medium wp-image-3065" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-300x218.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-1024x745.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-768x559.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-1536x1117.png&amp;nocache=1 1536w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2.png&amp;nocache=1 1650w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a></p>
<h3>New engines for ETS, MSARIMA and SMA</h3>
<p>The first and one of the most important changes is the new engine for the ETS (Error-Trend-Seasonal exponential smoothing model), MSARIMA (Multiple Seasonal ARIMA) and SMA (Simple Moving Average), implemented in the <code>es()</code>, <code>msarima()</code> and <code>sma()</code> functions respectively. The new engine was developed for <code>adam()</code>, and the three models above can be considered special cases of it. You can read more about ETS in the ADAM monograph, starting from <a href="https://openforecast.org/adam/ETSConventional.html">Chapter 4</a>; MSARIMA is discussed in <a href="https://openforecast.org/adam/ADAMARIMA.html">Chapter 9</a>, while SMA is briefly discussed in <a href="https://openforecast.org/adam/simpleForecastingMethods.html#SMA">Subsection 3.3.3</a>.</p>
<p>The <code>es()</code> function now implements an ETS close to the conventional one, assuming that the error term follows the normal distribution. It still supports explanatory variables (discussed in <a href="https://openforecast.org/adam/ADAMX.html">Chapter 10 of the ADAM monograph</a>) and advanced estimators (<a href="https://openforecast.org/adam/ADAMETSEstimation.html">Chapter 11</a>), and it has the same syntax as the previous version of the function, but now acts as a wrapper for <code>adam()</code>. This means that it is now faster, more accurate and requires less memory than it used to. <code>msarima()</code>, being a wrapper of <code>adam()</code> as well, is now also faster and more accurate than it used to be. In addition, both functions now support the methods that were developed for <code>adam()</code>, including <code>vcov()</code>, <code>confint()</code>, <code>summary()</code>, <code>rmultistep()</code>, <code>reapply()</code>, <code>plot()</code> and others. So, now you can do a more thorough analysis and improve the models using all these advanced instruments (see, for example, <a href="https://openforecast.org/adam/diagnostics.html">Chapter 14 of ADAM</a>).</p>
<p>The main reason why I moved the functions to the new engine was to clean up the code and remove the old chunks that were developed when I only started learning C++. A side effect, as you see, is that the functions have now been improved in a variety of ways.</p>
<p>And to be on the safe side, the old versions of the functions are still available in <code>smooth</code> under the names <code>es_old()</code>, <code>msarima_old()</code> and <code>sma_old()</code>. They will be removed from the package if it ever reaches version 4.0.0.</p>
<h3>New methods for ADAM</h3>
<p>There are two new methods for <code>adam()</code> that can be used in a variety of cases. The first one is <code>simulate()</code>, which will generate data based on the estimated ADAM, whatever the original model is (e.g. mixture of ETS, ARIMA and regression on the data with multiple frequencies). Here is how it can be used:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     simulate() |>
     plot()</pre>
<p>which will produce a plot similar to the following:</p>
<div id="attachment_3077" style="width: 650px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3077" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-1024x597.png&amp;nocache=1" alt="Simulated data based on adam() applied to Box-Jenkins sales data" width="640" height="373" class="size-large wp-image-3077" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate.png&amp;nocache=1 1200w" sizes="auto, (max-width: 640px) 100vw, 640px" /></a><p id="caption-attachment-3077" class="wp-caption-text">Simulated data based on adam() applied to Box-Jenkins sales data</p></div>
<p>This can be used for research, when a more controlled environment is needed. If you want to fine-tune the parameters of ADAM before simulating the data, you can save the output in an object and amend its parameters. For example:</p>
<pre class="decode">testModel <- adam(BJsales, "AAdN")
testModel$persistence <- c(0.5, 0.2)
simulate(testModel)</pre>
<p>The second new method is <code>xtable()</code> from the package of the same name. It produces a LaTeX version of the table from the summary of ADAM. Here is an example of a summary from ADAM ETS:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     summary()</pre>
<pre>Model estimated using adam() function: ETS(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 256.1516
Coefficients:
      Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha   0.9514     0.1292     0.6960      1.0000 *
beta    0.3328     0.2040     0.0000      0.7358  
phi     0.8560     0.1671     0.5258      1.0000 *
level 203.2835     5.9968   191.4304    215.1289 *
trend  -2.6793     4.7705   -12.1084      6.7437  

Error standard deviation: 1.3623
Sample size: 150
Number of estimated parameters: 6
Number of degrees of freedom: 144
Information criteria:
     AIC     AICc      BIC     BICc 
524.3032 524.8907 542.3670 543.8387</pre>
<p>As you can see in the output above, the function generates the confidence intervals for the parameters of the model, including the smoothing parameters, dampening parameter and the initial states. This summary can then be used to generate the LaTeX code for the main part of the table:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     xtable()</pre>
<p>which will look something like this:</p>
<div id="attachment_3073" style="width: 650px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3073" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-1024x303.png&amp;nocache=1" alt="Summary of adam()" width="512" height="152" class="size-large wp-image-3073" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-1024x303.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-300x89.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-768x227.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable.png&amp;nocache=1 1207w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a><p id="caption-attachment-3073" class="wp-caption-text">Summary of adam()</p></div>
<h3>Other improvements</h3>
<p>First, one of the major changes in <code>smooth</code> functions is the new backcasting mechanism for <code>adam()</code>, <code>es()</code> and <code>msarima()</code> (this is discussed in <a href="https://openforecast.org/adam/ADAMInitialisation.html">Section 11.4 of the ADAM monograph</a>). The main difference from the old one is that it no longer backcasts the parameters for the explanatory variables, estimating them separately via optimisation. This feature appeared to be important for some users who wanted to try MSARIMAX/ETSX (a model with explanatory variables) with backcasting as the initialisation. These users then wanted to get a summary, analysing the uncertainty around the estimates of parameters for exogenous variables, but could not, because the previous implementation would not estimate them explicitly. This is now available. Here is an example:</p>
<pre class="decode">cbind(BJsales, BJsales.lead) |>
    adam(model="AAdN", initial="backcasting") |>
    summary()</pre>
<pre>Model estimated using adam() function: ETSX(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 255.1935
Coefficients:
             Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha          0.9724     0.1108     0.7534      1.0000 *
beta           0.2904     0.1368     0.0199      0.5607 *
phi            0.8798     0.0925     0.6970      1.0000 *
BJsales.lead   0.1662     0.2336    -0.2955      0.6276  

Error standard deviation: 1.3489
Sample size: 150
Number of estimated parameters: 5
Number of degrees of freedom: 145
Information criteria:
     AIC     AICc      BIC     BICc 
520.3870 520.8037 535.4402 536.4841</pre>
<p>As you can see in the output above, the initial level and trend of the model are not reported, because they were estimated via backcasting. However, we get the value of the parameter <code>BJsales.lead</code> and the uncertainty around it. The old backcasting approach is now called "complete", implying that all values of the state vector are produced via backcasting.</p>
<p>Second, <code>forecast.adam()</code> now has a parameter <code>scenarios</code>, which, when set to TRUE, returns the simulated paths from the model. This only works when <code>interval="simulated"</code> and can be used for the analysis of possible forecast trajectories.</p>
<p>Third, the <code>plot()</code> method can now also produce ACF/PACF of the squared residuals for all <code>smooth</code> functions. This becomes useful if you suspect that your data has ARCH elements and want to see whether they need to be modelled separately. This can also be done using <code>adam()</code> and <code>sm()</code> and is discussed in <a href="https://openforecast.org/adam/ADAMscaleModel.html">Chapter 17 of the monograph</a>.</p>
<p>Finally, the <code>sma()</code> function now has the <code>fast</code> parameter, which, when TRUE, uses a modified ternary search for the best order based on information criteria. It might not find the global minimum, but it works much faster than the exhaustive search.</p>
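<p>For the curious, the idea of a ternary search over the order can be sketched in a few lines of base R (a generic sketch assuming a unimodal criterion; the actual implementation in <code>sma()</code> may differ):</p>
<pre class="decode"># Ternary search for the integer minimiser of a unimodal criterion
ternarySearch <- function(crit, lower, upper){
    while(upper - lower > 2){
        m1 <- lower + (upper - lower) %/% 3
        m2 <- upper - (upper - lower) %/% 3
        if(crit(m1) < crit(m2)){
            upper <- m2
        }else{
            lower <- m1
        }
    }
    # Finish with an exhaustive check of the few remaining candidates
    candidates <- lower:upper
    candidates[which.min(sapply(candidates, crit))]
}</pre>
<p>Each iteration discards roughly a third of the remaining orders, so the number of criterion evaluations grows logarithmically rather than linearly with the maximum order.</p>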
<h3>Conclusions</h3>
<p>These are the main new features in the package. I feel that the main job in <code>smooth</code> is already done, and all I can do now is tune the functions and improve the existing code. I want to move all the functions to the new engine and ditch the old one, but this requires much more time than I have. So, I don't expect to finish this any time soon, but I hope I'll get there someday. On the other hand, I'm not sure that spending much time on developing an R package is a wise idea, given that nowadays people tend to use Python. I would develop a Python analogue of the <code>smooth</code> package, but currently I don't have the necessary expertise and time to do that. Besides, there already exist great libraries, such as <a href="https://github.com/Nixtla/nixtla/tree/main/tsforecast">tsforecast</a> from <a href="https://github.com/Nixtla/nixtla">nixtla</a> and <a href="https://www.sktime.org/">sktime</a>. I am not sure that another library implementing ETS and ARIMA is needed in Python. What do you think?</p>
<p>Message <a href="https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/">smooth v3.2.0: what&#8217;s new?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
	</channel>
</rss>
