<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archives R - Open Forecasting</title>
	<atom:link href="https://openforecast.org/tag/r-en/feed/" rel="self" type="application/rss+xml" />
	<link>https://openforecast.org/tag/r-en/</link>
	<description>How to look into the future</description>
	<lastBuildDate>Wed, 11 Feb 2026 19:59:54 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2015/08/cropped-usd-05-32x32.png&amp;nocache=1</url>
	<title>Archives R - Open Forecasting</title>
	<link>https://openforecast.org/tag/r-en/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>smooth v4.4.0</title>
		<link>https://openforecast.org/2026/02/09/smooth-v4-4-0/</link>
					<comments>https://openforecast.org/2026/02/09/smooth-v4-4-0/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 09 Feb 2026 09:02:21 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3959</guid>

					<description><![CDATA[<p>Great news, everyone! The smooth package for R, version 4.4.0, is now on CRAN. Why is this great news? Let me explain! On this page: What&#8217;s new? Evaluation Setup Results What&#8217;s next? Here is what&#8217;s new since 4.3.0: First, I have worked on tuning the initialisation in adam() in case of backcasting, and improved the [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Great news, everyone! The smooth package for R, version 4.4.0, is now on CRAN. Why is this great news? Let me explain!</p>
<p>On this page:</p>
<ul>
<li><a href="#whatsNew">What&#8217;s new?</a></li>
<li><a href="#evaluation">Evaluation</a></li>
<ul>
<li><a href="#evaluationSetup">Setup</a></li>
<li><a href="#evaluationResults">Results</a></li>
</ul>
<li><a href="#whatsNext">What&#8217;s next?</a></li>
</ul>
<h3 id="whatsNew">Here is what&#8217;s new since 4.3.0:</h3>
<p>First, I have worked on tuning the initialisation in <code>adam()</code> in the case of backcasting, and improved the <code>msdecompose()</code> function a bit to get more robust results. This was necessary to make sure that, when the smoothing parameters are close to zero, the initial values would still make sense. This is already in <code>adam</code> (use <code>smoother="global"</code> to test), but will become the default behaviour in the next version of the package, when we iron everything out. This is all a part of a larger work with Kandrika Pritularga on a paper about the initialisation of dynamic models.</p>
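<p>If you want to try the new mechanism before it becomes the default, the call looks as follows (a sketch on a standard R dataset; the model and series are chosen for illustration only):</p>
<pre class="decode">library(smooth)

# ADAM ETS with backcasting and the new global smoother for the initials
# (to become the default behaviour in a future release)
ourModel <- adam(AirPassengers, "MMM", initial="backcasting",
                 smoother="global")
summary(ourModel)</pre>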
<p>Second, I have fixed a long-standing issue with the eigenvalues calculation inside the dynamic models, which applies only in the case of <code>bounds="admissible"</code> and might impact ARIMA, CES and GUM. The parameter restrictions are now done consistently across all functions, guaranteeing that they will not fail and will produce stable/invertible estimates of parameters.</p>
<p>Third, I have added the Sparse ARMA function, which constructs an ARMA(p,q) of the specific orders only, dropping all the elements from lag 1 up to those orders. For example, SpARMA(2,3) would have the following form:<br />
\begin{equation*}<br />
y_t = \phi_2 y_{t-2} + \theta_3 \epsilon_{t-3} + \epsilon_{t}<br />
\end{equation*}<br />
This weird model is needed for a project I am working on together with Devon Barrow, Nikos Kourentzes and Yves Sagaert. I&#8217;ll explain more when we get the final draft of the paper.</p>
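<p>To get a feeling for the process, here is a quick base R simulation of the SpARMA(2,3) above (the values of \(\phi_2\) and \(\theta_3\) are arbitrary, chosen for illustration only):</p>
<pre class="decode">set.seed(41)
obs <- 200
phi2 <- 0.6
theta3 <- 0.3
epsilon <- rnorm(obs+3)
y <- rep(0, obs+3)
# y_t = phi_2 * y_{t-2} + theta_3 * epsilon_{t-3} + epsilon_t
for(t in 4:(obs+3)){
    y[t] <- phi2*y[t-2] + theta3*epsilon[t-3] + epsilon[t]
}
plot(y[-c(1:3)], type="l", ylab="y")</pre>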
<p>And something very important, which you will not notice: I refactored the C++ code in the package so that it is available not only for R, but also for Python&#8230; Why? I&#8217;ll explain in the next post :). But this also means that the old functions that relied on the previous generation of the C++ code are now discontinued, and all the smooth functions use the new core. This applies to <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code> and <code>sma()</code>. You will not notice any change, except that some of them should become a bit faster and probably more robust. And this also means that all of them will now be able to use methods for the <code>adam()</code> function. For example, the <code>summary()</code> will produce the proper output with standard errors and confidence intervals for all estimated parameters.</p>
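<p>For example, the following now works for <code>es()</code> just as it does for <code>adam()</code> (a sketch on a standard R dataset):</p>
<pre class="decode">library(smooth)

ourModel <- es(AirPassengers, "MAM")
# With the ADAM core, summary() reports standard errors and
# confidence intervals for all estimated parameters
summary(ourModel)</pre>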
<h2 id="evaluation">Evaluation</h2>
<p><strong>DISCLAIMER</strong>: The previous evaluation was for smooth v4.3.0, and you can find it <a href="/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">here</a>. I have changed one of the error measures (sCE to SAME), but the rest is the same, so the results are broadly comparable between the versions.</p>
<h3 id="evaluationSetup">The setup</h3>
<p>As usual in situations like this, I have run the evaluation on the M1, M3 and Tourism competition data. This time, I have added more flavours of the ETS model selection so that you can see how the pool of models impacts the forecasting accuracy. Short description:</p>
<ol>
<li>XXX &#8211; select between pure additive ETS models only;</li>
<li>ZZZ &#8211; select from the pool of all 30 models, but use branch-and-bound to kick out the less suitable models;</li>
<li>ZXZ &#8211; same as (2), but without the multiplicative trend models. This is used in the <code>smooth</code> functions <strong>by default</strong>;</li>
<li>FFF &#8211; select from the pool of all 30 models (exhaustive search);</li>
<li>SXS &#8211; the pool of models that is used by default in <code>ets()</code> from the <code>forecast</code> package in R.</li>
</ol>
<p>I also tested three types of the ETS initialisation:</p>
<ol>
<li>Back &#8211; <code>initial="backcasting"</code></li>
<li>Opt &#8211; <code>initial="optimal"</code></li>
<li>Two &#8211; <code>initial="two-stage"</code></li>
</ol>
<p>Backcasting is now the default method of initialisation and does well in many cases, but I found that optimal initials (if done correctly) help in some difficult situations, as long as you have enough computational time.</p>
<p>I used two error measures and computational time to check how functions work. The first error measure is called RMSSE (Root Mean Squared Scaled Error) from <a href="http://dx.doi.org/10.1016/j.ijforecast.2021.11.013">M5 competition</a>, motivated by <a href="http://dx.doi.org/10.1016/j.ijforecast.2022.08.003">Athanasopoulos &#038; Kourentzes (2023)</a>:</p>
<p>\begin{equation*}<br />
\mathrm{RMSSE} = \frac{1}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} \Delta_t^2}} \mathrm{RMSE},<br />
\end{equation*}<br />
where \(\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{j=1}^h e^2_{t+j}}\) is the Root Mean Squared Error of the point forecasts, and \(\Delta_t\) is the first differences of the in-sample actual values.</p>
<p>The second measure does not have a standard name in the literature, but the idea is to measure the bias of the forecasts while removing its sign, to make sure that positively biased forecasts on some time series are not cancelled out by negatively biased ones on others. I call this measure &#8220;Scaled Absolute Mean Error&#8221; (SAME):</p>
<p>\begin{equation*}<br />
\mathrm{SAME} = \frac{1}{\frac{1}{T-1} \sum_{t=1}^{T-1} |\Delta_t|} \mathrm{AME},<br />
\end{equation*}<br />
where \(\mathrm{AME}= \left| \frac{1}{h} \sum_{j=1}^h e_{t+j} \right|\).</p>
<p>For both of these measures, lower values are better. As for the computational time, I have measured it for each model and each series, and this time I provide the distribution of times to better see how the methods perform.</p>
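<p>To make the measures concrete, here is how both can be coded in base R directly from the formulas above (a sketch with hypothetical argument names; the <code>greybox</code> package provides its own implementations):</p>
<pre class="decode"># Root Mean Squared Scaled Error
RMSSEmanual <- function(holdout, forecasts, insample){
    rmse <- sqrt(mean((holdout - forecasts)^2))
    # Scale: root mean of the squared in-sample first differences
    rmse / sqrt(mean(diff(insample)^2))
}

# Scaled Absolute Mean Error
SAMEmanual <- function(holdout, forecasts, insample){
    ame <- abs(mean(holdout - forecasts))
    # Scale: mean of the absolute in-sample first differences
    ame / mean(abs(diff(insample)))
}</pre>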
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)
library(forecast)
library(smooth)

library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
        holdout <- as.vector(holdout);
        insample <- as.vector(insample);
        # RMSSE and SAME are defined in greybox v2.0.7
        return(c(RMSSE(holdout, object$mean, mean(diff(insample)^2)),
                 SAME(holdout, object$mean, mean(abs(diff(insample)))),
                 object$timeElapsed));
}

datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)

# Method configuration list
# Each method specifies: fn (function name), pkg (package), model, and initial
methodsConfig <- list(
	# ETS and Auto ARIMA from the forecast package in R
	"ETS" = list(fn = "ets", pkg = "forecast", use_x_only = TRUE),
	"Auto ARIMA" = list(fn = "auto.arima", pkg = "forecast", use_x_only = TRUE),
	# ADAM with different initialisation schemes
	"ADAM ETS Back" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ADAM ETS Opt" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ADAM ETS Two" = list(fn = "adam", pkg = "smooth", model = "ZXZ", initial = "two"),
	# ES, which is a wrapper of ADAM. Should give very similar results to ADAM on regular data
	"ES Back" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "back"),
	"ES Opt" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "opt"),
	"ES Two" = list(fn = "es", pkg = "smooth", model = "ZXZ", initial = "two"),
	# Several flavours for model selection in ES
	"ES XXX" = list(fn = "es", pkg = "smooth", model = "XXX", initial = "back"),
	"ES ZZZ" = list(fn = "es", pkg = "smooth", model = "ZZZ", initial = "back"),
	"ES FFF" = list(fn = "es", pkg = "smooth", model = "FFF", initial = "back"),
	"ES SXS" = list(fn = "es", pkg = "smooth", model = "SXS", initial = "back"),
	# ARIMA implementations in smooth
	"MSARIMA" = list(fn = "auto.msarima", pkg = "smooth", initial = "back"),
	"SSARIMA" = list(fn = "auto.ssarima", pkg = "smooth", initial = "back"),
	# Complex Exponential Smoothing
	"CES" = list(fn = "auto.ces", pkg = "smooth", initial = "back"),
	# Generalised Univariate Model (experimental)
	"GUM" = list(fn = "auto.gum", pkg = "smooth", initial = "back")
)

methodsNames <- names(methodsConfig)
methodsNumber <- length(methodsNames)

measuresNames <- c("RMSSE","SAME","Time")
measuresNumber <- length(measuresNames)

testResults <- array(NA, c(methodsNumber, datasetLength, measuresNumber),
                     dimnames = list(methodsNames, NULL, measuresNames))

# Unified loop over all methods
for(j in seq_along(methodsConfig)){
	cfg <- methodsConfig[[j]]
	cat("Running method:", methodsNames[j], "\n")

	result <- foreach(i = 1:datasetLength, .combine = "cbind",
	                  .packages = c("smooth", "forecast")) %dopar% {
		startTime <- Sys.time()

		# Build model call based on method type
		if(isTRUE(cfg$use_x_only)){
			# forecast package methods: ets, auto.arima
			test <- do.call(cfg$fn, list(datasets[[i]]$x))
		}else if(cfg$fn %in% c("adam", "es")) {
			# adam and es take dataset and model
			test <- do.call(cfg$fn, list(datasets[[i]], model=cfg$model, initial = cfg$initial))
		}else{
			# auto.msarima, auto.ssarima, auto.ces, auto.gum
			test <- do.call(cfg$fn, list(datasets[[i]], initial = cfg$initial))
		}

		# Build forecast call
		forecast_args <- list(test, h = datasets[[i]]$h)
		testForecast <- do.call(forecast, forecast_args)
		testForecast$timeElapsed <- Sys.time() - startTime

		return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x))
	}
	testResults[j,,] <- t(result)
}

</pre>
</div></div>
<h3 id="evaluationResults">Results</h3>
<p>And here are the results for the smooth functions in v4.4.0 for R. First, we summarise the RMSSE values: I produce the quartiles of the distribution of RMSSE together with the mean.</p>
<pre class="decode">cbind(t(apply(testResults[,,"RMSSE"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"RMSSE"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%    75%      100%   mean
ETS           0.0245 0.6772 1.1806 2.3765   51.6160 1.9697
Auto ARIMA    0.0246 0.6802 1.1790 2.3583   51.6160 1.9864
ADAM ETS Back 0.0183 <strong>0.6647</strong> <strong>1.1620</strong> <strong>2.3023</strong>   <strong>50.2585</strong> <strong>1.9283</strong>
ADAM ETS Opt  0.0242 0.6714 1.1868 2.3623   51.6160 1.9432
ADAM ETS Two  0.0246 0.6690 1.1875 2.3374   51.6160 1.9480
ES Back       0.0183 0.6674 1.1647 2.3164   <strong>50.2585</strong> 1.9292
ES Opt        0.0242 0.6740 1.1858 2.3644   51.6160 1.9469
ES Two        0.0245 0.6717 1.1874 2.3463   51.6160 1.9538
ES XXX        0.0183 0.6777 1.1708 2.3062   <strong>50.2585</strong> 1.9613
ES ZZZ        <strong>0.0108</strong> 0.6682 1.1816 2.3611  201.4959 2.0841
ES FFF        0.0145 0.6795 1.2170 2.4575 5946.1858 3.3033
ES SXS        0.0183 0.6754 1.1709 2.3539   <strong>50.2585</strong> 1.9448
MSARIMA       0.0278 0.6988 1.1898 2.4208   51.6160 2.0750
SSARIMA       0.0277 0.7371 1.2544 2.4425   51.6160 2.0625
CES Back      0.0450 0.6761 1.1741 2.3205   51.0571 1.9650
GUM Back      0.0333 0.7077 1.2073 2.4533   51.6184 2.0461
</pre>
<p>The worst performing models are the ETS with the multiplicative trend (ES ZZZ and ES FFF). This is because there are outliers in some time series, and the multiplicative trend reacts to them by amending the trend value to something large (e.g. 2, i.e. a twofold increase in level at each step), and then can never return to a reasonable level (see the explanation of this phenomenon in <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6 of the ADAM book</a>). As expected, ADAM ETS performs very similarly to ES, and we can see that the default initialisation (backcasting) is pretty good in terms of RMSSE values. To be fair, if the models were tested on a different dataset, it might be the case that the optimal initialisation would do better.</p>
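<p>A quick numeric illustration of the issue, using the trend value of 2 from the example above (base R arithmetic, not the actual ETS recursion):</p>
<pre class="decode">level <- 100
trend <- 2   # trend value amended by an outlier
h <- 6
# With a multiplicative trend, the forecast doubles at every step:
level * trend^(1:h)
# 200 400 800 1600 3200 6400</pre>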
<p>Here is a table with the SAME results:</p>
<pre class="decode">cbind(t(apply(testResults[,,"SAME"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"SAME"],1,mean)) |> round(4)</pre>
<pre>                 0%    25%    50%    75%      100%   mean
ETS           8e-04 0.3757 1.0203 2.5097   54.6872 1.9983
Auto ARIMA    <strong>0e+00</strong> 0.3992 1.0429 2.4565   53.2710 2.0446
ADAM ETS Back 1e-04 0.3752 0.9965 <strong>2.4047</strong>   <strong>52.3418</strong> 1.9518
ADAM ETS Opt  5e-04 0.3733 1.0212 2.4848   55.1018 1.9618
ADAM ETS Two  8e-04 0.3780 1.0316 2.4511   55.1019 1.9712
ES Back       <strong>0e+00</strong> 0.3733 <strong>0.9945</strong> 2.4122   53.4504 <strong>1.9485</strong>
ES Opt        2e-04 <strong>0.3727</strong> 1.0255 2.4756   54.6860 1.9673
ES Two        1e-04 0.3855 1.0323 2.4535   54.6856 1.9799
ES XXX        1e-04 0.3733 1.0050 2.4257   53.1697 1.9927
ES ZZZ        3e-04 0.3824 1.0135 2.4885  229.7626 2.1376
ES FFF        3e-04 0.3972 1.0489 2.6042 3748.4268 2.9501
ES SXS        6e-04 0.3750 1.0125 2.4627   53.4504 1.9725
MSARIMA       1e-04 0.3960 1.0094 2.5409   54.7916 2.1227
SSARIMA       1e-04 0.4401 1.1222 2.5673   52.5023 2.1248
CES Back      6e-04 0.3767 1.0079 2.4085   54.9026 2.0052
GUM Back      0e+00 0.3803 1.0575 2.6259   63.0637 2.0858
</pre>
<p>In terms of bias, smooth implementations of ETS are doing well again, and we can see the same issue with the multiplicative trend here as before. Another thing to note is that MSARIMA and SSARIMA are not as good as the Auto ARIMA from the forecast package on these datasets in terms of RMSSE and SAME (at least, in terms of mean error measures). And actually, GUM and CES are now better than those in terms of both error measures.</p>
<p>Finally, here is a table with the computational time:</p>
<pre class="decode">cbind(t(apply(testResults[,,"Time"],1,quantile, na.rm=T)),
      mean=apply(testResults[,,"Time"],1,mean)) |> round(4)</pre>
<pre>                  0%    25%    50%     75%    100%   mean
ETS           <strong>0.0032</strong> <strong>0.0117</strong> 0.1660  0.6728  1.6400 0.3631
Auto ARIMA    0.0100 0.1184 0.3618  1.0548 54.3652 1.4760
ADAM ETS Back 0.0162 0.1062 0.1854  0.4022  2.5109 0.2950
ADAM ETS Opt  0.0319 0.1920 0.3103  0.6792  3.8933 0.5368
ADAM ETS Two  0.0427 0.2548 0.4035  0.8567  3.7178 0.6331
ES Back       0.0153 0.0896 <strong>0.1521</strong>  0.3335  2.1128 0.2476
ES Opt        0.0303 0.1667 0.2565  0.5910  3.5887 0.4522
ES Two        0.0483 0.2561 0.4016  0.8626  3.5892 0.6309
MSARIMA Back  0.0614 0.3418 0.6947  0.9868  3.9677 0.7534
SSARIMA Back  0.0292 0.2963 0.8988  2.1729 13.7635 1.6581
CES Back      0.0146 0.0400 0.1834  <strong>0.2298</strong>  <strong>1.2099</strong> <strong>0.1713</strong>
GUM Back      0.0165 0.2101 1.5221  3.0543  9.5380 1.9506

# Separate table for special pools of ETS.
# The time is proportional to the number of models here
=========================================================
                  0%    25%    50%     75%    100%   mean
ES XXX        0.0114 0.0539 0.0782  0.1110  0.8163 0.0859
ES ZZZ        0.0147 0.1371 0.2690  0.4947  2.2049 0.3780
ES FFF        0.0529 0.2775 1.1539  1.5926  3.8552 1.1231
ES SXS        0.0323 0.1303 0.4491  0.6013  2.2170 0.4581
</pre>
<p><em><br />
I have manually moved the specific ES model pool flavours below because there is no point in comparing their computational time with that of the others: they select from different pools of models and thus are not really comparable with the rest.</em></p>
<p>What we can see from this is that ES with backcasting is faster than the other models in this setting (in terms of mean and median computational time). CES is very fast in terms of mean computational time, which is probably because of the very short pool of models to choose from (only four). SSARIMA is pretty slow, which is due to the nature of its order selection algorithm (I don't plan to update it any time soon, but if someone wants to contribute - let me know). But the interesting thing is that Auto ARIMA, while being relatively fine in terms of median time, has the highest maximum, meaning that on some time series it took an extremely long time for some unknown reason. The series that caused the biggest issue for Auto ARIMA is N389 from the M1 competition. I'm not sure what the issue was, and I don't have time to investigate this.</p>
<div id="attachment_4002" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1"><img fetchpriority="high" decoding="async" aria-describedby="caption-attachment-4002" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1" alt="Mean computational time vs mean RMSSE" width="300" height="180" class="size-medium wp-image-4002" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-300x180.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE-768x461.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2026/02/smoot-4-4-0-time-vs-RMSSE.png&amp;nocache=1 1000w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-4002" class="wp-caption-text">Mean computational time vs mean RMSSE</p></div>
<p>Comparing the mean computational time with mean RMSSE value (image above), it looks like the overall tendency in the <code>smooth</code> + <code>forecast</code> functions for the M1, M3 and Tourism datasets is that additional computational time does not improve the accuracy. But it also looks like a simpler pool of pure additive models (ETS(X,X,X)) harms the accuracy in comparison with the branch-and-bound based one of the default <code>model="ZXZ"</code>. There seems to be a sweet spot in terms of the pool of models to choose from (no multiplicative trend, allow mixed models). This aligns well with the papers of <a href="https://doi.org/10.1080/01605682.2024.2421339">Petropoulos et al. (2025)</a>, who investigated the accuracy of arbitrary short pools of models and <a href="https://doi.org/10.1016/j.ijpe.2018.05.019">Kourentzes et al. (2019)</a>, who showed how pooling (if done correctly) can improve the accuracy on average.</p>
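<p>The scatter plot above can be reproduced from the <code>testResults</code> array collected by the evaluation code (a sketch in base graphics, assuming the array from the spoiler above is in memory):</p>
<pre class="decode">timeMean <- apply(testResults[,,"Time"], 1, mean, na.rm=TRUE)
rmsseMean <- apply(testResults[,,"RMSSE"], 1, mean, na.rm=TRUE)
plot(timeMean, rmsseMean, pch=16,
     xlab="Mean computational time, s", ylab="Mean RMSSE")
text(timeMean, rmsseMean, labels=rownames(testResults),
     pos=3, cex=0.8)</pre>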
<h3 id="whatsNext">What's next?</h3>
<p>For R, the main task now is to rewrite the <code>oes()</code> function and substitute it with the <code>om()</code> one - "Occurrence Model". This should be equivalent to <code>adam()</code> in functionality, allowing one to introduce ETS, ARIMA and explanatory variables for the occurrence part of the model. This is a huge piece of work, which I hope to progress slowly throughout 2026 and finish by the end of the year. Doing that will also allow me to remove the last bits of the old C++ code and to switch to the ADAM core completely, introducing more functionality for capturing patterns in intermittent demand. The minor task is to test <code>smoother="global"</code> more for the ETS initialisation and roll it out as the default in the next release for both R and Python.</p>
<p>For Python,... What Python? Ah! You'll see soon :)</p>
<p>Message <a href="https://openforecast.org/2026/02/09/smooth-v4-4-0/">smooth v4.4.0</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/02/09/smooth-v4-4-0/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</title>
		<link>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/</link>
					<comments>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Fri, 04 Jul 2025 10:02:17 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[GUM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3898</guid>

					<description><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post. New default initialisation mechanism Since the beginning of the package, the smooth functions supported three ways for initialising the state vector (the vector that includes level, trend, seasonal indices): optimisation, backcasting and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Good news! The smooth package v4.3.0 is now on CRAN. And there are several things worth mentioning, so I have written this post.</p>
<h3>New default initialisation mechanism</h3>
<p>Since the beginning of the package, the <code>smooth</code> functions have supported three ways of initialising the state vector (the vector that includes level, trend and seasonal indices): optimisation, backcasting and values provided by the user. Optimisation has been considered the standard way of estimating ETS, while backcasting was originally proposed by Box &#038; Jenkins (1970) and, as far as I know, has only been implemented in <code>smooth</code>. The main advantage of backcasting is computational time, because you do not need to estimate every single element of the state vector. The new ADAM core that I developed during the COVID lockdown had some improvements for the backcasting, and I noticed that <code>adam()</code> produced more accurate forecasts with it than with optimisation. But I needed more testing, so I did not change anything back then.</p>
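<p>The idea of backcasting can be illustrated on simple exponential smoothing: instead of estimating the initial level as an extra parameter, run the smoother through the series backwards and use the value it produces at the start as the initial (a minimal sketch with an arbitrary smoothing parameter, not the actual smooth implementation):</p>
<pre class="decode">backcastLevel <- function(y, alpha=0.1){
    # Run SES from the end of the series towards the start
    level <- y[length(y)]
    for(t in (length(y)-1):1){
        level <- alpha*y[t] + (1-alpha)*level
    }
    # The backcasted value serves as the initial level
    return(level)
}
backcastLevel(AirPassengers)</pre>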
<p>However, my recent work with Kandrika Pritularga on capturing uncertainty in ETS has demonstrated that backcasting solves some fundamental problems with the variance of states &#8211; the optimisation cannot handle so many parameters, and the asymptotic properties of ETS do not make sense in that case (we&#8217;ll release the paper as soon as we finish the experiments). So, with this evidence in hand and additional tests, I have made the decision to switch from optimisation to backcasting as the default initialisation mechanism for all the <strong>smooth</strong> functions.</p>
<p>The final users should not feel much difference, but it should work faster now and (hopefully) more accurately. If this is not the case, please get in touch or <a href="https://github.com/config-i1/smooth/issues">file an issue on github</a>.</p>
<p>Also, rest assured the <code>initial="optimal"</code> is available and will stay available as an option in all the <code>smooth</code> functions, so, you can always switch back to it if you don&#8217;t like backcasting.</p>
<p>Finally, I have introduced a new initialisation mechanism called &#8220;two-stage&#8221;, the idea of which is to apply backcasting first and then to optimise the obtained state values. It is slower, but is supposed to be better than the standard optimisation.</p>
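<p>In code, the three mechanisms are selected via the <code>initial</code> argument of the smooth functions (a sketch; the model and series are chosen for illustration only):</p>
<pre class="decode">library(smooth)

esBack <- es(AirPassengers, "MAM", initial="backcasting")
esOpt  <- es(AirPassengers, "MAM", initial="optimal")
# Backcast first, then optimise the obtained state values:
esTwo  <- es(AirPassengers, "MAM", initial="two-stage")</pre>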
<h3>ADAM core</h3>
<p>Every single function in the <code>smooth</code> package now uses the ADAM C++ core, and the old core will be discontinued starting from v4.5.0 of the package. This applies to the functions: <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code>, <code>ces()</code>, <code>gum()</code>, <code>sma()</code>. There are now legacy versions of these functions in the package with the suffix &#8220;_old&#8221; (e.g. <code>es_old()</code>), which will be removed in smooth v4.5.0. The new engine also helped <code>ssarima()</code>, which has now become slightly more accurate than before. Unfortunately, there are still some issues with the initialisation of the seasonal <code>ssarima()</code>, which I have failed to solve completely. But I hope that over time this will be resolved as well.</p>
<h3>smooth performance update</h3>
<p>I have applied all the smooth functions, together with <code>ets()</code> and <code>auto.arima()</code> from the <code>forecast</code> package, to the M1, M3 and Tourism competition data and have measured their performance in terms of RMSSE, scaled Cumulative Error (sCE) and computational time. I used the following R code for that:</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Long and boring code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">library(Mcomp)
library(Tcomp)

library(forecast)
library(smooth)

# I work on Linux and use doMC. Substitute this with doParallel if you use Windows
library(doMC)
registerDoMC(detectCores())

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
	holdout <- as.vector(holdout);
	insample <- as.vector(insample);
	return(c(measures(holdout, object$mean, insample),
			 mean(holdout < object$upper &#038; holdout > object$lower),
			 mean(object$upper-object$lower)/mean(insample),
			 pinball(holdout, object$upper, 0.975)/mean(insample),
			 pinball(holdout, object$lower, 0.025)/mean(insample),
			 sMIS(holdout, object$lower, object$upper, mean(insample),0.95),
			 object$timeElapsed))
}

# Datasets to use
datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)
# Types of models to try
methodsNames <- c("ETS", "Auto ARIMA",
				  "ADAM ETS Back", "ADAM ETS Opt", "ADAM ETS Two",
				  "ES Back", "ES Opt", "ES Two",
				  "ADAM ARIMA Back", "ADAM ARIMA Opt", "ADAM ARIMA Two",
				  "MSARIMA Back", "MSARIMA Opt", "MSARIMA Two",
				  "SSARIMA Back", "SSARIMA Opt", "SSARIMA Two",
				  "CES Back", "CES Opt", "CES Two",
				  "GUM Back", "GUM Opt", "GUM Two");
methodsNumber <- length(methodsNames);
test <- adam(datasets[[125]]);

testResults20250603 <- array(NA,c(methodsNumber,datasetLength,length(test$accuracy)+6),
                             dimnames=list(methodsNames, NULL,
                                           c(names(test$accuracy),
                                             "Coverage","Range",
                                             "pinballUpper","pinballLower","sMIS",
                                             "Time")));

#### ETS from forecast package ####
j <- 1;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
  startTime <- Sys.time()
  test <- ets(datasets[[i]]$x);
  testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### AUTOARIMA ####
j <- 2;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- auto.arima(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Backcasting ####
j <- 3;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Optimal ####
j <- 4;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ETS Two-stage ####
j <- 5;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- adam(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Backcasting ####
j <- 6;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Optimal ####
j <- 7;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ES Two-stage ####
j <- 8;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- es(datasets[[i]],"ZXZ", initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Backcasting ####
j <- 9;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="back", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Optimal ####
j <- 10;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="opt", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### ADAM ARIMA Two-stage ####
j <- 11;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.adam(datasets[[i]], "NNN", initial="two", distribution=c("dnorm"));
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Backcasting ####
j <- 12;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Optimal ####
j <- 13;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### MSARIMA Two-stage ####
j <- 14;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.msarima(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Backcasting ####
j <- 15;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ssarima(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Optimal ####
j <- 16;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]], initial="opt");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### SSARIMA Two-stage ####
j <- 17;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]], initial="two");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Backcasting ####
j <- 18;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Optimal ####
j <- 19;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### CES Two-stage ####
j <- 20;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.ces(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Backcasting ####
j <- 21;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="back");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Optimal ####
j <- 22;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="opt");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);

#### GUM Two-stage ####
j <- 23;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
  startTime <- Sys.time()
  test <- auto.gum(datasets[[i]], initial="two");
  testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
  testForecast$timeElapsed <- Sys.time() - startTime;
  return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults20250603[j,,] <- t(result);</pre>
<pre class="decode"># Summary of results
cbind(t(apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,quantile)),
      mean=apply(testResults20250603[c(1:8,12:23),,"RMSSE"],1,mean),
      sCE=apply(testResults20250603[c(1:8,12:23),,"sCE"],1,mean),
      Time=apply(testResults20250603[c(1:8,12:23),,"Time"],1,mean)) |> round(3)</pre>
</div></div>
<p>The table below shows the distribution of RMSSE together with the mean sCE and the mean computational time. Boldface marks the best performing model for each measure.</p>
<pre>                    min   Q1  median   Q3     max  mean    sCE  Time
ETS                0.024 0.677 1.181 2.376  51.616 1.970  0.299 0.385
Auto ARIMA         0.025 0.680 1.179 2.358  51.616 1.986  0.124 1.467

ADAM ETS Back      <strong>0.015</strong> <strong>0.666</strong> 1.175 <strong>2.276</strong>  51.616 <strong>1.921</strong>  0.470 0.218
ADAM ETS Opt       0.020 <strong>0.666</strong> 1.190 2.311  51.616 1.937  0.299 0.432
ADAM ETS Two       0.025 <strong>0.666</strong> 1.179 2.330  51.616 1.951  0.330 0.579

ES Back            <strong>0.015</strong> 0.672 <strong>1.174</strong> 2.284  51.616 <strong>1.921</strong>  0.464 0.219
ES Opt             0.020 0.672 1.186 2.316  51.616 1.943  0.302 0.497
ES Two             0.024 0.668 1.181 2.346  51.616 1.952  0.346 0.562

MSARIMA Back       0.025 0.710 1.188 2.383  51.616 2.028  <strong>0.067</strong> 0.780
MSARIMA Opt        0.025 0.724 1.242 2.489  51.616 2.083  0.088 1.905
MSARIMA Two        0.025 0.718 1.250 2.485  51.906 2.075  0.083 2.431

SSARIMA Back       0.045 0.738 1.248 2.383  51.616 2.063  0.167 1.747
SSARIMA Opt        0.025 0.774 1.292 2.413  51.616 2.040  0.178 7.324
SSARIMA Two        0.025 0.742 1.241 2.414  51.616 2.027  0.183 8.096

CES Back           0.046 0.695 1.189 2.355  51.342 1.981  0.125 <strong>0.185</strong>
CES Opt            0.030 0.698 1.218 2.327  <strong>49.480</strong> 2.001 -0.135 0.834
CES Two            0.025 0.696 1.207 2.343  51.242 1.993 -0.078 1.006

GUM Back           0.046 0.707 1.215 2.399  51.134 2.049 -0.285 3.575
GUM Opt            0.026 0.795 1.381 2.717 240.143 2.932 -0.549 4.668
GUM Two            0.026 0.803 1.406 2.826 240.143 3.041 -0.593 4.703</pre>
<p>Several notes:</p>
<ul>
<li>ES is a wrapper of ADAM ETS. The main difference between them is that the latter uses the Gamma distribution for the multiplicative error models, while the former relies on the Normal one.</li>
<li>MSARIMA is a wrapper for ADAM ARIMA, which is why I don't report the latter in the results.</li>
</ul>
<p>One thing you can notice from the output above is that the models with backcasting consistently produce more accurate forecasts across all measures. My explanation is that they tend not to overfit the data as much as the optimal initialisation does.</p>
<p>To see the stochastic dominance of the forecasting models, I conducted the modification of the MCB/Nemenyi test, explained in <a href="/2020/08/17/accuracy-of-forecasting-methods-can-you-tell-the-difference/">this post</a>:</p>
<pre class="decode">par(mar=c(10,3,4,1))
greybox::rmcb(t(testResults20250603[c(1:8,12:23),,"RMSSE"]), outplot="mcb")</pre>
<div id="attachment_3908" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3908" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1" alt="Nemenyi test for the smooth functions" width="300" height="175" class="size-medium wp-image-3908" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2025/07/2025-07-04-smooth-v4-3-0.png&amp;nocache=1 1200w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3908" class="wp-caption-text">Nemenyi test for the smooth functions</p></div>
<p>The image shows the mean rank of each model and whether the differences in performance are significant at the 5% level. It is apparent that ADAM ETS has the lowest rank, no matter which initialisation is used, but its performance does not differ significantly from <code>es()</code>, <code>ets()</code> and <code>auto.arima()</code>. Also, <code>auto.arima()</code> significantly outperforms <code>msarima()</code> and <code>ssarima()</code> on this data, which could be due to their initialisation. Still, backcasting seems to help all the functions in terms of accuracy in comparison with the "optimal" and "two-stage" initials.</p>
<h3>What's next?</h3>
<p>I am now working on a modified formulation of ETS, which should fix some issues with the multiplicative trend and make ETS safer to use. This is based on <a href="https://openforecast.org/adam/ADAMETSMultiplicativeAlternative.html">Section 6.6</a> of the online version of the ADAM monograph (it is not in the printed version). I am not sure whether this will improve the accuracy further, but I hope that it will make some of the ETS models more resilient than they are right now. I specifically have in mind the multiplicative trend model, which sometimes behaves erratically due to its formulation.</p>
<p>I also plan to translate all the simulation functions to the ADAM core. This applies to <code>sim.es()</code>, <code>sim.ssarima()</code>, <code>sim.gum()</code> and <code>sim.ces()</code>. Currently, they rely on the older core, which I want to get rid of. Having said that, the <code>simulate()</code> method applied to the new <code>smooth</code> functions already uses the new core; it just lacks the flexibility that the other functions have.</p>
<p>Furthermore, I want to rewrite the <code>oes()</code> function and substitute it with <code>oadam()</code>, which would use a better engine, supporting more features, such as multiple frequencies and ARIMA for the occurrence. This is a lot of work, and I probably will need help with that.</p>
<p>Finally, Filotas Theodosiou, Leonidas Tsaprounis, and I are working on translating the R code of <code>smooth</code> to Python. You can read a bit more about this project <a href="/2025/06/30/iif-open-source-forecasting-software-workshop-and-smooth/">here</a>. Several other people have decided to help us, but the progress so far has been a bit slow because of the code translation. If you want to help, please get in touch.</p>
<p>The post <a href="https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/">smooth v4.3.0 in R: what&#8217;s new and what&#8217;s next?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/07/04/smooth-v4-3-0-in-r-what-s-new-and-what-s-next/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Methods for the smooth functions in R</title>
		<link>https://openforecast.org/2024/10/10/methods-for-the-smooth-functions-in-r/</link>
					<comments>https://openforecast.org/2024/10/10/methods-for-the-smooth-functions-in-r/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 10 Oct 2024 13:46:22 +0000</pubDate>
				<category><![CDATA[adam()]]></category>
		<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3685</guid>

					<description><![CDATA[<p>I have been asked recently by a colleague of mine how to extract the variance from a model estimated using the adam() function from the smooth package in R. The problem was that the person started reading the source code of forecast.adam() and got lost between the lines (this happens to me as well sometimes). [&#8230;]</p>
<p>The post <a href="https://openforecast.org/2024/10/10/methods-for-the-smooth-functions-in-r/">Methods for the smooth functions in R</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I have been asked recently by a colleague of mine how to extract the variance from a model estimated using the <code>adam()</code> function from the <code>smooth</code> package in R. The problem was that the person started reading the source code of <code>forecast.adam()</code> and got lost between the lines (this happens to me as well sometimes). Well, there is an easier solution, and in this post I want to summarise several methods that I have implemented in the <code>smooth</code> package for its forecasting functions. I will focus on the <code>adam()</code> function, although all of the methods work for <code>es()</code> and <code>msarima()</code> as well, and some of them work for other functions too (at least as of now, for smooth v4.1.0). Some of them are also mentioned in the <a href="https://openforecast.org/adam/cheatSheet.html">Cheat sheet for adam() function</a> of my monograph (available <a href="https://openforecast.org/adam/">online</a>).</p>
<h3>The main methods</h3>
<p>The <code>adam</code> class supports several methods that are standard in other R packages (for example, for the <code>lm</code> class). Here they are:</p>
<ul>
<li><code>forecast()</code> and <code>predict()</code> &#8211; produce forecasts from the model. The former is preferred; the latter has somewhat limited functionality. See the documentation for the types of forecasts that can be generated. This was also discussed in <a href="https://openforecast.org/adam/ADAMForecasting.html">Chapter 18</a> of my monograph.</li>
<li><code>fitted()</code> &#8211; extracts the fitted values from the estimated object;</li>
<li><code>residuals()</code> &#8211; extracts the residuals of the model. These are values of \(e_t\), which differ depending on the error type of the model (see <a href="https://openforecast.org/adam/non-mle-based-loss-functions.html">discussion here</a>);</li>
<li><code>rstandard()</code> &#8211; returns standardised residuals, i.e. residuals divided by their standard deviation;</li>
<li><code>rstudent()</code> &#8211; studentised residuals, i.e. residuals that are divided by their standard deviation, dropping the impact of each specific observation on it. This helps in case of influential outliers.</li>
</ul>
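<p>To see how these methods are connected, the standardised residuals can be approximated manually by dividing the raw residuals by the standard error of the model. This is only a simplified sketch (the actual <code>rstandard()</code> makes additional corrections depending on the error type and the assumed distribution), assuming a model saved in <code>ourModel</code>:</p>
<pre class="decode">ourModel <- adam(BJsales)
# Manual approximation of the standardised residuals
residualsStd <- residuals(ourModel) / sigma(ourModel)
# Compare with the method itself; the values should be close
plot(rstandard(ourModel), residualsStd)</pre>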
<p>An additional method was introduced in the <code>greybox</code> package, called <code>actuals()</code>, which allows extracting the actual values of the response variable. Another useful method is <code>accuracy()</code>, which returns a set of error measures using the <code>measures()</code> function of the <code>greybox</code> package for the provided model and the holdout values.</p>
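<p>Both methods are one-liners in practice. Here is a sketch, assuming that the model was estimated with <code>holdout=TRUE</code>, so that the holdout sample is stored in the object:</p>
<pre class="decode">ourModel <- adam(BJsales, h=10, holdout=TRUE)
actuals(ourModel)     # The response variable used in the estimation
accuracy(ourModel)    # Error measures based on the holdout</pre>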
<p>All the methods above can be used for model diagnostics and for forecasting (the main purpose of the package). Furthermore, the <code>adam</code> class supports several functions for working with coefficients of models, similar to how it is done in case of <code>lm</code>:</p>
<ul>
<li><code>coef()</code> or <code>coefficient()</code> &#8211; extracts all the estimated coefficients in the model;</li>
<li><code>vcov()</code> &#8211; extracts the covariance matrix of parameters. This can be done either using Fisher Information or via a bootstrap (<code>bootstrap=TRUE</code>).  In the latter case, the <code>coefbootstrap()</code> method is used to create bootstrapped time series, reapply the model and extract estimates of parameters;</li>
<li><code>confint()</code> &#8211; returns the confidence intervals for the estimated parameter. Relies on <code>vcov()</code> and the assumption of normality (<a href="https://openforecast.org/adam/ADAMUncertaintyConfidenceInterval.html">CLT</a>);</li>
<li><code>summary()</code> &#8211; returns the output of the model, containing the table with estimated parameters, their standard errors and confidence intervals.</li>
</ul>
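<p>A typical workflow with these methods might look as follows (a sketch; the specific numbers will depend on the estimated model):</p>
<pre class="decode">ourModel <- adam(BJsales, "AAN")
coef(ourModel)                 # Smoothing parameters and initial states
sqrt(diag(vcov(ourModel)))     # Standard errors of the estimates
confint(ourModel, level=0.95)  # 95% confidence intervals</pre>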
<p>Here is an example of an output from an ADAM ETS estimated using <code>adam()</code>:</p>
<pre class="decode">adamETSBJ <- adam(BJsales, h=10, holdout=TRUE)
summary(adamETSBJ, level=0.99)</pre>
<p>The first line above estimates and selects the most appropriate ETS for the data, while the second one will create a summary with 99% confidence intervals, which should look like this:</p>
<pre>Model estimated using adam() function: ETS(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 241.1634
Coefficients:
      Estimate Std. Error Lower 0.5% Upper 99.5%  
alpha   0.8251     0.1975     0.3089      1.0000 *
beta    0.4780     0.3979     0.0000      0.8251  
phi     0.7823     0.2388     0.1584      1.0000 *
level 199.9314     3.6753   190.3279    209.5236 *
trend   0.2178     2.8416    -7.2073      7.6340  

Error standard deviation: 1.3848
Sample size: 140
Number of estimated parameters: 6
Number of degrees of freedom: 134
Information criteria:
     AIC     AICc      BIC     BICc 
494.3268 494.9584 511.9767 513.5372</pre>
<p>How to read this output, is discussed in <a href="https://openforecast.org/adam/ADAMUncertaintyConfidenceInterval.html">Section 16.3</a>.</p>
<h3>Multistep forecast errors</h3>
<p>There are two methods that can be used as additional analytical tools for the estimated model. Their generics are implemented in the <code>smooth</code> package itself:</p>
<ol>
<li><code>rmultistep()</code> - extracts the multiple-steps-ahead in-sample forecast errors for the specified horizon. This means that the model produces a forecast of length <code>h</code> from every in-sample observation, from the first to the last, and then calculates the forecast errors based on them. This is used in the case of semiparametric and nonparametric prediction intervals, but can also be used for diagnostics (see, for example, <a href="https://openforecast.org/adam/diagnosticsResidualsIIDExpectation.html#diagnosticsResidualsIIDExpectationMultiple">Subsection 14.7.3</a>);</li>
<li><code>multicov()</code> - returns the covariance matrix of the h steps ahead forecast error. The diagonal of this matrix corresponds to the h steps ahead variance conditional on the in-sample information.</li>
</ol>
<p>For the same model that we used in the previous section, we can extract and plot the multistep errors:</p>
<pre class="decode">rmultistep(adamETSBJ, h=10) |> boxplot()
abline(h=0, col="red2", lwd=2)</pre>
<p>which will result in:<br />
<div id="attachment_3689" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamBJETSMulti.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3689" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamBJETSMulti-300x210.png&amp;nocache=1" alt="Distributions of the multistep forecast errors" width="300" height="210" class="size-medium wp-image-3689" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamBJETSMulti-300x210.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamBJETSMulti-768x538.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamBJETSMulti.png&amp;nocache=1 1000w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3689" class="wp-caption-text">Distributions of the multistep forecast errors</p></div>
<p>The image above shows that the model tends to undershoot the actual values in-sample (the boxplots tend to lie slightly above the zero line). This might cause a bias in the final forecasts.</p>
<p>The covariance matrix of the multistep forecast error looks like this in our case:</p>
<pre class="decode">multicov(adamETSBJ, h=10) |> round(3)</pre>
<pre>       h1    h2     h3     h4     h5     h6     h7     h8     h9    h10
h1  1.918 2.299  2.860  3.299  3.643  3.911  4.121  4.286  4.414  4.515
h2  2.299 4.675  5.729  6.817  7.667  8.333  8.853  9.260  9.579  9.828
h3  2.860 5.729  8.942 10.651 12.250 13.501 14.480 15.246 15.845 16.314
h4  3.299 6.817 10.651 14.618 16.918 18.979 20.592 21.854 22.841 23.613
h5  3.643 7.667 12.250 16.918 21.538 24.348 26.808 28.733 30.239 31.417
h6  3.911 8.333 13.501 18.979 24.348 29.515 32.753 35.549 37.737 39.448
h7  4.121 8.853 14.480 20.592 26.808 32.753 38.372 41.964 45.036 47.440
h8  4.286 9.260 15.246 21.854 28.733 35.549 41.964 47.950 51.830 55.127
h9  4.414 9.579 15.845 22.841 30.239 37.737 45.036 51.830 58.112 62.223
h10 4.515 9.828 16.314 23.613 31.417 39.448 47.440 55.127 62.223 68.742</pre>
<p>This is not useful on its own, but can be used for some further derivations.</p>
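<p>As an example of such a derivation, the diagonal of this matrix can be used to construct approximate prediction intervals under the assumption of normality. This is only a sketch &#8211; in practice, <code>forecast()</code> with the <code>interval</code> parameter should be used instead:</p>
<pre class="decode">adamCov <- multicov(adamETSBJ, h=10)
adamForecast <- forecast(adamETSBJ, h=10)
# Approximate 95% bounds based on the h-steps-ahead variances
lower <- adamForecast$mean + qnorm(0.025) * sqrt(diag(adamCov))
upper <- adamForecast$mean + qnorm(0.975) * sqrt(diag(adamCov))</pre>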
<p>Note that the values returned by both <code>rmultistep()</code> and <code>multicov()</code> depend on the model's error type (see <a href="https://openforecast.org/adam/non-mle-based-loss-functions.html">Section 11.2</a> for clarification).</p>
<h3>Model diagnostics</h3>
<p>The conventional <code>plot()</code> method applied to a model estimated using <code>adam()</code> can produce a variety of images for the visual model diagnostics. This is controlled by the <code>which</code> parameter (overall, 16 options). The documentation of the <code>plot.smooth()</code> contains the exhaustive list of options and Chapter 14 of the monograph shows how they can be used for model diagnostics. Here I only list several main ones:</p>
<ul>
<li><code>plot(ourModel, which=1)</code> - actuals vs fitted values. Can be used for general diagnostics of the model. Ideally, all points should lie around the diagonal line;</li>
<li><code>plot(ourModel, which=2)</code> - standardised residuals vs fitted values. Useful for detecting potential outliers. Also accepts the <code>level</code> parameter, which regulates the width of the confidence bounds.</li>
<li><code>plot(ourModel, which=4)</code> - absolute residuals vs fitted, which can be used for detecting heteroscedasticity of the residuals;</li>
<li><code>plot(ourModel, which=6)</code> - QQ plot for the analysis of the distribution of the residuals. The specific figure changes depending on the distribution assumed in the model (see <a href="https://openforecast.org/adam/ADAMETSEstimationLikelihood.html">Section 11.1</a> for the supported ones);</li>
<li><code>plot(ourModel, which=7)</code> - actuals, fitted values and point forecasts over time. Useful for understanding how the model fits the data and what point forecast it produces;</li>
<li><code>plot(ourModel, which=c(10,11))</code> - ACF and PACF of the residuals of the model to detect potentially missing AR/MA elements;</li>
<li><code>plot(ourModel, which=12)</code> - plot of the components of the model. In case of ETS, will show the time series decomposition based on it.</li>
</ul>
<p>And here are four default plots for the model that we estimated earlier:</p>
<pre class="decode">par(mfcol=c(2,2))
plot(adamETSBJ)</pre>
<div id="attachment_3695" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJPlots.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3695" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJPlots-300x210.png&amp;nocache=1" alt="Diagnostic plots for the estimated model" width="300" height="210" class="size-medium wp-image-3695" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJPlots-300x210.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJPlots-768x538.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJPlots.png&amp;nocache=1 1000w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3695" class="wp-caption-text">Diagnostic plots for the estimated model</p></div>
<p>Based on the plot above, we can conclude that the model fits the data well and does not exhibit apparent heteroscedasticity, but it has several potential outliers, which could be explored to improve it. Outlier detection is done via the <code>outlierdummy()</code> method, the generic of which is implemented in the <code>greybox</code> package.</p>
<h3>Other useful methods</h3>
<p>There are many methods that are used by functions to extract some information about the model. I sometimes use them to simplify my coding routine. Here they are:</p>
<ul>
<li><code>lags()</code> - returns lags of the model. Especially useful if you fit a multiple seasonal model;</li>
<li><code>orders()</code> - the vector of orders of the model. Mainly useful in case of ARIMA, which can have multiple seasonalities and p,d,q,P,D,Q orders;</li>
<li><code>modelType()</code> - the type of the model. For the model fitted above, it will return "AAdN". Can be useful for easily refitting a similar model to new data;</li>
<li><code>modelName()</code> - the name of the model. For the model fitted above, it will return "ETS(AAdN)";</li>
<li><code>nobs()</code>, <code>nparam()</code>, <code>nvariate()</code> - number of in-sample observations, number of all estimated parameters and number of time series used in the model respectively. The latter one is developed mainly for the multivariate models, such as VAR and VETS (e.g. <code>legion</code> package in R);</li>
<li><code>logLik()</code> - extracts log-Likelihood of the model;</li>
<li><code>AIC()</code>, <code>AICc()</code>, <code>BIC()</code>, <code>BICc()</code> - extract respective information criteria;</li>
<li><code>sigma()</code> - returns the standard error of the residuals.</li>
</ul>
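<p>As a side note, most of these extractors follow standard R generics, so, for instance, the information criteria can be reproduced by hand from <code>logLik()</code> and <code>nobs()</code>. A minimal base-R sketch (using <code>lm()</code> purely for illustration, since the same generics apply to the models above):</p>

```r
# Reproduce AIC and AICc by hand from logLik() and nobs().
# lm() is used only for illustration here; the same generics
# work for models estimated by the smooth functions.
fit <- lm(mpg ~ wt, data = mtcars)

ll <- as.numeric(logLik(fit))   # log-likelihood value
k  <- attr(logLik(fit), "df")   # number of estimated parameters
n  <- nobs(fit)                 # number of in-sample observations

aicManual  <- -2 * ll + 2 * k
aiccManual <- aicManual + 2 * k * (k + 1) / (n - k - 1)

# Cross-check against the built-in extractor
all.equal(aicManual, AIC(fit))   # TRUE
```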
<h3>More specialised methods</h3>
<p>One of the methods that can be useful for scenarios and artificial data generation is <code>simulate()</code>. It takes the structure and parameters of the estimated model and uses them to generate time series similar to the original one. This is discussed in <a href="https://openforecast.org/adam/ADAMUncertaintySimulation.html">Section 16.1</a> of the ADAM monograph.</p>
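<p>The <code>simulate()</code> generic also exists in base R for regression models, which helps to get the idea across. A minimal sketch with <code>lm()</code> (the <code>smooth</code> implementation does the analogous thing for its time series models):</p>

```r
# simulate() draws new response values from a fitted model,
# keeping its structure and estimated parameters -- handy for
# scenario and artificial data generation.
set.seed(42)
fit <- lm(dist ~ speed, data = cars)

# Three artificial versions of the response variable
newData <- simulate(fit, nsim = 3)
dim(newData)   # as many rows as in cars, 3 simulated series
```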
<p>Furthermore, <code>smooth</code> implements the scale model, discussed in <a href="https://openforecast.org/adam/ADAMscaleModel.html">Chapter 17</a>, which allows modelling the time-varying scale of the distribution. This is done via the <code>sm()</code> method (the generic is introduced in the <code>greybox</code> package), the output of which can then be merged with the original model via the <code>implant()</code> method.</p>
<p>For the same model that we used earlier, the scale model can be estimated this way:</p>
<pre class="decode">adamETSBJSM <- sm(adamETSBJ)</pre>
<p>This is how it looks:</p>
<pre class="decode">plot(adamETSBJSM, 7)</pre>
<div id="attachment_3707" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJSM.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3707" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJSM-300x210.png&amp;nocache=1" alt="Scale model for the ADAM ETS" width="300" height="210" class="size-medium wp-image-3707" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJSM-300x210.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJSM-768x538.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJSM.png&amp;nocache=1 1000w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3707" class="wp-caption-text">Scale model for the ADAM ETS</p></div>
<p>In the plot above, the y-axis contains the squared residuals. The large increase in the error over the holdout sample is expected, because that part corresponds to forecast errors rather than residuals; it is added to the plot for completeness.</p>
<p>To use the scale model in forecasting, we should implant it in the location one, which can be done using the following command:</p>
<pre class="decode">adamETSBJFull <- implant(location=adamETSBJ, scale=adamETSBJSM)</pre>
<p>The resulting model will have fewer degrees of freedom (because the scale model estimates two parameters), but its prediction interval will now take the scale model into account and will differ from the original one: it reflects the time-varying variance based on more recent information rather than the variance averaged across the whole series. In our case, the forecast variance is lower than the one we would obtain from the adamETSBJ model, which leads to a narrower prediction interval (you can produce the intervals for both models and compare them):</p>
<pre class="decode">forecast(adamETSBJFull, h=10, interval="prediction") |> plot()</pre>
<div id="attachment_3708" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJFull.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3708" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJFull-300x210.png&amp;nocache=1" alt="Forecast from the full ADAM, containing both location and scale parts" width="300" height="210" class="size-medium wp-image-3708" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJFull-300x210.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJFull-768x538.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2024/10/adamETSBJFull.png&amp;nocache=1 1000w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3708" class="wp-caption-text">Forecast from the full ADAM, containing both location and scale parts</p></div>
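<p>To see why a lower forecast variance produces a narrower interval, note that for a normal distribution the width of a central prediction interval is proportional to the scale. A tiny base-R illustration (the sigma values below are made up, not taken from the models above):</p>

```r
# The width of a central normal prediction interval is
# 2 * z * sigma, so it scales linearly with the standard deviation.
intervalWidth <- function(sigma, level = 0.95){
    2 * qnorm(1 - (1 - level) / 2) * sigma
}

# Averaged scale vs a lower, more recent estimate (made-up values)
intervalWidth(50)                       # wider interval
intervalWidth(35)                       # narrower interval
intervalWidth(35) / intervalWidth(50)   # ratio is exactly 35/50 = 0.7
```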
<h3>Conclusions</h3>
<p>The methods discussed above give you some flexibility in how to model things and which tools to use. I hope this makes your life easier and that you won't need to spend time reading the source code, but can instead focus on <a href="https://openforecast.org/adam/">forecasting and analytics with ADAM</a>.</p>
<p>Message <a href="https://openforecast.org/2024/10/10/methods-for-the-smooth-functions-in-r/">Methods for the smooth functions in R</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/10/10/methods-for-the-smooth-functions-in-r/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Complex-Valued Econometrics with Examples in R</title>
		<link>https://openforecast.org/2024/08/04/complex-valued-econometrics-with-examples-in-r/</link>
					<comments>https://openforecast.org/2024/08/04/complex-valued-econometrics-with-examples-in-r/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Sun, 04 Aug 2024 14:33:40 +0000</pubDate>
				<category><![CDATA[Complex-valued models]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[complex variables]]></category>
		<category><![CDATA[multivariate models]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3632</guid>

					<description><![CDATA[<p>Back in 2022, my father asked me to help him in amending and editing a monograph he wrote on the topic of &#8220;Complex-Valued Econometrics&#8221;. The original book focused on dynamic models, but after looking through the material and a thorough discussion, we decided to write something more fundamental. The monograph is based on the research [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/08/04/complex-valued-econometrics-with-examples-in-r/">Complex-Valued Econometrics with Examples in R</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Back in 2022, my father asked me to help him amend and edit a monograph he wrote on the topic of &#8220;Complex-Valued Econometrics&#8221;. The original book focused on dynamic models, but after looking through the material and a thorough discussion, we decided to write something more fundamental. The monograph is based on the research he has done over the years while working in Saint Petersburg. I developed an R package called &#8220;<a href="https://github.com/config-i1/complex">complex</a>&#8221; to support the book and then expanded the text with some derivations and examples of application. The result was then submitted to Springer and is <a href="https://doi.org/10.1007/978-3-031-62608-1">now finally published</a> in their &#8220;Contributions to Economics&#8221; series. Unfortunately, due to the agreement with the publisher, we cannot make the book freely available, but some of the related materials can be found in a GitHub repo, <a href="https://github.com/config-i1/complex-econometrics">here</a>.</p>
<p>We will receive royalties from selling this book, and we have decided to direct them to a charity to help Ukrainians (<a href="https://www.justgiving.com/campaign/ukraine-aid-help-now">this one</a>).</p>
<p>And here is how the cover of the book looks:<br />
<div id="attachment_3638" style="width: 209px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/uploads/2024/08/title.webp"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3638" src="https://openforecast.org/wp-content/uploads/2024/08/title-199x300.webp" alt="Complex-Valued Econometrics with Examples in R" width="199" height="300" class="size-medium wp-image-3638" srcset="https://openforecast.org/wp-content/uploads/2024/08/title-199x300.webp 199w, https://openforecast.org/wp-content/uploads/2024/08/title-680x1024.webp 680w, https://openforecast.org/wp-content/uploads/2024/08/title-768x1157.webp 768w, https://openforecast.org/wp-content/uploads/2024/08/title.webp 827w" sizes="auto, (max-width: 199px) 100vw, 199px" /></a><p id="caption-attachment-3638" class="wp-caption-text">Complex-Valued Econometrics with Examples in R</p></div>
<p><a href="https://doi.org/10.1007/978-3-031-62608-1">Svetunkov S., Svetunkov I. (2024). Complex-Valued Econometrics with Examples in R: Modelling, Regression and Applications. Springer Cham. 154 pages. DOI: 10.1007/978-3-031-62608-1</a></p>
<p>Message <a href="https://openforecast.org/2024/08/04/complex-valued-econometrics-with-examples-in-r/">Complex-Valued Econometrics with Examples in R</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/08/04/complex-valued-econometrics-with-examples-in-r/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Detecting patterns in white noise</title>
		<link>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/</link>
					<comments>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 10 Apr 2024 08:16:58 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3413</guid>

					<description><![CDATA[<p>Back in 2015, when I was working on my paper on Complex Exponential Smoothing, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230; One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Back in 2015, when I was working on my paper on <a href="/2022/08/02/complex-exponential-smoothing/">Complex Exponential Smoothing</a>, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230;</p>
<p>One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of ETS, it comes down to selecting trend/seasonal components, while for ARIMA, it&#8217;s about order selection. In R, several functions automatically handle this based on information criteria (<a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar, 2008</a>; <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan, 2020</a>; <a href="https://openforecast.org/adam/ADAMSelection.html">Chapter 15 of ADAM</a>). I decided to investigate how this mechanism works.</p>
<p>I generated data from the Normal distribution with a fixed mean of 5000 and a standard deviation of 50. Then, I asked ETS and ARIMA (from the forecast package in R) to automatically select the appropriate model for each of 1000 time series. Here is the R code for this simple experiment:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Some R code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode"># Set random seed for reproducibility
set.seed(41, kind="L'Ecuyer-CMRG")
# Number of iterations
nsim <- 1000
# Number of observations
obsAll <- 120
# Generate data from N(5000, 50)
rnorm(nsim*obsAll, 5000, 50) |>
  matrix(obsAll, nsim) |>
  ts(frequency=12) -> x

# Load forecast package
library(forecast)
# Load doMC for parallel calculations
# doMC is only available on Linux and Mac
# Use library(doParallel) on Windows
library(doMC)
registerDoMC(detectCores())

# A loop for ARIMA, recording the orders
matArima <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- auto.arima(x&#091;,i&#093;)
    # The element number 5 is just m, period of seasonality
    return(c(testModel$arma&#091;-5&#093;,(!is.na(testModel$coef&#091;"drift"&#093;))*1))
}
rownames(matArima) <- c("AR","MA","SAR","SMA","I","SI","Drift")

# A loop for ETS, recording the model types
matEts <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- ets(x&#091;,i&#093;, allow.multiplicative.trend=TRUE)
    return(testModel&#091;13&#093;$method)
}
</pre>
</div></div></div>
<p>The findings of this experiment are summarised using the following chunk of the R code:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the analysis of the results</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">
#### Auto ARIMA ####
# Non-seasonal ARIMA elements
mean(apply(matArima[c("AR","MA","I","Drift"),]!=0, 2, any))
# Seasonal ARIMA elements
mean(apply(matArima[c("SAR","SMA","SI"),]!=0, 2, any))

#### ETS ####
# Trend in ETS
mean(substr(matEts,7,7)!="N")
# Seasonality in ETS
mean(substr(matEts,nchar(matEts)-1,nchar(matEts)-1)!="N")</pre>
</div></div></div>
<p>I summarised them in the following table:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
</tr>
</table>
<p>So, ARIMA detected some structure (had non-zero orders) in almost 40% of all time series, even though the data was designed to have no structure (just white noise). It also captured non-seasonal orders in a quarter of the series and identified seasonality in 18% of them. ETS performed better (only 0.2% of seasonal models identified on the white noise), but still captured trends in 2.3% of cases.</p>
<p>Does this simple experiment suggest that ARIMA is a bad model and ETS is a good one? No, it does not. It simply demonstrates that ARIMA tends to overfit the data if allowed to select whatever it wants. How can we fix that?</p>
<p>My solution: restrict the pool of ARIMA models to check, preventing it from going crazy. My personal pool includes ARIMA(0,1,1), (1,1,2), (0,2,2), along with the seasonal orders of (0,1,1), (1,1,2), and (0,2,2), and combinations between them. This approach is motivated by the connection between <a href="https://openforecast.org/adam/ARIMAandETS.html">ARIMA and ETS</a>. Additionally, we can check whether the addition of AR/MA orders detected by ACF/PACF analysis of the best model reduces the AICc. If not, they shouldn't be included.</p>
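<p>The ETS-ARIMA connection mentioned above can also be checked numerically: SES (the method underlying ETS(A,N,N)) with smoothing parameter alpha corresponds to ARIMA(0,1,1) with the MA parameter theta = alpha - 1. A base-R sketch with <code>stats::arima()</code> and a made-up alpha of 0.3:</p>

```r
# SES with smoothing parameter alpha is equivalent to ARIMA(0,1,1)
# with theta = alpha - 1. Simulate an SES process and check that
# arima() recovers approximately that theta.
set.seed(41)
alpha <- 0.3
n <- 500
e <- rnorm(n, 0, 50)
y <- numeric(n)
level <- 5000
for(t in 1:n){
    y[t] <- level + e[t]            # observation equation
    level <- level + alpha * e[t]   # transition equation
}

fit <- arima(y, order = c(0, 1, 1))
coef(fit)["ma1"]   # should be close to alpha - 1 = -0.7
```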
<p>This algorithm can be written as the following simple function, which uses the <code>msarima()</code> function from the smooth package in R (this function is used because all ARIMA models implemented in it are directly comparable via information criteria):</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the compact ARIMA function</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">arimaCompact <- function(y, lags=c(1,frequency(y)), ic=c("AICc","AIC","BIC","BICc"), ...){

    # Start measuring the time of calculations
    startTime <- Sys.time();

    # If there are no lags for the basic components, correct this.
    if(sum(lags==1)==0){
        lags <- c(1,lags);
    }

    orderLength <- length(lags);
    ic <- match.arg(ic);
    IC <- switch(ic,
                 "AIC"=AIC,
                 "AICc"=AICc,
                 "BIC"=BIC,
                 "BICc"=BICc);

    # We consider the following list of models:
    # ARIMA(0,1,1), (1,1,2), (0,2,2),
    # ARIMA(0,0,0)+c, ARIMA(0,1,1)+c,
    # seasonal orders (0,1,1), (1,1,2), (0,2,2)
    # And all combinations between seasonal and non-seasonal parts
    # 
    # Encode all non-seasonal parts
    nNonSeasonal <- 5
    arimaNonSeasonal <- matrix(c(0,1,1,0, 1,1,2,0, 0,2,2,0, 0,0,0,1, 0,1,1,1), nNonSeasonal,4,
                               dimnames=list(NULL, c("ar","i","ma","const")), byrow=TRUE)
    # Encode all seasonal parts
    nSeasonal <- 4
    arimaSeasonal <- matrix(c(0,0,0, 0,1,1, 1,1,2, 0,2,2), nSeasonal,3,
                               dimnames=list(NULL, c("sar","si","sma")), byrow=TRUE)

    # Check all the models in the pool
    testModels <- vector("list", nSeasonal*nNonSeasonal);
    m <- 1;
    for(i in 1:nSeasonal){
        for(j in 1:nNonSeasonal){
            testModels&#091;&#091;m&#093;&#093; <- msarima(y, orders=list(ar=c(arimaNonSeasonal&#091;j,1&#093;,arimaSeasonal&#091;i,1&#093;),
                                                      i=c(arimaNonSeasonal&#091;j,2&#093;,arimaSeasonal&#091;i,2&#093;),
                                                      ma=c(arimaNonSeasonal&#091;j,3&#093;,arimaSeasonal&#091;i,3&#093;)),
                                       constant=arimaNonSeasonal&#091;j,4&#093;==1, lags=lags, ...);
            m <- m+1;
        }
    }

    # Find the best one
    m <- which.min(sapply(testModels, IC));
    # Amend computational time
    testModels&#091;&#091;m&#093;&#093;$timeElapsed <- Sys.time()-startTime;

    return(testModels&#091;&#091;m&#093;&#093;);
}</pre>
</div></div></div>
<p>I have not implemented the ACF/PACF check mentioned above in the function. Still, this algorithm brings some improvements:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the application of compact ARIMA to the data</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">#### Load the smooth package
library(smooth)

# A loop for the compact ARIMA, recording the orders
matArimaCompact <- foreach(i=1:nsim, .packages=c("smooth")) %dopar% {
    testModel <- arimaCompact(x&#091;,i&#093;)
    return(orders(testModel))
}

#### Auto MSARIMA from smooth ####
# Non-seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){x&#091;1&#093;!=0}))

# Seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}))
</pre>
</div></div></div>
<p>In my case, it resulted in the following:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
<td style="text-align: center"><strong>Compact ARIMA</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
<td style="text-align: center">2.4%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
<td style="text-align: center">0.0%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
<td style="text-align: center">2.4%</td>
</tr>
</table>
<p>As we see, when we impose restrictions on order selection in ARIMA, it avoids fitting seasonal models to non-seasonal data. While it still makes minor mistakes in terms of non-seasonal structure, it's nothing compared to the conventional approach. What about accuracy? I don't know. I'll have to write another post on this :).</p>
<p>Note that the models were applied to samples of 120 observations, which is considered "small" in statistics, while in real life it is sometimes a luxury to have...</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth &#038; greybox under LGPLv2.1</title>
		<link>https://openforecast.org/2023/09/19/smooth-greybox-under-lgplv2-1/</link>
					<comments>https://openforecast.org/2023/09/19/smooth-greybox-under-lgplv2-1/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 19 Sep 2023 09:32:56 +0000</pubDate>
				<category><![CDATA[Package greybox for R]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[greybox]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3286</guid>

					<description><![CDATA[<p>Good news, everyone! I&#8217;ve recently released major versions of my packages smooth and greybox, v4.0.0 and v2.0.0 respectively, on CRAN. Has something big happened? Yes and no. Let me explain. Starting from these versions, the packages will be licensed under LGPLv2.1 instead of the very restrictive GPLv2. This does not change anything to the everyday [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/09/19/smooth-greybox-under-lgplv2-1/">smooth &#038; greybox under LGPLv2.1</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Good news, everyone! I&#8217;ve recently released major versions of my packages <a href="https://cran.r-project.org/package=smooth">smooth</a> and <a href="https://cran.r-project.org/web/packages/greybox/index.html">greybox</a>, v4.0.0 and v2.0.0 respectively, on CRAN. Has something big happened? Yes and no. Let me explain.</p>
<div id="attachment_3308" style="width: 510px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/greybox-smooth.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3308" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/greybox-smooth.png&amp;nocache=1" alt="Stickers of the greybox and smooth packages for R" width="500" height="289" class="size-full wp-image-3308" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/greybox-smooth.png&amp;nocache=1 500w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/greybox-smooth-300x173.png&amp;nocache=1 300w" sizes="auto, (max-width: 500px) 100vw, 500px" /></a><p id="caption-attachment-3308" class="wp-caption-text">Stickers of the greybox and smooth packages for R</p></div>
<p>Starting from these versions, the packages are licensed under <a href="https://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html">LGPLv2.1</a> instead of the very restrictive <a href="https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html">GPLv2</a>. This does not change anything for everyday users of the packages, but it is a potential game changer for software developers and those who might want to modify the source code of the packages for commercial purposes. This is because any modification of code under GPLv2 must itself be released and made available to everyone, while LGPLv2.1 allows modifications without releasing the source code. At the same time, both licenses require attribution to the author, so if someone modifies the code and uses it for their purposes, they still need to say that the original package was developed by this and that author (Ivan Svetunkov in this case). The reason I decided to change the license is that one of the software vendors I sometimes work with pointed out that they cannot touch anything under GPL because of the restrictions above. Moving to the LGPL will now allow them to use my packages in their own developments. This applies to such functions as <a href="https://openforecast.org/adam/">adam()</a>, <a href="/en/category/r-en/smooth/es-function/">es()</a>, <a href="https://cran.r-project.org/web/packages/smooth/vignettes/ssarima.html">msarima()</a>, <a href="/en/2022/08/02/complex-exponential-smoothing/">ces()</a>, <a href="https://cran.r-project.org/web/packages/greybox/vignettes/alm.html">alm()</a> and others. I don&#8217;t mind, as long as they say who developed the original thing.</p>
<p>What happens now? The versions of the <code>smooth</code> and <code>greybox</code> packages under GPLv2 are available on github <a href="https://github.com/config-i1/smooth/releases/tag/v3.2.2">here</a> and <a href="https://github.com/config-i1/greybox/releases/tag/v1.0.9">here</a> respectively, so if you are a radical open source adept, you can download those releases, install them and use them instead of the new versions. But from now on, I plan to support the packages under the LGPLv2.1 license.</p>
<p>Finally, a small teaser: colleagues of mine have agreed to help me translate the R code into Python (actually, I am quite useless in this endeavour; they do everything), so at some point in the future, we might see the <code>smooth</code> and <code>greybox</code> packages in Python. And they will also be licensed under LGPLv2.1.</p>
<p>Message <a href="https://openforecast.org/2023/09/19/smooth-greybox-under-lgplv2-1/">smooth &#038; greybox under LGPLv2.1</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/09/19/smooth-greybox-under-lgplv2-1/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>iETS: State space model for intermittent demand forecasting</title>
		<link>https://openforecast.org/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/</link>
					<comments>https://openforecast.org/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Fri, 08 Sep 2023 09:30:40 +0000</pubDate>
				<category><![CDATA[adam()]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[intermittent demand]]></category>
		<category><![CDATA[papers]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3200</guid>

					<description><![CDATA[<p>Authors: Ivan Svetunkov, John E. Boylan Journal: International Journal of Production Economics Abstract: Inventory decisions relating to items that are demanded intermittently are particularly challenging. Decisions relating to termination of sales of product often rely on point estimates of the mean demand, whereas replenishment decisions depend on quantiles from interval estimates. It is in this [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/">iETS: State space model for intermittent demand forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>Authors</strong>: Ivan Svetunkov, <a href="/en/2023/07/21/john-e-boylan/">John E. Boylan</a></p>
<p><strong>Journal</strong>: <a href="https://www.sciencedirect.com/journal/international-journal-of-production-economics">International Journal of Production Economics</a></p>
<p><strong>Abstract</strong>: Inventory decisions relating to items that are demanded intermittently are particularly challenging. Decisions relating to termination of sales of product often rely on point estimates of the mean demand, whereas replenishment decisions depend on quantiles from interval estimates. It is in this context that modelling intermittent demand becomes an important task. In previous research, this has been addressed by generalised linear models or integer-valued ARMA models, while the development of models in state space framework has had mixed success. In this paper, we propose a general state space model that takes intermittence of data into account, extending the taxonomy of single source of error state space models. We show that this model has a connection with conventional non-intermittent state space models used in inventory planning. Certain forms of it may be estimated by Croston’s and Teunter-Syntetos-Babai (TSB) forecasting methods. We discuss properties of the proposed models and show how a selection can be made between them in the proposed framework. We then conduct a simulation experiment, empirically evaluating the inventory implications.</p>
<p><strong>DOI</strong>: <a href="https://doi.org/10.1016/j.ijpe.2023.109013">10.1016/j.ijpe.2023.109013</a>.</p>
<p><a href="http://dx.doi.org/10.13140/RG.2.2.35897.06242">Working paper</a>.</p>
<h1>About the paper</h1>
<p><strong>DISCLAIMER</strong>: The models in this paper are also discussed in detail in the <a href="https://openforecast.org/adam/">ADAM monograph</a> (<a href="https://openforecast.org/adam/ADAMIntermittent.html">Chapter 13</a>) with some examples going beyond what is discussed in the paper (e.g. models with trends).</p>
<p>What is &#8220;intermittent demand&#8221;? It is demand that occurs at irregular intervals (i.e. at random). Note that according to this definition, intermittent demand does not need to be count data &#8211; it is a wider term than that. For example, electricity demand can be intermittent, but it is definitely not count. The definition above means that we do not necessarily know when exactly we will sell our product. From the modelling point of view, it means that we need to take into account two elements of uncertainty instead of just one:</p>
<ol>
<li>How much people will buy;</li>
<li>When they will buy.</li>
</ol>
<p>(1) is familiar to many demand planners and data scientists: we do not know specifically how much our customers will buy in the future, but we can get an estimate of the expected demand (the mean value via a point forecast) and an idea of the uncertainty around it (e.g. produce prediction intervals or estimate the demand distribution). (2) is less obvious: there may be some periods when nobody buys our product, then periods when we sell some, followed by no sales again. In that case, we can encode the no sales in those &#8220;dry&#8221; periods with zeroes and the periods with demand as ones, and end up with a time series like this (this idea was briefly discussed in <a href="/en/2020/01/13/what-about-all-those-zeroes-measuring-performance-of-models-on-intermittent-demand/">this</a> and <a href="/en/2018/09/18/smooth-package-for-r-intermittent-state-space-model-part-i-introducing-the-model/">this</a> post):</p>
<div id="attachment_3230" style="width: 610px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOccurrence.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3230" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOccurrence.png&amp;nocache=1" alt="An example of the occurrence part of an intermittent demand" width="600" height="350" class="size-full wp-image-3230" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOccurrence.png&amp;nocache=1 1200w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOccurrence-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOccurrence-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOccurrence-768x448.png&amp;nocache=1 768w" sizes="auto, (max-width: 600px) 100vw, 600px" /></a><p id="caption-attachment-3230" class="wp-caption-text">An example of the occurrence part of an intermittent demand</p></div>
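<p>As a small illustration (with hypothetical numbers, not the series from the plot), the occurrence variable and the demand sizes can be extracted from a demand vector in R as follows:</p>

```r
# Hypothetical intermittent demand with zeroes
y <- c(0, 3, 0, 0, 2, 5, 0, 1, 4, 6)
# Binary occurrence variable: 1 when there was demand, 0 otherwise
ot <- (y != 0) * 1
# Demand sizes: the non-zero values only
zt <- y[y != 0]
ot
# [1] 0 1 0 0 1 1 0 1 1 1
```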
<p>The plot above visualises the demand occurrence, with zeroes corresponding to the situation of &#8220;no demand&#8221; and ones corresponding to some demand. In general, it is challenging to predict when specifically the &#8220;ones&#8221; will happen, but in the case above it seems that the frequency of demand increases over time, implying that it might become regular. In mathematical terms, we could phrase this as the probability of occurrence increasing over time: at the end of the series we will not necessarily sell the product, but the chance of a sale is much higher than at the beginning. The original time series looks like this:</p>
<div id="attachment_3231" style="width: 610px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOverall.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3231" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOverall.png&amp;nocache=1" alt="An example of an intermittent demand" width="600" height="350" class="size-full wp-image-3231" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOverall.png&amp;nocache=1 1200w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOverall-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOverall-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/IntermittentDemandOverall-768x448.png&amp;nocache=1 768w" sizes="auto, (max-width: 600px) 100vw, 600px" /></a><p id="caption-attachment-3231" class="wp-caption-text">An example of an intermittent demand</p></div>
<p>It shows that the frequency of sales indeed increases together with the amounts sold: the product seems to be becoming more popular, moving from the intermittent to the regular demand domain.</p>
<p>In general, forecasting intermittent demand is a challenging task, but there are many existing approaches that can be used in this case. However, they are all detached from the conventional ones used for regular demand (such as ETS or ARIMA). What people usually do in practice is first categorise the data into regular and intermittent demand and then apply specific approaches to each category (e.g. ETS/ARIMA for the regular demand, and <a href="https://doi.org/10.2307/3007885">Croston</a>&#8216;s method or <a href="https://doi.org/10.1016/j.ejor.2011.05.018">TSB</a> for the intermittent one).</p>
<p>John Boylan and I developed a statistical model that unites the two worlds &#8211; you no longer need to decide whether the data is intermittent or not; you can just use one model in an automated fashion, and it will take care of the intermittence (if there is any). It relies fundamentally on the classical Croston&#8217;s equation:<br />
\begin{equation} \label{eq:general}<br />
	y_t = o_t z_t ,<br />
\end{equation}<br />
where \(y_t\) is the observed value at time \(t\), \(o_t\) is the binary occurrence variable and \(z_t\) is the demand sizes variable. Trying to derive the statistical model underlying Croston&#8217;s method, <a href="https://doi.org/10.1016/S0377-2217(01)00231-4">Snyder (2002)</a> and <a href="https://doi.org/10.1002/for.963">Shenstone &#038; Hyndman (2005)</a> used models based on \eqref{eq:general}, but instead of plugging a multiplicative ETS into \(z_t\), they got stuck with the idea of a logarithmic transformation of the demand sizes and/or using count distributions for them. John and I looked into this equation again and decided that we can model both demand sizes and demand occurrence using a pair of <a href="https://openforecast.org/adam/ADAMETSPureMultiplicativeChapter.html">pure multiplicative ETS models</a>. In this post, I will focus on ETS(M,N,N) as the simplest model, but more complicated ones (with trend and/or explanatory variables) can be used as well with the same logic. So, for the demand sizes we have:<br />
\begin{equation}<br />
	\begin{aligned}<br />
		&#038; z_t = l_{t-1} (1 + \epsilon_t) \\<br />
		&#038; l_t = l_{t-1} (1 + \alpha \epsilon_t)<br />
	\end{aligned}<br />
 \label{eq:demandSizes}<br />
\end{equation}<br />
where \(l_t\) is the level of the series, \(\alpha\) is the smoothing parameter and \(1 + \epsilon_t \) is the error term that follows some positive distribution (the options considered in the paper are the Log-Normal, Gamma and Inverse Gaussian). The demand sizes part is relatively straightforward: you just apply the conventional pure multiplicative ETS model with a positive distribution (which makes \(z_t\) always positive) and that&#8217;s it. However, the occurrence part is more complicated.</p>
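<p>To make the recursion concrete, here is a minimal R sketch simulating the demand sizes part \eqref{eq:demandSizes} with a Gamma-distributed error term (the initial level, smoothing parameter and variance are arbitrary illustrative values, not estimates from any real data):</p>

```r
set.seed(41)
obs <- 100
alpha <- 0.1    # smoothing parameter
s2 <- 0.05      # variance of the error term
# 1 + epsilon_t ~ Gamma with expectation of one (one of the distributions from the paper)
onePlusEpsilon <- rgamma(obs, shape=1/s2, scale=s2)
l <- 10         # initial level
z <- vector("numeric", obs)
for (t in 1:obs) {
    z[t] <- l * onePlusEpsilon[t]                    # measurement equation
    l <- l * (1 + alpha * (onePlusEpsilon[t] - 1))   # transition equation
}
```

Because the error term is strictly positive, so are all the generated demand sizes \(z_t\).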
<p>Given that the occurrence variable is random, we should model the probability of occurrence. We proposed to assume that \(o_t \sim \mathrm{Bernoulli}(p_t) \) (a logical assumption, made in many other papers), meaning that the probability of occurrence changes over time. In turn, the changing probability can be modelled using one of several approaches that we proposed. For example, it can be modelled via the so-called &#8220;inverse odds ratio&#8221; model with ETS(M,N,N), formulated as:<br />
\begin{equation}<br />
	\begin{aligned}<br />
		&#038; p_t = \frac{1}{1 + \mu_{b,t}} \\<br />
		&#038; \mu_{b,t} = l_{b,t-1} \\<br />
		&#038; l_{b,t} = l_{b,t-1} (1 + \alpha_b \epsilon_{b,t})<br />
	\end{aligned}<br />
 \label{eq:demandOccurrenceOdds}<br />
\end{equation}<br />
where \(\mu_{b,t}\) is the one step ahead expectation of the underlying model, \(l_{b,t}\) is the latent level, \(\alpha_b\) is the smoothing parameter of the model, and \(1+\epsilon_{b,t}\) is the positively distributed error term (with expectation equal to one and an unknown distribution, which we actually do not care about). The main feature of the inverse odds ratio occurrence model is that it should be effective in cases when demand is building up (moving from the intermittent to the regular pattern, without zeroes). In our paper, we show how such a model can be estimated, and also show that Croston&#8217;s method can be used to estimate it when the demand occurrence does not change (substantially) between the non-zero demands. So, this model can be considered the model underlying Croston&#8217;s method.</p>
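<p>A minimal simulation sketch of this occurrence model (all numbers are illustrative; I use the Log-Normal distribution for \(1+\epsilon_{b,t}\) just to have something positive with expectation of one):</p>

```r
set.seed(42)
obs <- 100
alphaB <- 0.05                      # smoothing parameter of the occurrence model
lB <- 4                             # initial level, implying p_1 = 1/(1+4) = 0.2
p <- ot <- vector("numeric", obs)
for (t in 1:obs) {
    p[t] <- 1 / (1 + lB)            # inverse odds ratio
    ot[t] <- rbinom(1, 1, p[t])     # Bernoulli occurrence
    # A positively distributed error term with expectation of one
    epsilonB <- rlnorm(1, -0.08, 0.4) - 1
    lB <- lB * (1 + alphaB * epsilonB)
}
```

Whenever the latent level \(l_{b,t}\) drifts down, the probability \(p_t\) drifts up, which is why this formulation suits demand that is building up.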
<p>Uniting the equations \eqref{eq:general}, \eqref{eq:demandSizes} and \eqref{eq:demandOccurrenceOdds}, we get the iETS(M,N,N)\(_\mathrm{I}\)(M,N,N) model, where the letters in the first brackets correspond to the demand sizes part, the subscript &#8220;I&#8221; tells us that we have the &#8220;inverse odds ratio&#8221; model for the occurrence, and the second brackets show what ETS model was used in the demand occurrence model. The paper explains in detail how this model can be built and estimated.</p>
<p>In the very same paper we discuss other potential models for demand occurrence (more suitable for demand obsolescence or a fixed probability of occurrence) and, in fact, in my opinion this part is the main contribution of the paper &#8211; we have looked into something no one had done before: how to model demand occurrence using ETS. Having so many options, we need a way to decide which one to use in an automated fashion. Luckily, given that these models are formulated in one and the same framework, we can use information criteria to select the most suitable one for the data. Furthermore, when all probabilities of occurrence are equal to one, the model \eqref{eq:general} together with \eqref{eq:demandSizes} transforms into the conventional ETS(M,N,N) model. This also means that the regular ETS model can be compared with the iETS directly using information criteria, to decide whether the occurrence part is needed or not. So, we end up with a relatively simple framework that can be used for any type of demand without a need for categorisation.</p>
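<p>In practice, this selection boils down to comparing information criteria across models estimated on the same data. A hedged sketch using <code>adam()</code> from the <code>smooth</code> package (the occurrence option names are the ones listed in the <code>oes()</code> documentation; the simulated series is artificial):</p>

```r
library(smooth)
set.seed(7)
# Artificial intermittent series with a rising occurrence probability
y <- ts(c(rpois(30, 0.5), rpois(30, 1.5)), frequency=12)
# Same demand sizes model, different occurrence models
adamFixed <- adam(y, "MNN", occurrence="fixed")
adamDirect <- adam(y, "MNN", occurrence="direct")
adamInverse <- adam(y, "MNN", occurrence="inverse-odds-ratio")
# Pick the occurrence model with the lowest AICc
sapply(list(fixed=adamFixed, direct=adamDirect, inverse=adamInverse), AICc)
```

Setting <code>occurrence="auto"</code> instead asks the function to do this comparison for you.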
<p>As a small side note, we also showed in the paper that the estimates of smoothing parameters for the demand sizes in iETS will always be positively biased (being higher than needed). In fact, this bias appears in any intermittent demand model that assumes that the potential demand sizes change between the non-zero observations (a reasonable assumption for any modelling approach). In a way, this finding also applies to both Croston&#8217;s and TSB methods and agrees with a similar finding by <a href="https://doi.org/10.1016/j.ijpe.2014.06.007">Kourentzes (2014)</a>.</p>
<h1>Example in R</h1>
<p>All the models from the paper are implemented in the <code>adam()</code> function from the <code>smooth</code> package in R (with the <code>oes()</code> function taking care of the occurrence part, see details <a href="https://openforecast.org/adam/ADAMIntermittent.html">here</a> and <a href="https://cran.r-project.org/web/packages/smooth/vignettes/oes.html">here</a>). For demonstration purposes (and for fun), we will consider an artificial example of demand obsolescence, modelled via the &#8220;Direct probability&#8221; iETS model (which underlies the TSB method):</p>
<pre class="decode">set.seed(7)
c(rpois(10,3),rpois(10,2),rpois(10,1),rpois(10,0.5),rpois(10,0.1)) |>
    ts(frequency=12) -> y</pre>
<p>My randomly generated time series looks like this:</p>
<div id="attachment_3247" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsolescenceExample.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3247" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsolescenceExample-300x175.png&amp;nocache=1" alt="Demand becoming obsolete" width="300" height="175" class="size-medium wp-image-3247" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsolescenceExample-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsolescenceExample-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsolescenceExample-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsolescenceExample.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3247" class="wp-caption-text">Demand becoming obsolete</p></div>
<p>In practice, in the example above, we may be interested in deciding whether to discontinue the product (to save money on stocking it) or not. To model and forecast the demand above, we can use the following code in R:</p>
<pre class="decode">library(smooth)
iETSModel <- adam(y, "YYN", occurrence="direct", h=5, holdout=TRUE)</pre>
<p>The "YYN" above tells the function to select the best pure multiplicative ETS model based on an information criterion (AICc by default, see discussion in <a href="https://openforecast.org/adam/ETSSelection.html">Section 15.1</a> of the ADAM monograph), while the "occurrence" parameter specifies which of the demand occurrence models to build. By default, the function will use the same ETS model for the occurrence part as the one selected for the demand sizes. So, for example, if we end up with ETS(M,M,N) for the demand sizes, the function will use ETS(M,M,N) for the probability of occurrence. If you want to change this, you would need to use the <code>oes()</code> function and specify the model there (see examples in <a href="https://openforecast.org/adam/IntermittentExample.html">Section 13.4</a> of the ADAM monograph). Finally, I have asked the function to produce 5 steps ahead forecasts and to keep the last 5 observations in the holdout sample. I ended up with the following model:</p>
<pre class="decode">summary(iETSModel)</pre>
<pre>Model estimated using adam() function: iETS(MMN)
Response variable: y
Occurrence model type: Direct
Distribution used in the estimation: 
Mixture of Bernoulli and Gamma
Loss function type: likelihood; Loss function value: 71.0549
Coefficients:
      Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha   0.1049     0.0925     0.0000      0.2903  
beta    0.1049     0.0139     0.0767      0.1049 *
level   4.3722     1.1801     1.9789      6.7381 *
trend   0.9517     0.0582     0.8336      1.0685 *

Error standard deviation: 1.0548
Sample size: 45
Number of estimated parameters: 9
Number of degrees of freedom: 36
Information criteria:
     AIC     AICc      BIC     BICc 
202.6527 204.1911 218.9126 206.6142 </pre>
<p>As we see from the output above, the function has selected the iETS(M,M,N) model for the data. The line "Mixture of Bernoulli and Gamma" tells us that the Bernoulli distribution was used for the demand occurrence (this is the only option), while the Gamma distribution was used for the demand sizes (this is the default option, but you can change this via the <code>distribution</code> parameter). We can then produce forecasts from this model:</p>
<pre class="decode">forecast(iETSModel, h=5, interval="prediction", side="upper") |>
    plot()</pre>
<p>In the code above, I have asked the function to generate prediction intervals (by default, for the pure multiplicative models, the function <a href="https://openforecast.org/adam/ADAMForecastingPI.html#ADAMForecastingPISimulations">uses simulations</a>) and to produce only the upper bound of the interval. The latter is motivated by the idea that in the case of the intermittent demand, the lower bound is typically not useful for decision making: we know that the demand cannot be below zero, and our stocking decisions are typically made based on the specific quantiles (e.g. for the 95% confidence level). Here is the plot that I get after running the code above:</p>
<div id="attachment_3250" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleForecast.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3250" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleForecast-300x175.png&amp;nocache=1" alt="Point and interval forecasts for the demand becoming obsolete" width="300" height="175" class="size-medium wp-image-3250" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleForecast-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleForecast-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleForecast-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleForecast.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3250" class="wp-caption-text">Point and interval forecasts for the demand becoming obsolete</p></div>
<p>While the last observation in the holdout was not included in the prediction interval, the dynamics captured by the model are correct. The question that we should ask ourselves in this example is: what decision can be made based on the model? If you want to decide whether to stock the product or not, you can look at the forecast of the probability of occurrence to see how it changes over time and decide whether to discontinue the product:</p>
<pre class="decode">forecast(iETSModel$occurrence, h=5) |> plot()</pre>
<div id="attachment_3254" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleOccurrence.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3254" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleOccurrence-300x175.png&amp;nocache=1" alt="Forecast of the probability of occurrence for the demand becoming obsolete" width="300" height="175" class="size-medium wp-image-3254" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleOccurrence-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleOccurrence-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleOccurrence-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/09/ObsoleteExampleOccurrence.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-3254" class="wp-caption-text">Forecast of the probability of occurrence for the demand becoming obsolete</p></div>
<p>In our case, the probability reaches roughly 0.2 over the next 5 months (i.e. we might make a sale once every five months). If we think that this is too low, then we should discontinue the product. Otherwise, if we decide to continue selling the product, then it makes more sense to generate the desired quantile of the cumulative demand over the lead time. In the case of the <code>adam()</code> function, this can be done by adding <code>cumulative=TRUE</code> to the <code>forecast()</code> function:</p>
<pre class="decode">forecast(iETSModel, h=5, interval="prediction", side="upper", cumulative=TRUE)</pre>
<p>after which we get:</p>
<pre>      Point forecast Upper bound (95%)
Oct 4      0.3055742          1.208207</pre>
<p>From the decision point of view, if we deal with count demand, the value 1.208207 complicates things. Luckily, as we showed in our paper, we can round the value up to get something meaningful, preserving the properties of the model. This means that, based on the estimated model, we need to have two items in stock to satisfy the demand over the next 5 months at the 95% confidence level.</p>
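<p>In R, this rounding up is simply:</p>

```r
# Upper bound of the cumulative demand over the lead time (from the output above)
ceiling(1.208207)
# [1] 2
```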
<h2>Conclusions</h2>
<p>This is just a demonstration of what can be done with the proposed iETS model, but there are many more things one can do. For example, this approach allows capturing multiplicative seasonality in data that has zeroes (as long as the seasonal indices can be estimated somehow). John and I started thinking in this direction, and we even did some work together with <a href="https://www.inesctec.pt/en/people/patricia-ramos">Patricia Ramos</a> (our colleague from INESC TEC), but given the hard time our paper was given by the reviewers in the IJF, we had to postpone this research. I also used the ideas explained in this post in the <a href="/en/2023/05/09/probabilistic-forecasting-of-hourly-emergency-department-arrivals/">paper on ED forecasting</a> (written together with Bahman and Jethro). In that paper, I used a seasonal model with the "direct" occurrence part, which took care of zeroes (without bothering to model them properly) and allowed me to apply a multiple seasonal multiplicative ETS model with explanatory variables. Anyway, the proposed approach is flexible enough to be used in a variety of contexts, and I think it will have many applications in real life.</p>
<h2>P.S.: Story of the paper</h2>
<p>I've written a separate long post, explaining the revision process of the paper and how it got to the acceptance stage at the IJPE, but then I realised that it is too long and boring. Besides, John would not have approved of the post and would say that I am sharing the unnecessary details, creating potential exasperation for fellow forecasters who reviewed the paper. So, I have decided not to publish that post, and instead just to add a short subsection. Here it is.</p>
<p>We started working on the paper in March 2016 and submitted it to the International Journal of Forecasting (IJF) in January 2017. It went through <strong>four</strong> rounds of revision, with the second reviewer being very critical and unsupportive all the way through, driving the paper in the wrong direction and burying it in the discussion of petty statistical details. We rewrote the paper several times, and I rewrote the R code of the function a few times. In the end, the Associate Editor (AE) of the IJF (who completely forgot about our paper for several months) decided not to send the paper to the reviewers again, completely ignored our responses to the reviewers, did not provide any major feedback and wrote an insulting response that ended with the phrase "I could go on, but I’m out of patience with the authors and their paper". The paper was rejected from the IJF in 2019, which set me back in my academic career. This, together with the constant rejections of my <a href="/en/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/">Complex Exponential Smoothing</a> paper and the actions of a colleague of mine who decided to cut all ties with me in Summer 2019, hit my self-esteem and caused serious damage to my professional life. I thought of quitting academia and either going to work in business or doing something different with my life, not related to forecasting at all. I stayed mainly because of all the support that John Boylan, Robert Fildes, Nikos Kourentzes and my wife Anna Sroginis provided me. I recovered from that hit only in 2022, when my <a href="https://openforecast.org/en/2022/08/02/complex-exponential-smoothing/">Complex Exponential Smoothing</a> paper got accepted and things finally started turning well. 
After that, John and I rewrote the paper again and split it into two: "iETS" and "Multiplicative ETS" (under revision in the IMA Journal of Management Mathematics). We submitted the former to the International Journal of Production Economics, where, after one round of revision, it got accepted. Unfortunately, we never got to celebrate the success with John, because <a href="/en/2023/07/21/john-e-boylan/">he passed away</a>.</p>
<p>The moral of this story is that publishing in academia can be very tough and unfair. Sometimes you get very negative feedback from the people you least expect it from. People that you respect and think very highly of might not understand what you are proposing and be very unsupportive. We actually knew who the reviewers and the AE of our IJF paper were &#8211; they are esteemed academics in the field of forecasting. And while I still think highly of their research and contributions to the field, the way the second reviewer and the AE handled the review has damaged my personal respect for them &#8211; I never expected them to be so narrow-minded...</p>
<p>Message <a href="https://openforecast.org/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/">iETS: State space model for intermittent demand forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/09/08/iets-state-space-model-for-intermittent-demand-forecasting/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Story of &#8220;Probabilistic forecasting of hourly emergency department arrivals&#8221;</title>
		<link>https://openforecast.org/2023/05/10/story-of-probabilistic-forecasting-of-hourly-emergency-department-arrivals/</link>
					<comments>https://openforecast.org/2023/05/10/story-of-probabilistic-forecasting-of-hourly-emergency-department-arrivals/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 10 May 2023 20:47:27 +0000</pubDate>
				<category><![CDATA[adam()]]></category>
		<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Regression]]></category>
		<category><![CDATA[Stories]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[papers]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3092</guid>

					<description><![CDATA[<p>The paper Back in 2020, when we were all siting in the COVID lockdown, I had a call with Bahman Rostami-Tabar to discuss one of our projects. He told me that he had an hourly data of an Emergency Department from a hospital in Wales, and suggested writing a paper for a healthcare audience to [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/05/10/story-of-probabilistic-forecasting-of-hourly-emergency-department-arrivals/">Story of &#8220;Probabilistic forecasting of hourly emergency department arrivals&#8221;</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><a href="/en/2023/05/09/probabilistic-forecasting-of-hourly-emergency-department-arrivals/">The paper</a></p>
<p>Back in 2020, when we were all sitting in the COVID lockdown, I had a call with <a href="https://www.bahmanrt.com/">Bahman Rostami-Tabar</a> to discuss one of our projects. He told me that he had hourly data from an Emergency Department at a hospital in Wales, and suggested writing a paper for a healthcare audience to show them how forecasting can be done properly in this setting. I noted that we did not have experience in working with high frequency data, and that it would be good to have someone with relevant expertise. I knew a guy who worked in energy forecasting, <a href="http://www.jethrobrowell.com/">Jethro Browell</a> (we are mates in the <a href="https://forecasters.org/programs/communities/united-kingdom-chapter/">IIF UK Chapter</a>), so the three of us had a chat and formed a team to figure out better ways of forecasting ED arrivals.</p>
<p>We agreed that each of us would try their own models. Bahman wanted to try TBATS, Prophet and models from the <a href="https://github.com/tidyverts/fasster">fasster</a> package in R (spoiler: the latter produced very poor forecasts on our data, so we removed them from the paper). Jethro had a pool of <a href="https://www.gamlss.com/" rel="noopener" target="_blank">GAMLSS</a> models with different distributions, including Poisson and truncated Normal. He also tried a Gradient Boosting Machine (GBM). I decided to test ETS, Poisson Regression and <a href="https://openforecast.org/adam/" rel="noopener" target="_blank">ADAM</a>. We agreed that we would measure the performance of the models not only in terms of point forecasts (using RMSE), but also in terms of quantiles (pinball and quantile bias) and computational time. It took us a year to do all the experiments and another one to find a journal that would not desk-reject our paper because the editor thought that it was not relevant (even though they had published similar papers in the past). It was rejected from Annals of Emergency Medicine, Emergency Medicine Journal, American Journal of Emergency Medicine and Journal of Medical Systems. In the end, we submitted to Health Systems, and after a short revision the paper got accepted. So, there is a happy end to this story.</p>
<p>In the paper itself, we found that overall, in terms of quantile bias (calibration of models), GAMLSS with the truncated Normal distribution and ADAM performed better than the other approaches, with the former also doing well in terms of pinball loss and the latter doing well in terms of point forecasts (RMSE). Note that the count data models did worse than the continuous ones, although one would expect the Poisson distribution to be appropriate for ED arrivals.</p>
<p>I don&#8217;t want to explain the paper and its findings in detail in this post, but given my relation to ADAM, I have decided to briefly explain what I included in the model and how it was used. After all, this is the first paper that uses almost all the main features of ADAM and shows how powerful it can be if used correctly.</p>
<h3>Using ADAM in Emergency Department arrivals forecasting</h3>
<p><strong>Disclaimer</strong>: The explanation provided here relies on the content of my monograph &#8220;<a href="https://openforecast.org/adam/">Forecasting and Analytics with ADAM</a>&#8220;. In the paper, I ended up creating a quite complicated model that allowed capturing complex demand dynamics. In order to fully understand what I am discussing in this post, you might need to refer to the monograph.</p>
<div id="attachment_3117" style="width: 1210px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3117" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data.png&amp;nocache=1" alt="Emergency Department Arrivals" width="1200" height="800" class="size-full wp-image-3117" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data.png&amp;nocache=1 1200w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-300x200.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-1024x683.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-768x512.png&amp;nocache=1 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></a><p id="caption-attachment-3117" class="wp-caption-text">Emergency Department Arrivals. The plots were generated using <code>seasplot()</code> function from the <code>tsutils</code> package.</p></div>
<p>The figure above shows the data that we were dealing with, together with several seasonal plots (generated using the <code>seasplot()</code> function from the <code>tsutils</code> package). As we can see, the data exhibits hour of day, day of week and week of year seasonalities, although some of them are not very pronounced. The data does not seem to have a strong trend, although there is a slow increase in the level. Based on this, I decided to use ETS(M,N,M) as the basis for modelling. However, if we want to capture all three seasonal patterns, we need to fit a triple seasonal model, which requires too much computational time because of the estimation of all the seasonal indices. So, I decided to use a <a href="https://openforecast.org/adam/ADAMMultipleFrequencies.html">double-seasonal ETS(M,N,M)</a> instead, with hour of day and hour of week seasonalities, and to include <a href="https://openforecast.org/adam/ETSXMultipleSeasonality.html">dummy variables for the week of year seasonality</a>. The alternative to the week of year dummies would be an hour of year seasonal component, which would require estimating 8760 seasonal indices, potentially overfitting the data. I argue that the week of year dummies provide sufficient flexibility, and there is no need to capture the detailed intra-yearly profile at a more granular level.</p>
<p>To make things more exciting, given that we deal with hourly data of a UK hospital, we had to deal with the issues of <a href="https://openforecast.org/adam/MultipleFrequenciesDSTandLeap.html">daylight saving and the leap year</a>. I know that many of us hate the idea of daylight saving, because we have to change our lifestyles twice a year just because of an old 18th century tradition. But in addition to being <a href="https://publichealth.jhu.edu/2023/7-things-to-know-about-daylight-saving-time#:~:text=Making%20the%20shift%20can%20increase,a%20professor%20in%20Mental%20Health.">bad for your health</a>, this nasty thing messes things up for my models, because once a year we have a 23-hour day and once a year a 25-hour one. Luckily, <code>adam()</code> takes care of this by shifting the seasonal indices when the time change happens. All you need to do for this mechanism to work is provide an object with timestamps to the function (for example, a zoo object). As for the leap year, it becomes less important when we model the week of year seasonality instead of the day of year or hour of year one.</p>
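<p>For illustration, here is a minimal sketch of how such a timestamped object could be constructed (the <code>arrivals</code> vector, the starting date and the timezone here are hypothetical, just to show the idea):</p>
<pre class="decode">library(zoo)
# Hourly timestamps in a DST-observing timezone;
# POSIXct handles the 23- and 25-hour days automatically
timeStamps <- seq(from=as.POSIXct("2019-01-01 00:00", tz="Europe/London"),
                  by="hour", length.out=length(arrivals))
ourData <- zoo(arrivals, order.by=timeStamps)</pre>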
<div id="attachment_3123" style="width: 1210px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-daily.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3123" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-daily.png&amp;nocache=1" alt="Emergency Department Daily Arrivals" width="1200" height="700" class="size-full wp-image-3123" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-daily.png&amp;nocache=1 1200w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-daily-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-daily-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/05/EDArrivals-data-daily-768x448.png&amp;nocache=1 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></a><p id="caption-attachment-3123" class="wp-caption-text">Emergency Department Daily Arrivals</p></div>
<p>Furthermore, as the figure above shows, <a href="https://openforecast.org/adam/ADAMX.html">calendar events</a> play a crucial role in ED arrivals. For example, Emergency Department demand over Christmas is typically lower than average (the drops in the figure above), but right after Christmas it tends to go up (with all the people who injured themselves during the festivities showing up at the hospital). These events therefore need to be taken into account by the model in the form of additional dummy variables, together with their lags (the 24-hour lags of the original variables).</p>
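<p>Such lagged dummies can be constructed in base R by shifting the original variable by 24 hours. A sketch (the <code>xChristmas</code> name is hypothetical, and I assume a <code>timeStamps</code> vector with the hourly POSIXct timestamps of the sample):</p>
<pre class="decode"># Dummy variable for a calendar event (hourly data),
# assuming timeStamps holds the POSIXct timestamps of the sample
xChristmas <- as.numeric(format(timeStamps, "%m-%d") == "12-25")
# Its 24-hour lag: pad with zeroes at the start, drop the last 24 values
xChristmasLag24 <- c(rep(0, 24), head(xChristmas, -24))</pre>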
<p>But that&#8217;s not all. If we want to fit a multiplicative seasonal model (which makes more sense than the additive one, due to the changing seasonal amplitude at different times of year), we need to do something with the zeroes, which happen naturally in ED arrivals overnight (see the first figure in this post with the seasonal plots). They do not necessarily happen at the same time of day, but the probability of having no demand tends to increase at night. This meant that I needed to introduce the <a href="https://openforecast.org/adam/ADAMIntermittent.html">occurrence part of the model</a> to take care of the zeroes. I used a very basic occurrence model called &#8220;<a href="https://openforecast.org/adam/ADAMOccurrence.html#oETSD">direct probability</a>&#8221;, because it is more sensitive to changes in demand occurrence, making the model more responsive. I did not use a seasonal demand occurrence model (and I don&#8217;t remember why), which is one of the limitations of the ADAM used in this study.</p>
<p>Finally, given that we are dealing with low volume data, a positive distribution needed to be used instead of the Normal one. I used the <a href="https://openforecast.org/adam/ADAMETSMultiplicativeDistributions.html">Gamma distribution</a> because it is better behaved than the Log-Normal or the Inverse Gaussian, which tend to have much heavier tails. Exploring the data, I found that Gamma does better than the other two, probably because the ED arrivals have relatively slim tails.</p>
<p>So, the final ADAM included the following features:</p>
<ul>
<li>ETS(M,N,M) as the basis;</li>
<li>Double seasonality;</li>
<li>Week of year dummy variables;</li>
<li>Dummy variables for calendar events with their lags;</li>
<li>&#8220;Direct probability&#8221; occurrence model;</li>
<li>Gamma distribution for the residuals of the model.</li>
</ul>
<p>This model is summarised in equation (3) of <a href="/en/2023/05/09/probabilistic-forecasting-of-hourly-emergency-department-arrivals/">the paper</a>.</p>
<p>The model was <a href="https://openforecast.org/adam/ADAMInitialisation.html">initialised using backcasting</a>, because otherwise we would need to estimate too many initial values for the state vector. The estimation itself was done using <a href="https://openforecast.org/adam/ADAMETSEstimationLikelihood.html">likelihood</a>. In R, this corresponded to roughly the following lines of code:</p>
<pre class="decode">library(smooth)
oesModel <- oes(y, "MNN", occurrence="direct", h=48)
adamModelFirst <- adam(ourData, "MNM", lags=c(24,24*7), formula=y~x+xLag24+weekOfYear,
                       h=48, initial="backcasting",
                       occurrence=oesModel, distribution="dgamma")</pre>
<p>Here, <code>x</code> was a categorical variable (a factor in R) with all the main calendar events. However, even with backcasting, the estimation of such a big model took an hour and 25 minutes. Given that Bahman, Jethro and I had agreed to do a rolling origin evaluation, I decided to help the function with the estimation inside the loop, providing <a href="https://openforecast.org/adam/ADAMInitialisation.html#starting-optimisation-of-parameters">the initials to the optimiser</a> based on the very first estimated model. As a result, each estimation of ADAM in the rolling origin took 1.5 minutes. The code in the loop was modified to:</p>
<pre class="decode">adamParameters <- coef(adamModelFirst)
oesModel <- oes(y, "MNN", occurrence="direct", h=48)
adamModel <- adam(ourData, "MNM", lags=c(24,24*7), formula=y~x+xLag24+weekOfYear,
                  h=48, initial="backcasting",
                  occurrence=oesModel, distribution="dgamma",
                  B=adamParameters)</pre>
<p>Finally, we generated mean and quantile forecasts for 48 hours ahead. I used <a href="https://openforecast.org/adam/ADAMForecastingPI.html#semiparametric-intervals">semiparametric quantiles</a>, because I expected violations of some of the assumptions of the model (e.g. autocorrelated residuals). The respective R code is:</p>
<pre class="decode">testForecast <- forecast(adamModel, newdata=newdata, h=48,
                         interval="semiparametric", level=c(1:19/20), side="upper")</pre>
<p>Furthermore, given that the data is integer-valued (how many people visit the hospital each hour) and ADAM produces fractional quantiles (because of the Gamma distribution), I decided to see how it would perform if the quantiles were rounded up. This strategy is simple and might be sensible when a continuous model is used for forecasting on count data (see the discussion in the paper). However, after running the experiment, the ADAM with rounded-up quantiles performed very similarly to the conventional one, so we decided not to include it in the paper.</p>
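<p>In case you want to try this yourself, the rounding itself is trivial (assuming, as in the code above, a <code>smooth</code> forecast object that holds the quantiles in its <code>upper</code> element):</p>
<pre class="decode"># Round the quantile forecasts up to the nearest integer
roundedQuantiles <- ceiling(testForecast$upper)</pre>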
<p>In the end, as stated earlier in this post, we concluded that in our experiment there were two well-performing approaches: GAMLSS with the Truncated Normal distribution (called "NOtr-2" in the paper) and ADAM in the form explained above. The popular TBATS, Prophet and Gradient Boosting Machine performed poorly compared to these two. For the first two, this is because of the lack of explanatory variables and inappropriate distributional assumptions (normality). As for the GBM, this is probably due to the lack of a dynamic element in it (e.g. a changing level and seasonal components).</p>
<p>Concluding this post: as you can see, I managed to fit a decent model based on ADAM, which captured the main characteristics of the data. However, it took some time to understand which features should be included, together with some experiments on the data. This case study shows that if you want to get a better model for your problem, you might need to dive into the problem and spend some time analysing what you have on your hands, experimenting with different parameters of the model. ADAM provides the flexibility necessary for such experiments.</p>
<p>Message <a href="https://openforecast.org/2023/05/10/story-of-probabilistic-forecasting-of-hourly-emergency-department-arrivals/">Story of &#8220;Probabilistic forecasting of hourly emergency department arrivals&#8221;</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/05/10/story-of-probabilistic-forecasting-of-hourly-emergency-department-arrivals/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v3.2.0: what&#8217;s new?</title>
		<link>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/</link>
					<comments>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 30 Jan 2023 13:06:47 +0000</pubDate>
				<category><![CDATA[About es() function]]></category>
		<category><![CDATA[adam()]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Regression]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3063</guid>

					<description><![CDATA[<p>smooth package has reached version 3.2.0 and is now on CRAN. While the version change from 3.1.7 to 3.2.0 looks small, this has introduced several substantial changes and represents a first step in moving to the new C++ code in the core of the functions. In this short post, I will outline the main new [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/">smooth v3.2.0: what&#8217;s new?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
<content:encoded><![CDATA[<p>The smooth package has reached version 3.2.0 and is now <a href="https://cran.r-project.org/package=smooth">on CRAN</a>. While the version change from 3.1.7 to 3.2.0 looks small, it introduces several substantial changes and represents a first step in moving the core of the functions to the new C++ code. In this short post, I will outline the main new features of smooth 3.2.0.</p>
<p><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2.png&amp;nocache=1"><img loading="lazy" decoding="async" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-300x218.png&amp;nocache=1" alt="" width="300" height="218" class="aligncenter size-medium wp-image-3065" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-300x218.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-1024x745.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-768x559.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-1536x1117.png&amp;nocache=1 1536w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2.png&amp;nocache=1 1650w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a></p>
<h3>New engines for ETS, MSARIMA and SMA</h3>
<p>The first and one of the most important changes is the new engine for ETS (Error-Trend-Seasonal exponential smoothing model), MSARIMA (Multiple Seasonal ARIMA) and SMA (Simple Moving Average), implemented respectively in the <code>es()</code>, <code>msarima()</code> and <code>sma()</code> functions. The new engine was developed for <code>adam()</code>, and the three models above can be considered special cases of it. You can read more about ETS in the ADAM monograph, starting from <a href="https://openforecast.org/adam/ETSConventional.html">Chapter 4</a>; MSARIMA is discussed in <a href="https://openforecast.org/adam/ADAMARIMA.html">Chapter 9</a>, while SMA is briefly discussed in <a href="https://openforecast.org/adam/simpleForecastingMethods.html#SMA">Subsection 3.3.3</a>.</p>
<p>The <code>es()</code> function now implements an ETS close to the conventional one, assuming that the error term follows a normal distribution. It still supports explanatory variables (discussed in <a href="https://openforecast.org/adam/ADAMX.html">Chapter 10 of the ADAM monograph</a>) and advanced estimators (<a href="https://openforecast.org/adam/ADAMETSEstimation.html">Chapter 11</a>), and it has the same syntax as the previous version of the function, but now acts as a wrapper for <code>adam()</code>. This means that it is now faster, more accurate and requires less memory than it used to. <code>msarima()</code>, being a wrapper for <code>adam()</code> as well, is now also faster and more accurate than it used to be. In addition, both functions now support the methods that were developed for <code>adam()</code>, including <code>vcov()</code>, <code>confint()</code>, <code>summary()</code>, <code>rmultistep()</code>, <code>reapply()</code>, <code>plot()</code> and others. So you can now do a more thorough analysis and improve the models using all these advanced instruments (see, for example, <a href="https://openforecast.org/adam/diagnostics.html">Chapter 14 of ADAM</a>).</p>
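<p>To give an idea of how these methods can be used, here is a small sketch on the Box-Jenkins sales data (just an illustration, not from a specific study):</p>
<pre class="decode">esModel <- es(BJsales, "AAdN")
# Covariance matrix and confidence intervals of the estimated parameters
vcov(esModel)
confint(esModel)
# Multistep in-sample forecast errors, useful for diagnostics
rmultistep(esModel, h=10)</pre>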
<p>The main reason I moved the functions to the new engine was to clean up the code and remove the old chunks written when I was only starting to learn C++. A side effect, as you can see, is that the functions have been improved in a variety of ways.</p>
<p>And to be on the safe side, the old versions of the functions are still available in <code>smooth</code> under the names <code>es_old()</code>, <code>msarima_old()</code> and <code>sma_old()</code>. They will be removed from the package if it ever reaches v4.0.0.</p>
<h3>New methods for ADAM</h3>
<p>There are two new methods for <code>adam()</code> that can be used in a variety of cases. The first one is <code>simulate()</code>, which generates data based on the estimated ADAM, whatever the original model is (e.g. a mixture of ETS, ARIMA and regression on data with multiple frequencies). Here is how it can be used:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     simulate() |>
     plot()</pre>
<p>which will produce a plot similar to the following:</p>
<div id="attachment_3077" style="width: 650px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3077" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-1024x597.png&amp;nocache=1" alt="Simulated data based on adam() applied to Box-Jenkins sales data" width="640" height="373" class="size-large wp-image-3077" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate.png&amp;nocache=1 1200w" sizes="auto, (max-width: 640px) 100vw, 640px" /></a><p id="caption-attachment-3077" class="wp-caption-text">Simulated data based on adam() applied to Box-Jenkins sales data</p></div>
<p>This can be used in research, when a more controlled environment is needed. If you want to fine-tune the parameters of ADAM before simulating the data, you can save the output in an object and amend its parameters. For example:</p>
<pre class="decode">testModel <- adam(BJsales, "AAdN")
testModel$persistence <- c(0.5, 0.2)
simulate(testModel)</pre>
<p>The second new method is <code>xtable()</code> from the respective <code>xtable</code> package. It produces a LaTeX version of the table from the summary of ADAM. Here is an example of a summary from ADAM ETS:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     summary()</pre>
<pre>Model estimated using adam() function: ETS(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 256.1516
Coefficients:
      Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha   0.9514     0.1292     0.6960      1.0000 *
beta    0.3328     0.2040     0.0000      0.7358  
phi     0.8560     0.1671     0.5258      1.0000 *
level 203.2835     5.9968   191.4304    215.1289 *
trend  -2.6793     4.7705   -12.1084      6.7437  

Error standard deviation: 1.3623
Sample size: 150
Number of estimated parameters: 6
Number of degrees of freedom: 144
Information criteria:
     AIC     AICc      BIC     BICc 
524.3032 524.8907 542.3670 543.8387</pre>
<p>As you can see in the output above, the function generates confidence intervals for the parameters of the model, including the smoothing parameters, the damping parameter and the initial states. This summary can then be used to generate the LaTeX code for the main part of the table:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     xtable()</pre>
<p>which will look something like this:</p>
<div id="attachment_3073" style="width: 650px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3073" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-1024x303.png&amp;nocache=1" alt="Summary of adam()" width="512" height="152" class="size-large wp-image-3073" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-1024x303.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-300x89.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-768x227.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable.png&amp;nocache=1 1207w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a><p id="caption-attachment-3073" class="wp-caption-text">Summary of adam()</p></div>
<h3>Other improvements</h3>
<p>First, one of the major changes in the <code>smooth</code> functions is the new backcasting mechanism for <code>adam()</code>, <code>es()</code> and <code>msarima()</code> (discussed in <a href="https://openforecast.org/adam/ADAMInitialisation.html">Section 11.4 of the ADAM monograph</a>). The main difference from the old one is that it no longer backcasts the parameters for the explanatory variables, estimating them separately via optimisation. This feature turned out to be important for some users who wanted to try MSARIMAX/ETSX (a model with explanatory variables) with backcasting as the initialisation. These users then wanted to get a summary, analysing the uncertainty around the estimates of the parameters for the exogenous variables, but could not, because the previous implementation did not estimate them explicitly. This is now available. Here is an example:</p>
<pre class="decode">cbind(BJsales, BJsales.lead) |>
    adam(model="AAdN", initial="backcasting") |>
    summary()</pre>
<pre>Model estimated using adam() function: ETSX(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 255.1935
Coefficients:
             Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha          0.9724     0.1108     0.7534      1.0000 *
beta           0.2904     0.1368     0.0199      0.5607 *
phi            0.8798     0.0925     0.6970      1.0000 *
BJsales.lead   0.1662     0.2336    -0.2955      0.6276  

Error standard deviation: 1.3489
Sample size: 150
Number of estimated parameters: 5
Number of degrees of freedom: 145
Information criteria:
     AIC     AICc      BIC     BICc 
520.3870 520.8037 535.4402 536.4841</pre>
<p>As you can see in the output above, the initial level and trend of the model are not reported, because they were estimated via backcasting. However, we get the value of the parameter <code>BJsales.lead</code> and the uncertainty around it. The old backcasting approach is now called "complete", implying that all values of the state vector are produced via backcasting.</p>
<p>Second, <code>forecast.adam()</code> now has a parameter <code>scenarios</code>, which, when TRUE, will return the simulated paths from the model. This only works when <code>interval="simulated"</code> and can be used for the analysis of possible forecast trajectories.</p>
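<p>For example (a sketch; the paths can then be found in the returned forecast object and plotted or summarised):</p>
<pre class="decode">adamModel <- adam(BJsales, "AAdN")
# Simulated prediction interval with the paths retained
adamForecast <- forecast(adamModel, h=10,
                         interval="simulated", scenarios=TRUE)</pre>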
<p>Third, the <code>plot()</code> method now can also produce ACF/PACF for the squared residuals for all <code>smooth</code> functions. This becomes useful if you suspect that your data has ARCH elements and want to see if they need to be modelled separately. This can also be done using <code>adam()</code> and <code>sm()</code> and is discussed in <a href="https://openforecast.org/adam/ADAMscaleModel.html">Chapter 17 of the monograph</a>.</p>
<p>Finally, the <code>sma()</code> function now has a <code>fast</code> parameter, which, when TRUE, uses a modified ternary search for the best order based on information criteria. It might not find the global minimum, but it works much faster than the exhaustive search.</p>
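<p>For example (using the Box-Jenkins sales data as an illustration):</p>
<pre class="decode">sma(BJsales, fast=TRUE) |>
    forecast(h=10) |>
    plot()</pre>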
<h3>Conclusions</h3>
<p>These are the main new features of the package. I feel that the main job in <code>smooth</code> is already done, and all I can do now is tune the functions and improve the existing code. I want to move all the functions to the new engine and ditch the old one, but this requires much more time than I have. So I don't expect to finish this any time soon, but I hope I'll get there someday. On the other hand, I'm not sure that spending much time on developing an R package is a wise idea, given that nowadays people tend to use Python. I would develop a Python analogue of the <code>smooth</code> package, but currently I don't have the necessary expertise or time to do that. Besides, there already exist great libraries, such as <a href="https://github.com/Nixtla/nixtla/tree/main/tsforecast">tsforecast</a> from <a href="https://github.com/Nixtla/nixtla">nixtla</a> and <a href="https://www.sktime.org/">sktime</a>. I am not sure that another library implementing ETS and ARIMA is needed in Python. What do you think?</p>
<p>Message <a href="https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/">smooth v3.2.0: what&#8217;s new?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Smooth forecasting with the smooth package in R</title>
		<link>https://openforecast.org/2023/01/04/smooth-forecasting-with-the-smooth-package-in-r/</link>
					<comments>https://openforecast.org/2023/01/04/smooth-forecasting-with-the-smooth-package-in-r/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 04 Jan 2023 09:59:45 +0000</pubDate>
				<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3052</guid>

					<description><![CDATA[<p>Authors: Ivan Svetunkov Abstract: There are many forecasting related packages in R with varied popularity, the most famous of all being forecast, which implements several important forecasting approaches, such as ARIMA, ETS, TBATS and others. However, the main issue with the existing functionality is the lack of flexibility for research purposes, when it comes to [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/01/04/smooth-forecasting-with-the-smooth-package-in-r/">Smooth forecasting with the smooth package in R</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>Authors</strong>: Ivan Svetunkov</p>
<p><strong>Abstract</strong>: There are many forecasting related packages in R with varied popularity, the most famous of all being forecast, which implements several important forecasting approaches, such as ARIMA, ETS, TBATS and others. However, the main issue with the existing functionality is the lack of flexibility for research purposes, when it comes to modifying the implemented models. The R package smooth introduces a new approach to univariate forecasting, implementing ETS and ARIMA models in Single Source of Error (SSOE) state space form and implementing an advanced functionality for experiments and time series analysis. It builds upon the SSOE model and extends it by including explanatory variables, multiple frequencies, and introducing advanced forecasting instruments. In this paper, we explain the philosophy behind the package and show how the main functions work.</p>
<p><strong>DOI</strong>: <a href="https://doi.org/10.48550/arXiv.2301.01790">10.48550/arXiv.2301.01790</a></p>
<p><strong>How to cite</strong>: Svetunkov (2023). Smooth forecasting with the smooth package in R. OpenForecast.org</p>
<p><strong>The story of the paper</strong>: This paper was rejected from the Journal of Statistical Software by a reviewer who maintains a package competing with <code>smooth</code>. Given that the paper was written specifically for that journal, and I had nowhere else to submit it, I&#8217;ve decided to upload it online and make it freely available.</p>
<p>And here is the smooth hex sticker for completeness. If you need one, get in touch with me.<br />
<a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2018/01/smooth.png&amp;nocache=1"><img loading="lazy" decoding="async" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2018/01/smooth-249x300.png&amp;nocache=1" alt="" width="249" height="300" class="aligncenter size-medium wp-image-1534" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2018/01/smooth-249x300.png&amp;nocache=1 249w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2018/01/smooth-768x925.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2018/01/smooth-850x1024.png&amp;nocache=1 850w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2018/01/smooth.png&amp;nocache=1 965w" sizes="auto, (max-width: 249px) 100vw, 249px" /></a></p>
<p>Message <a href="https://openforecast.org/2023/01/04/smooth-forecasting-with-the-smooth-package-in-r/">Smooth forecasting with the smooth package in R</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/01/04/smooth-forecasting-with-the-smooth-package-in-r/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
	</channel>
</rss>
