<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archives ARIMA - Open Forecasting</title>
	<atom:link href="https://openforecast.org/category/univariate-en/arima-en/feed/" rel="self" type="application/rss+xml" />
	<link>https://openforecast.org/category/univariate-en/arima-en/</link>
	<description>How to look into the future</description>
	<lastBuildDate>Tue, 13 May 2025 11:59:32 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2015/08/cropped-usd-05-32x32.png&amp;nocache=1</url>
	<title>Archives ARIMA - Open Forecasting</title>
	<link>https://openforecast.org/category/univariate-en/arima-en/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Fundamental Flaw of the Box-Jenkins Methodology</title>
		<link>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/</link>
					<comments>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 13 May 2025 11:57:07 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[Information criteria]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3838</guid>

					<description><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, this methodology has a fundamental flaw and should not be used in practice. Here&#8217;s why. When Box and Jenkins wrote their book back in the 1960s, it was a very different [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, <strong>this methodology has a fundamental flaw</strong> and should not be used in practice. Here&#8217;s why.</p>
<p>When Box and Jenkins wrote their book back in the 1960s, it was a very different era: computers were massive, and people worked with punch cards. To make their approach viable, Box and Jenkins developed a methodology for selecting the appropriate orders of AR and MA based on the values of the autocorrelation and partial autocorrelation functions (ACF and PACF, respectively). Their idea was that if an ARMA process generates a specific ACF/PACF pattern, then it could be identified by analysing those functions in the data. At the time, it wasn’t feasible to do cross-validation or rolling origin evaluation, and even using information criteria for model selection was a challenge. So, the Box–Jenkins approach was a sensible option, producing adequate results with limited computational resources, and was considered state of the art.</p>
<p>Unfortunately, as the M1 competition later showed (see my <a href="/2024/03/14/the-role-of-m-competitions-in-forecasting/">earlier post</a>), the methodology didn’t work well in practice. Simpler methods that didn’t rely on rigorous model selection actually performed better. In fact, the winning model in the competition was ARARMA by <a href="https://doi.org/10.1002/for.3980010108">Emanuel Parzen</a>. His idea was to make the series stationary by applying a low-order, non-stationary AR to the data, then to extract the residuals and select appropriate ARMA orders using AIC. Parzen ignored the Box–Jenkins methodology entirely &#8211; he didn’t analyse ACF or PACF and instead relied fully on automated selection. And it worked!</p>
<p>So why didn’t the Box–Jenkins methodology perform as expected? In my monograph <a href="/adam">Forecasting and Analytics with ADAM</a>, I use the following example to explain the main issue: “All birds have wings. Sarah has wings. Thus, Sarah is a bird.” But Sarah, as shown in the image attached to this post, is a butterfly.</p>
<p>The fundamental issue with the Box–Jenkins methodology lies in its logic: if a process generates a specific ACF/PACF, that doesn’t mean that an observed ACF/PACF must come from that process. Many ARMA and even non-ARMA processes can generate exactly the same autocorrelation structure.</p>
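<p>To illustrate this point, here is a tiny base-R check (my own illustration, not from the original post): an ARMA(1,1) process whose AR and MA polynomials share a common factor has exactly the same ACF as white noise, so ACF/PACF analysis cannot distinguish the two processes.</p>

```r
# Theoretical ACF of ARMA(1,1) with phi = 0.7, theta = -0.7.
# The AR and MA polynomials share the factor (1 - 0.7 B),
# so the process reduces to white noise.
acfArma <- ARMAacf(ar = 0.7, ma = -0.7, lag.max = 5)
acfWN <- c(1, rep(0, 5))            # white noise ACF: 1 at lag 0, 0 elsewhere
print(round(acfArma - acfWN, 10))   # differences are all zero
```

The same ACF pattern, therefore, is consistent with (at least) two different processes, which is exactly the logical gap in the Box–Jenkins identification step.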
<p>Further developments in ARIMA modelling have shown that ACF and PACF can only be used as general guidelines for order selection. To assess model performance properly, we need other tools. All modern approaches rely on information criteria for ARIMA order selection, and they consistently perform well in forecasting competitions. For example, <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> use AIC for ARMA order selection, while <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan (2020)</a> apply AIC after reformulating ARIMA in a state space form. The former is implemented in the forecast package in R and the StatsForecast library in Python (thanks to Nixtla and Azul Garza); the latter is available in the smooth package in R. I also discuss another ARIMA order selection approach in <a href="/adam/ARIMASelection.html">Section 15.2 of my book</a>.</p>
<p>Long story short: don’t use the Box–Jenkins methodology for order selection. Use more modern tools, such as information criteria.</p>
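<p>As a sketch of what such IC-based selection looks like, here is a toy base-R version (my own simplified illustration; proper implementations are <code>auto.arima()</code> and <code>msarima()</code>): fit every model in a small ARMA pool and pick the one with the lowest AICc, without ever looking at the ACF or PACF.</p>

```r
# IC-based order selection over a small ARMA(p,q) pool, no ACF/PACF analysis
set.seed(42)
y <- arima.sim(model = list(ar = 0.6), n = 120)   # a toy AR(1) with phi = 0.6
pool <- expand.grid(p = 0:2, q = 0:2)
pool$AICc <- apply(pool, 1, function(o) {
    fit <- tryCatch(arima(y, order = c(o["p"], 0, o["q"])),
                    error = function(e) NULL)
    if(is.null(fit)) return(Inf)                  # skip models that fail to estimate
    k <- length(fit$coef) + 1                     # parameters, incl. the variance
    AIC(fit) + 2 * k * (k + 1) / (length(y) - k - 1)   # AICc correction
})
print(pool[which.min(pool$AICc), ])               # the selected orders
```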
<p>P.S. <a href="/2024/03/21/what-s-wrong-with-arima/">See also my early post on ARIMA</a>, discussing what is wrong with it.</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Detecting patterns in white noise</title>
		<link>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/</link>
					<comments>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 10 Apr 2024 08:16:58 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3413</guid>

					<description><![CDATA[<p>Back in 2015, when I was working on my paper on Complex Exponential Smoothing, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230; One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Back in 2015, when I was working on my paper on <a href="/2022/08/02/complex-exponential-smoothing/">Complex Exponential Smoothing</a>, I conducted a simple simulation experiment to check how ARIMA and ETS select components/orders in time series. And I found something interesting&#8230;</p>
<p>One of the important steps in forecasting with statistical models is identifying the existing structure. In the case of ETS, it comes down to selecting trend/seasonal components, while for ARIMA, it&#8217;s about order selection. In R, several functions automatically handle this based on information criteria (<a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar, 2008</a>; <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan, 2020</a>; <a href="https://openforecast.org/adam/ADAMSelection.html">Chapter 15 of ADAM</a>). I decided to investigate how this mechanism works.</p>
<p>I generated data from the Normal distribution with a fixed mean of 5000 and a standard deviation of 50. Then, I asked ETS and ARIMA (from the forecast package in R) to automatically select the appropriate model for each of 1000 time series. Here is the R code for this simple experiment:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Some R code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode"># Set random seed for reproducibility
set.seed(41, kind="L'Ecuyer-CMRG")
# Number of iterations
nsim <- 1000
# Number of observations
obsAll <- 120
# Generate data from N(5000, 50)
rnorm(nsim*obsAll, 5000, 50) |>
  matrix(obsAll, nsim) |>
  ts(frequency=12) -> x

# Load forecast package
library(forecast)
# Load doMC for parallel calculations
# doMC is only available on Linux and macOS
# Use library(doParallel) on Windows
library(doMC)
registerDoMC(detectCores())

# A loop for ARIMA, recording the orders
matArima <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- auto.arima(x&#091;,i&#093;)
    # Element number 5 is just m, the period of seasonality
    return(c(testModel$arma&#091;-5&#093;,(!is.na(testModel$coef&#091;"drift"&#093;))*1))
}
rownames(matArima) <- c("AR","MA","SAR","SMA","I","SI","Drift")

# A loop for ETS, recording the model types
matEts <- foreach(i=1:nsim, .combine=cbind, .packages=c("forecast")) %dopar% {
    testModel <- ets(x&#091;,i&#093;, allow.multiplicative.trend=TRUE)
    return(testModel$method)
}
</pre>
</div></div></div>
<p>The findings of this experiment are summarised using the following chunk of the R code:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the analysis of the results</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">
#### Auto ARIMA ####
# Non-seasonal ARIMA elements
mean(apply(matArima[c("AR","MA","I","Drift"),]!=0, 2, any))
# Seasonal ARIMA elements
mean(apply(matArima[c("SAR","SMA","SI"),]!=0, 2, any))

#### ETS ####
# Trend in ETS
mean(substr(matEts,7,7)!="N")
# Seasonality in ETS
mean(substr(matEts,nchar(matEts)-1,nchar(matEts)-1)!="N")</pre>
</div></div></div>
<p>I summarised them in the following table:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
</tr>
</table>
<p>So, ARIMA detected some structure (had non-zero orders) in almost 40% of all time series, even though the data was designed to have no structure (just white noise). It also captured non-seasonal orders in a quarter of the series and identified seasonality in 18% of them. ETS performed better (only 0.2% of seasonal models fitted to the white noise), but still captured trends in 2.3% of cases.</p>
<p>Does this simple experiment suggest that ARIMA is a bad model and ETS is a good one? No, it does not. It simply demonstrates that ARIMA tends to overfit the data if allowed to select whatever it wants. How can we fix that?</p>
<p>My solution: restrict the pool of ARIMA models to check, preventing it from going crazy. My personal pool includes ARIMA(0,1,1), (1,1,2), (0,2,2), along with the seasonal orders (0,1,1), (1,1,2), and (0,2,2), and combinations between them. This approach is motivated by the connection between <a href="https://openforecast.org/adam/ARIMAandETS.html">ARIMA and ETS</a>.</p>
<p>This algorithm can be written as the following simple function, which uses the <code>msarima()</code> function from the smooth package in R (this function is used because all ARIMA models implemented in it are directly comparable via information criteria):</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the compact ARIMA function</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">arimaCompact <- function(y, lags=c(1,frequency(y)), ic=c("AICc","AIC","BIC","BICc"), ...){

    # Start measuring the time of calculations
    startTime <- Sys.time();

    # If there are no lags for the basic components, correct this.
    if(sum(lags==1)==0){
        lags <- c(1,lags);
    }

    orderLength <- length(lags);
    ic <- match.arg(ic);
    IC <- switch(ic,
                 "AIC"=AIC,
                 "AICc"=AICc,
                 "BIC"=BIC,
                 "BICc"=BICc);

    # We consider the following list of models:
    # ARIMA(0,1,1), (1,1,2), (0,2,2),
    # ARIMA(0,0,0)+c, ARIMA(0,1,1)+c,
    # seasonal orders (0,1,1), (1,1,2), (0,2,2)
    # And all combinations between seasonal and non-seasonal parts
    # 
    # Encode all non-seasonal parts
    nNonSeasonal <- 5
    arimaNonSeasonal <- matrix(c(0,1,1,0, 1,1,2,0, 0,2,2,0, 0,0,0,1, 0,1,1,1), nNonSeasonal,4,
                               dimnames=list(NULL, c("ar","i","ma","const")), byrow=TRUE)
    # Encode all seasonal parts ()
    nSeasonal <- 4
    arimaSeasonal <- matrix(c(0,0,0, 0,1,1, 1,1,2, 0,2,2), nSeasonal,3,
                               dimnames=list(NULL, c("sar","si","sma")), byrow=TRUE)

    # Check all the models in the pool
    testModels <- vector("list", nSeasonal*nNonSeasonal);
    m <- 1;
    for(i in 1:nSeasonal){
        for(j in 1:nNonSeasonal){
            testModels&#091;&#091;m&#093;&#093; <- msarima(y, orders=list(ar=c(arimaNonSeasonal&#091;j,1&#093;,arimaSeasonal&#091;i,1&#093;),
                                                      i=c(arimaNonSeasonal&#091;j,2&#093;,arimaSeasonal&#091;i,2&#093;),
                                                      ma=c(arimaNonSeasonal&#091;j,3&#093;,arimaSeasonal&#091;i,3&#093;)),
                                       constant=arimaNonSeasonal&#091;j,4&#093;==1, lags=lags, ...);
            m <- m+1;
        }
    }

    # Find the best one
    m <- which.min(sapply(testModels, IC));
    # Amend computational time
    testModels&#091;&#091;m&#093;&#093;$timeElapsed <- Sys.time()-startTime;

    return(testModels&#091;&#091;m&#093;&#093;);
}</pre>
</div></div></div>
<p>Additionally, we can check whether adding the AR/MA orders suggested by an ACF/PACF analysis of the best model reduces the AICc; if it does not, they shouldn't be included. I have not included that part in the code above. Still, this algorithm brings some improvements:</p>
<div class="su-accordion su-u-trim"><div class="su-spoiler su-spoiler-style-default su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>R code for the application of compact ARIMA to the data</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">#### Load the smooth package
library(smooth)

# A loop for the compact ARIMA, recording the orders
matArimaCompact <- foreach(i=1:nsim, .packages=c("smooth")) %dopar% {
    testModel <- arimaCompact(x&#091;,i&#093;)
    return(orders(testModel))
}

#### Auto MSARIMA from smooth ####
# Non-seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){x&#091;1&#093;!=0}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){x&#091;1&#093;!=0}))

# Seasonal ARIMA elements
mean(sapply(sapply(matArimaCompact, "&#091;&#091;", "ar"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "i"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}) |
  sapply(sapply(matArimaCompact, "&#091;&#091;", "ma"), function(x){length(x)==2 &#038;&#038; (x&#091;2&#093;!=0)}))
</pre>
</div></div></div>
<p>In my case, it resulted in the following:</p>
<table>
<thead>
<tr>
<td></td>
<td><strong>ARIMA</strong></td>
<td><strong>ETS</strong></td>
<td style="text-align: center"><strong>Compact ARIMA</strong></td>
</tr>
</thead>
<tr>
<td>Non-seasonal elements</td>
<td>24.8%</td>
<td>2.3%</td>
<td style="text-align: center">2.4%</td>
</tr>
<tr>
<td>Seasonal elements</td>
<td>18.0%</td>
<td>0.2%</td>
<td style="text-align: center">0.0%</td>
</tr>
<tr>
<td>Any type of structure</td>
<td>37.9%</td>
<td>2.4%</td>
<td style="text-align: center">2.4%</td>
</tr>
</table>
<p>As we see, when we impose restrictions on order selection in ARIMA, it avoids fitting seasonal models to non-seasonal data. While it still makes minor mistakes in terms of non-seasonal structure, it's nothing compared to the conventional approach. What about accuracy? I don't know. I'll have to write another post on this :).</p>
<p>Note that the models were applied to samples of 120 observations, which is considered "small" in statistics, while in real life this is sometimes a luxury to have...</p>
<p>Message <a href="https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/">Detecting patterns in white noise</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/04/10/detecting-patterns-in-white-noise/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What&#8217;s wrong with ARIMA?</title>
		<link>https://openforecast.org/2024/03/21/what-s-wrong-with-arima/</link>
					<comments>https://openforecast.org/2024/03/21/what-s-wrong-with-arima/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 21 Mar 2024 10:10:52 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3366</guid>

					<description><![CDATA[<p>Have you heard of ARIMA? It is one of the benchmark forecasting models used in different academic experiments, although it is not always popular among practitioners. But why? What&#8217;s wrong with ARIMA? ARIMA has been a standard forecasting model in statistics for ages. It gained popularity with the famous Box &#038; Jenkins (1970) book and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2024/03/21/what-s-wrong-with-arima/">What&#8217;s wrong with ARIMA?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Have you heard of ARIMA? It is one of the benchmark forecasting models used in different academic experiments, although it is not always popular among practitioners. But why? What&#8217;s wrong with ARIMA?</p>
<p>ARIMA has been a standard forecasting model in statistics for ages. It gained popularity with the famous <a href="https://archive.org/details/timeseriesanalys0000boxg">Box &#038; Jenkins (1970) book</a> and was considered the best forecasting model by statisticians for a couple of decades without any strong evidence to support this.</p>
<p>It represents one of the two fundamental approaches to time series modelling (the second being the state space approach): <a href="https://openforecast.org/adam/ARIMA.html">it captures the relation between the variable and itself in the past</a>. This has a great rationale in technical areas. For example, the quantity of CO2 in a furnace at this moment in time will depend on the quantity of CO2 five minutes ago. Such processes can be efficiently modelled and then forecasted using ARIMA. In demand forecasting, making sense of ARIMA is more challenging: it is hard to argue that the demand for shoes on Monday can impact the demand on Tuesday. So, when we apply ARIMA to such data, we sort of rely on a spurious relation. Still, demand data often exhibits autocorrelations, and ARIMA has been used efficiently in that context.</p>
<p>Over the years, ARIMA did not perform well in different competitions (see <a href="/2024/03/14/the-role-of-m-competitions-in-forecasting/">my post</a> about that), but this was mainly due to the wrong assumptions in the <a href="https://openforecast.org/adam/BJApproach.html#BJApproachSummary">Box-Jenkins methodology</a>, not because the model itself is fundamentally bad. After <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> implemented their version with automatic order selection based on information criteria, ARIMA started producing much more accurate forecasts.</p>
<p>But if I were to summarize what the problem with the model is, I would outline these points:</p>
<ol>
<li>It is hard to explain ARIMA to people who are not comfortable with statistics. Here is an example of how seasonal ARIMA(1,0,1)(1,0,1)_4 is written mathematically:<br />
\begin{equation*}<br />
  y_t (1 -\phi_{4,1} B^4)(1 -\phi_{1} B) = \epsilon_t (1 + \theta_{4,1} B^4) (1 + \theta_{1} B).<br />
\end{equation*}<br />
Good luck explaining this to a demand planner who does not know mathematics.</li>
<li>It is hard to estimate, especially for models with seasonality. It is typically estimated using some numeric optimisation, and reaching the maximum likelihood (or a global minimum of a loss function) is not guaranteed.</li>
<li>It is hard to select the appropriate order of the model, as there can be thousands of potential models to choose from. Yes, there are heuristic approaches that simplify the problem and allow selecting a reasonable model (e.g. <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar, 2008</a>; or <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan, 2020</a>), but they do not guarantee that you will get the best possible model.</li>
</ol>
<p>Nonetheless, ARIMA is a strong contender that can outperform other models if implemented well. Furthermore, it has become one of the standard forecasting benchmarks in forecasting-related experiments. So, if you are a data scientist comfortable with mathematics and want to see how your machine learning approach performs, you should consider including ARIMA as a benchmark.</p>
<p>P.S. Check out <a href="https://www.linkedin.com/posts/vandeputnicolas_i-am-not-aware-of-any-successful-use-at-activity-7148632910441377793-27s5">a post by Nicolas Vandeput on LinkedIn</a> &#8211; he had a discussion about ARIMA and raised good points as well.</p>
<p>Message <a href="https://openforecast.org/2024/03/21/what-s-wrong-with-arima/">What&#8217;s wrong with ARIMA?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2024/03/21/what-s-wrong-with-arima/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Multi-step Estimators and Shrinkage Effect in Time Series Models</title>
		<link>https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/</link>
					<comments>https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 09 Aug 2023 10:14:59 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3142</guid>

					<description><![CDATA[<p>Authors: Ivan Svetunkov, Nikos Kourentzes, Rebecca Killick Journal: Computational Statistics Abstract: Many modern statistical models are used for both insight and prediction when applied to data. When models are used for prediction one should optimise parameters through a prediction error loss function. Estimation methods based on multiple steps ahead forecast errors have been shown to [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/">Multi-step Estimators and Shrinkage Effect in Time Series Models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong>Authors</strong>: Ivan Svetunkov, <a href="http://kourentzes.com/forecasting/">Nikos Kourentzes</a>, <a href="https://www.lancaster.ac.uk/maths/people/rebecca-killick">Rebecca Killick</a></p>
<p><strong>Journal</strong>: <a href="https://www.springer.com/journal/180">Computational Statistics</a></p>
<p><strong>Abstract</strong>: Many modern statistical models are used for both insight and prediction when applied to data. When models are used for prediction one should optimise parameters through a prediction error loss function. Estimation methods based on multiple steps ahead forecast errors have been shown to lead to more robust and less biased estimates of parameters. However, a plausible explanation of why this is the case is lacking. In this paper, we provide this explanation, showing that the main benefit of these estimators is in a shrinkage effect, happening in univariate models naturally. However, this can introduce a series of limitations, due to overly aggressive shrinkage. We discuss the predictive likelihoods related to the multistep estimators and demonstrate what their usage implies to time series models. To overcome the limitations of the existing multiple steps estimators, we propose the Geometric Trace Mean Squared Error, demonstrating its advantages. We conduct a simulation experiment showing how the estimators behave with different sample sizes and forecast horizons. Finally, we carry out an empirical evaluation on real data, demonstrating the performance and advantages of the estimators. Given that the underlying process to be modelled is often unknown, we conclude that the shrinkage achieved by the GTMSE is a competitive alternative to conventional ones.</p>
<p><strong>DOI</strong>: <a href="https://doi.org/10.1007/s00180-023-01377-x">10.1007/s00180-023-01377-x</a>.</p>
<p><a href="http://dx.doi.org/10.13140/RG.2.2.17854.31043">Working paper</a>.</p>
<h2>About the paper</h2>
<p><b>DISCLAIMER 1</b>: To better understand what I am talking about in this section, I would recommend you to have a look at the <a href="https://openforecast.org/adam/">ADAM monograph</a>, and specifically at <a href="https://openforecast.org/adam/ADAMETSEstimation.html">the Chapter 11</a>. In fact, <a href="https://openforecast.org/adam/multistepLosses.html">Section 11.3</a> is based on this paper.</p>
<p><b>DISCLAIMER 2</b>: All the discussions in the paper only apply to pure additive models. If you are interested in multiplicative or mixed ETS models, you&#8217;ll have to wait another seven years for another paper on this topic to get written and published.</p>
<h3>Introduction</h3>
<p>There are many ways in which dynamic models can be estimated. Some analysts prefer likelihood, some would stick with Least Squares (i.e. minimising MSE), while others would use advanced estimators like Huber&#8217;s loss or M-estimators. And sometimes, statisticians or machine learning experts would use multiple steps ahead estimators. For example, they would use a so-called &#8220;direct forecast&#8221;: fitting a model to the data, producing h-steps ahead in-sample point forecasts from the very first to the very last observation, then calculating the respective h-steps ahead forecast errors and (based on them) the Mean Squared Error. Mathematically, this can be written as:</p>
<p>\begin{equation} \label{eq:hstepsMSE}<br />
	\mathrm{MSE}_h = \frac{1}{T-h} \sum_{t=1}^{T-h} e_{t+h|t}^2 ,<br />
\end{equation}<br />
where \(e_{t+h|t}\) is the h-steps ahead error for the point forecast produced from the observation \(t\), and \(T\) is the sample size.</p>
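<p>For concreteness, this estimator can be sketched in a few lines of base R for ETS(A,N,N) (simple exponential smoothing), whose point forecast from origin \(t\) for any horizon is the current level. This is my own toy illustration, not the code from the paper:</p>

```r
# MSE_h for simple exponential smoothing: fit the level recursively,
# then average the squared h-steps-ahead in-sample forecast errors
# e_{t+h|t} = y_{t+h} - l_t over all available origins t.
mseH <- function(y, alpha, h) {
    obs <- length(y)
    l <- numeric(obs)
    l[1] <- y[1]                            # crude initialisation of the level
    for(t in 2:obs) {
        l[t] <- alpha * y[t] + (1 - alpha) * l[t-1]
    }
    mean((y[(1+h):obs] - l[1:(obs-h)])^2)
}
# alpha can then be estimated by minimising MSE_h rather than MSE_1, e.g.:
# alphaH <- optimize(mseH, c(0, 1), y = y, h = 10)$minimum
```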
<p>In the final year of my PhD, I decided to analyse how different multistep loss functions work, to understand what happens with dynamic models when these losses are minimised, and how this can help in efficient model estimation. While doing the literature review, I noticed that the claims about the multistep estimators are sometimes contradictory: some authors say that they are more efficient (i.e. estimates of parameters have lower variances) than the conventional estimators, some say that they are less efficient; some claim that they improve accuracy, while others do not find any substantial improvements. Finally, I could not find a proper explanation of what happens with dynamic models when these estimators are used. So, I started my own investigation, together with Nikos Kourentzes and Rebecca Killick (who was my internal examiner and joined our team after my graduation).</p>
<p>Our investigation started with the single source of error model, then led us to predictive likelihoods and, after that, to the development of a couple of non-conventional estimators. As a result, the paper grew and became less focused than initially intended. In the end, it was 42 pages long and discussed several aspects of model estimation (making it a bit of a hodgepodge):</p>
<ol>
<li>How multistep estimators regularise parameters of dynamic models;</li>
<li>That multistep forecast errors are always correlated when the models&#8217; parameters are not zero;</li>
<li>What predictive likelihoods align with the multistep estimators (this is useful for a discussion of their statistical properties);</li>
<li>How General Predictive Likelihood encompasses all popular multistep estimators;</li>
<li>And that there is another estimator (namely GTMSE &#8211; Geometric Trace Mean Squared Error), which has good properties and has not been discussed in the literature before.</li>
</ol>
<p>Because of the size of the paper and the spread of the topics throughout it, many reviewers ignored (1)&#8211;(4), focusing on (5), and thus rejected the paper on the grounds that we propose a new estimator but spend too much time discussing irrelevant topics. These types of comments were given to us by the editor of the Journal of the Royal Statistical Society: B and by reviewers of Computational Statistics and Data Analysis. While we tried addressing this issue several times, given the size of the paper, we failed to fix it fully. The paper was rejected by both of these journals and ended up in Computational Statistics, where the editor gave us a chance to respond to the comments. We explained what the paper was really about and changed its focus to satisfy the reviewers, after which the paper was accepted.</p>
<p>So, what are the main findings of this paper?</p>
<h3>How multistep estimators regularise parameters of dynamic models</h3>
<p>Given that any dynamic model (such as ETS or ARIMA) can be represented in the Single Source of Error state space form, we showed that the application of multistep estimators leads to the inclusion of the models&#8217; parameters in the loss function, which results in regularisation. In ETS, this means that the smoothing parameters are shrunk towards zero, with the shrinkage becoming stronger as the forecasting horizon increases relative to the sample size. This makes the models less stochastic and more conservative. Mathematically, this becomes apparent if we express the conditional multistep variance in terms of the smoothing parameters and the one-step-ahead error variance. For example, for ETS(A,N,N) we have:</p>
<p>\begin{equation} \label{eq:hstepsMSEVariance}<br />
	\mathrm{MSE}_h \propto \hat{\sigma}_1^2 \left(1 +(h-1) \hat{\alpha}^2 \right),<br />
\end{equation}<br />
where \( \hat{\alpha} \) is the smoothing parameter and \(\hat{\sigma}_1^2 \) is the one-step-ahead error variance. From the formula \eqref{eq:hstepsMSEVariance}, it becomes apparent that when we minimise MSE\(_h\), the estimated variance and the smoothing parameters will be minimised as well. This is how the shrinkage effect appears: we force \( \hat{\alpha} \) to become as close to zero as possible, and the strength of shrinkage is regulated by the forecasting horizon \( h \).</p>
<p>In the paper itself, we discuss this effect for several multistep estimators (the specific effect would be different between them) and several ETS and ARIMA models. While for ETS, it is easy to show how shrinkage works, for ARIMA, the situation is more complicated because the direction of shrinkage would change with the ARIMA orders. Still, what can be said clearly for any dynamic model is that the multistep estimators make them less stochastic and more conservative.</p>
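<p>To get a sense of the strength of this effect, note that for ETS(A,N,N) the h-steps-ahead variance is proportional to \(1 + (h-1)\hat{\alpha}^2\) times the one-step-ahead variance, so the multiplier grows linearly in the horizon. A quick computation in R (just arithmetic on this expression, not code from the paper):</p>
<pre class="decode"># Multiplier of the one-step-ahead variance for ETS(A,N,N) at h=10
h <- 10
sapply(c(0.1, 0.5, 1), function(alpha) 1 + (h - 1) * alpha^2)
# 1.09 3.25 10 -- minimising MSE_h pushes alpha towards zero</pre>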
<h3>Multistep forecast errors are always correlated</h3>
<p>This is a small finding, made in passing. It means that, for example, the two-steps-ahead forecast error is always correlated with the three-steps-ahead one. This does not depend on the autocorrelation of residuals or any violation of the model&#8217;s assumptions, but only on whether the parameters of the model are zero or not. The effect arises from the model rather than from the data. The only situation in which the forecast errors are not correlated is when the model is deterministic (e.g. a linear trend). This has important practical implications, because some forecasting techniques make the explicit and unrealistic assumption that these correlations are zero, which impacts the final forecasts.</p>
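<p>This effect is easy to verify in a small simulation (an illustrative sketch, not code from the paper): generate data from ETS(A,N,N) with a known smoothing parameter, produce multistep forecasts from the true model and correlate the two and three steps ahead errors:</p>
<pre class="decode"># Simulate ETS(A,N,N): y[t] = l[t-1] + e[t], l[t] = l[t-1] + alpha * e[t]
set.seed(42)
alpha <- 0.3
obs <- 5000
e <- rnorm(obs)
y <- l <- numeric(obs)
lPrev <- 100
for(t in 1:obs){
    y[t] <- lPrev + e[t]
    l[t] <- lPrev + alpha * e[t]
    lPrev <- l[t]
}
# The h steps ahead point forecast from origin t is l[t],
# so the h steps ahead forecast error is y[t+h] - l[t]
origins <- 1:(obs-3)
cor(y[origins+2] - l[origins], y[origins+3] - l[origins])
# Clearly positive; it approaches zero only as alpha -> 0</pre>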
<h3>Predictive likelihoods aligning with the multistep estimators</h3>
<p>We showed that if a model assumes the Normal distribution, then in the case of MSEh and MSCE (Mean Squared Cumulative Error) the distribution of the future values is Normal as well. This means that there are predictive likelihood functions for these models, the maximum of which is achieved with the same set of parameters as the minimum of the respective multistep estimators. This has two implications:</p>
<ol>
<li>These multistep estimators should be consistent and efficient, especially when the smoothing parameters are close to zero;</li>
<li>The predictive likelihoods can be used in the model selection via information criteria.</li>
</ol>
<p>The first point also explains the contradiction in the literature: if the smoothing parameter in the population is close to zero, then the multistep estimators will give more efficient estimates than the conventional ones; otherwise, they might be less efficient. We have not used the second point above, but it would be useful when the best model needs to be selected for the data and an analyst wants to use information criteria. This is one of the potential directions for future research.</p>
<h3>How General Predictive Likelihood (GPL) encompasses all popular multistep estimators</h3>
<p>GPL arises when the joint distribution of the 1 to h steps ahead forecast errors is considered. It is Multivariate Normal if the model assumes normality. In the paper, we showed that the maximum of GPL coincides with the minimum of the so-called &#8220;Generalised Variance&#8221; &#8211; the determinant of the covariance matrix of the forecast errors. This minimisation reduces the variances of all the forecast errors (from 1 to h) and increases the covariances between them, making the multistep forecast errors look more similar. In the perfect case, when the model is correctly specified (no omitted or redundant variables, homoscedastic residuals, etc.), the maximum of GPL coincides with the maximum of the conventional likelihood of the Normal distribution (see <a href="https://openforecast.org/adam/ADAMETSEstimationLikelihood.html">Section 11.1 of the ADAM monograph</a>).</p>
<p>Incidentally, it can be shown that the existing estimators are just special cases of GPL with some restrictions on the covariance matrix. I do not intend to show this here; the reader is encouraged to either read the paper or see the brief discussion <a href="https://openforecast.org/adam/multistepLosses.html#multistepLossesGPL">in Subsection 11.3.5</a> of the ADAM monograph.</p>
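<p>For intuition, the Generalised Variance is straightforward to compute from a matrix of in-sample multistep forecast errors (a sketch; it assumes that <code>errors</code> holds one column per horizon from 1 to h, for instance as returned by <code>rmultistep()</code> from the <code>smooth</code> package):</p>
<pre class="decode"># Generalised Variance: determinant of the covariance matrix
# of the 1 to h steps ahead in-sample forecast errors
generalisedVariance <- function(errors){
    det(cov(errors))
}</pre>
<p>Maximising GPL over the model&#8217;s parameters then corresponds to minimising this determinant.</p>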
<h3>GTMSE &#8211; Geometric Trace Mean Squared Error</h3>
<p>Finally, looking at the special cases of GPL, we have noticed that there is one which has not been discussed in the literature. We called it Geometric Trace Mean Squared Error (GTMSE) because of the logarithms in the formula:<br />
\begin{equation} \label{eq:GTMSE}<br />
	\mathrm{GTMSE} = \sum_{j=1}^h \log \left( \frac{1}{T-j} \sum_{t=1}^{T-j} e_{t+j|t}^2 \right).<br />
\end{equation}<br />
GTMSE imposes shrinkage on parameters similarly to the other estimators, but does so more mildly because of the logarithms in the formula. In effect, the logarithms bring the variances of all forecast errors to a comparable scale. As a result, GTMSE does not focus on the larger variances, as the other estimators do, but minimises all of them simultaneously.</p>
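<p>In the <code>smooth</code> package this estimator is available via <code>loss="GTMSE"</code>, but for illustration it can be coded directly from the formula above (a sketch; <code>errors</code> is assumed to be a matrix of in-sample forecast errors with one column per horizon j, padded with <code>NA</code>s for the origins where a j steps ahead error is not available):</p>
<pre class="decode"># GTMSE: sum over horizons of the logarithms of mean squared errors
GTMSE <- function(errors){
    sum(log(colMeans(errors^2, na.rm=TRUE)))
}</pre>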
<h2>Examples in R</h2>
<p>The estimators discussed in the paper are all implemented in the functions of the smooth package in R, including <code>adam()</code>, <code>es()</code>, <code>ssarima()</code>, <code>msarima()</code> and <code>ces()</code>. In the example below, we will see how the shrinkage works for ETS using the Box-Jenkins sales data (the example is taken from ADAM, <a href="https://openforecast.org/adam/multistepLosses.html#an-example-in-r-2">Subsection 11.3.7</a>):</p>
<pre class="decode">library(smooth)

adamETSAANBJ <- vector("list",6)
names(adamETSAANBJ) <- c("MSE","MSEh","TMSE","GTMSE","MSCE","GPL")
for(i in 1:length(adamETSAANBJ)){
    adamETSAANBJ[[i]] <- adam(BJsales, "AAN", h=10, holdout=TRUE,
                              loss=names(adamETSAANBJ)[i])
}</pre>
<p>The ETS(A,A,N) model, applied to this data, has different estimates of smoothing parameters:</p>
<pre class="decode">sapply(adamETSAANBJ,"[[","persistence") |>
	round(5)</pre>
<pre>          MSE MSEh TMSE   GTMSE MSCE GPL
alpha 1.00000    1    1 1.00000    1   1
beta  0.23915    0    0 0.14617    0   0</pre>
<p>We can see how shrinkage shows itself in the case of the smoothing parameter \(\beta\), which is shrunk to zero by MSEh, TMSE, MSCE and GPL but left intact by MSE and shrunk a little bit in the case of GTMSE. These different estimates of parameters lead to different forecasting trajectories and prediction intervals, as can be shown visually:</p>
<pre class="decode">par(mfcol=c(3,2), mar=c(2,2,4,1))
# Produce forecasts
lapply(adamETSAANBJ, forecast, h=10, interval="prediction") |>
# Plot forecasts
    lapply(function(x, ...) plot(x, ylim=c(200,280), main=x$model$loss))</pre>
<p>This should result in the following plots:</p>
<div id="attachment_3162" style="width: 1210px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses.png&amp;nocache=1"><img fetchpriority="high" decoding="async" aria-describedby="caption-attachment-3162" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses.png&amp;nocache=1" alt="ADAM ETS on Box-Jenkins data with several estimators" width="1200" height="700" class="size-full wp-image-3162" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses.png&amp;nocache=1 1200w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/07/ADAMBJSalesLosses-768x448.png&amp;nocache=1 768w" sizes="(max-width: 1200px) 100vw, 1200px" /></a><p id="caption-attachment-3162" class="wp-caption-text">ADAM ETS on Box-Jenkins data with several estimators</p></div>
<p>Analysing the figure, it looks like the shrinkage of the smoothing parameter \(\beta\) is useful for this time series: the forecasts from ETS(A,A,N) estimated using MSEh, TMSE, MSCE and GPL look closer to the actual values than the ones from MSE and GTMSE. To assess their performance more precisely, we can extract error measures from the models:</p>
<pre class="decode">sapply(adamETSAANBJ,"[[","accuracy")[c("ME","MSE"),] |>
	round(5)</pre>
<pre>         MSE    MSEh    TMSE    GTMSE    MSCE     GPL
ME   3.22900 1.06479 1.05233  3.44962 1.04604 0.95515
MSE 14.41862 2.89067 2.85880 16.26344 2.84288 2.62394</pre>
<p>Alternatively, we can calculate error measures based on the produced forecasts and the <code>measures()</code> function from the <code>greybox</code> package:</p>
<pre class="decode">lapply(adamETSAANBJ, forecast, h=10) |>
    sapply(function(x, ...) measures(holdout=x$model$holdout,
                                     forecast=x$mean,
                                     actual=actuals(x$model)))</pre>
<p>A thing to note about the multistep estimators is that they are slower than the conventional ones because they require producing 1 to \( h \) steps ahead forecasts from every observation in-sample. In the case of the <code>smooth</code> functions, the time elapsed can be extracted from the models in the following way:</p>
<pre class="decode">sapply(adamETSAANBJ, "[[", "timeElapsed")</pre>
<p>In summary, the multistep estimators are potentially useful in forecasting and can produce models with more accurate forecasts. This happens because they impose shrinkage on the estimates of parameters, making models less stochastic and more inert. But their performance depends on each specific situation and the available data, so I would not recommend using them universally.</p>
<p>Message <a href="https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/">Multi-step Estimators and Shrinkage Effect in Time Series Models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/08/09/multi-step-estimators-and-shrinkage-effect-in-time-series-models/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>smooth v3.2.0: what&#8217;s new?</title>
		<link>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/</link>
					<comments>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 30 Jan 2023 13:06:47 +0000</pubDate>
				<category><![CDATA[About es() function]]></category>
		<category><![CDATA[adam()]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Regression]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3063</guid>

					<description><![CDATA[<p>smooth package has reached version 3.2.0 and is now on CRAN. While the version change from 3.1.7 to 3.2.0 looks small, this has introduced several substantial changes and represents a first step in moving to the new C++ code in the core of the functions. In this short post, I will outline the main new [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/">smooth v3.2.0: what&#8217;s new?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>smooth package has reached version 3.2.0 and is now <a href="https://cran.r-project.org/package=smooth">on CRAN</a>. While the version change from 3.1.7 to 3.2.0 looks small, this has introduced several substantial changes and represents a first step in moving to the new C++ code in the core of the functions. In this short post, I will outline the main new features of smooth 3.2.0.</p>
<p><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2.png&amp;nocache=1"><img decoding="async" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-300x218.png&amp;nocache=1" alt="" width="300" height="218" class="aligncenter size-medium wp-image-3065" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-300x218.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-1024x745.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-768x559.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2-1536x1117.png&amp;nocache=1 1536w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/smooth2.png&amp;nocache=1 1650w" sizes="(max-width: 300px) 100vw, 300px" /></a></p>
<h3>New engines for ETS, MSARIMA and SMA</h3>
<p>The first and one of the most important changes is the new engine for ETS (the Error-Trend-Seasonal exponential smoothing model), MSARIMA (Multiple Seasonal ARIMA) and SMA (Simple Moving Average), implemented respectively in the <code>es()</code>, <code>msarima()</code> and <code>sma()</code> functions. The new engine was developed for <code>adam()</code>, and the three models above can be considered as its special cases. You can read more about ETS in the ADAM monograph, starting from <a href="https://openforecast.org/adam/ETSConventional.html">Chapter 4</a>; MSARIMA is discussed in <a href="https://openforecast.org/adam/ADAMARIMA.html">Chapter 9</a>, while SMA is briefly discussed in <a href="https://openforecast.org/adam/simpleForecastingMethods.html#SMA">Subsection 3.3.3</a>.</p>
<p>The <code>es()</code> function now implements ETS close to the conventional form, assuming that the error term follows a normal distribution. It still supports explanatory variables (discussed in <a href="https://openforecast.org/adam/ADAMX.html">Chapter 10 of the ADAM monograph</a>) and advanced estimators (<a href="https://openforecast.org/adam/ADAMETSEstimation.html">Chapter 11</a>), and it has the same syntax as the previous version of the function, but now acts as a wrapper for <code>adam()</code>. This means that it is faster, more accurate and requires less memory than it used to. <code>msarima()</code>, being a wrapper of <code>adam()</code> as well, is also faster and more accurate than it used to be. In addition, both functions now support the methods that were developed for <code>adam()</code>, including <code>vcov()</code>, <code>confint()</code>, <code>summary()</code>, <code>rmultistep()</code>, <code>reapply()</code>, <code>plot()</code> and others. So, you can now do a more thorough analysis and improve the models using all these advanced instruments (see, for example, <a href="https://openforecast.org/adam/diagnostics.html">Chapter 14 of ADAM</a>).</p>
<p>The main reason why I moved the functions to the new engine was to clean up the code and remove the old chunks that were developed when I only started learning C++. A side effect, as you see, is that the functions have now been improved in a variety of ways.</p>
<p>And to be on the safe side, the old versions of the functions are still available in <code>smooth</code> under the names <code>es_old()</code>, <code>msarima_old()</code> and <code>sma_old()</code>. They will be removed from the package if it ever reaches v4.0.0.</p>
<h3>New methods for ADAM</h3>
<p>There are two new methods for <code>adam()</code> that can be used in a variety of cases. The first one is <code>simulate()</code>, which generates data based on an estimated ADAM, whatever the original model is (e.g. a mixture of ETS, ARIMA and regression on data with multiple frequencies). Here is how it can be used:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     simulate() |>
     plot()</pre>
<p>which will produce a plot similar to the following:</p>
<div id="attachment_3077" style="width: 650px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate.png&amp;nocache=1"><img decoding="async" aria-describedby="caption-attachment-3077" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-1024x597.png&amp;nocache=1" alt="Simulated data based on adam() applied to Box-Jenkins sales data" width="640" height="373" class="size-large wp-image-3077" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamSimulate.png&amp;nocache=1 1200w" sizes="(max-width: 640px) 100vw, 640px" /></a><p id="caption-attachment-3077" class="wp-caption-text">Simulated data based on adam() applied to Box-Jenkins sales data</p></div>
<p>This can be used in research, when a more controlled environment is needed. If you want to fine-tune the parameters of ADAM before simulating the data, you can save the output in an object and amend its parameters. For example:</p>
<pre class="decode">testModel <- adam(BJsales, "AAdN")
testModel$persistence <- c(0.5, 0.2)
simulate(testModel)</pre>
<p>The second new method is <code>xtable()</code> from the respective <code>xtable</code> package. It produces a LaTeX version of the table from the summary of an ADAM model. Here is an example of a summary from ADAM ETS:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     summary()</pre>
<pre>Model estimated using adam() function: ETS(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 256.1516
Coefficients:
      Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha   0.9514     0.1292     0.6960      1.0000 *
beta    0.3328     0.2040     0.0000      0.7358  
phi     0.8560     0.1671     0.5258      1.0000 *
level 203.2835     5.9968   191.4304    215.1289 *
trend  -2.6793     4.7705   -12.1084      6.7437  

Error standard deviation: 1.3623
Sample size: 150
Number of estimated parameters: 6
Number of degrees of freedom: 144
Information criteria:
     AIC     AICc      BIC     BICc 
524.3032 524.8907 542.3670 543.8387</pre>
<p>As you can see in the output above, the function generates confidence intervals for the parameters of the model, including the smoothing parameters, the damping parameter and the initial states. This summary can then be used to generate the LaTeX code for the main part of the table:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
     xtable()</pre>
<p>which will look something like this:</p>
<div id="attachment_3073" style="width: 650px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-3073" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-1024x303.png&amp;nocache=1" alt="Summary of adam()" width="512" height="152" class="size-large wp-image-3073" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-1024x303.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-300x89.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable-768x227.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2023/01/adamXtable.png&amp;nocache=1 1207w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a><p id="caption-attachment-3073" class="wp-caption-text">Summary of adam()</p></div>
<h3>Other improvements</h3>
<p>First, one of the major changes in the <code>smooth</code> functions is the new backcasting mechanism for <code>adam()</code>, <code>es()</code> and <code>msarima()</code> (discussed in <a href="https://openforecast.org/adam/ADAMInitialisation.html">Section 11.4 of the ADAM monograph</a>). The main difference from the old one is that it no longer backcasts the parameters of the explanatory variables, estimating them separately via optimisation. This feature appeared to be important for some users who wanted to try MSARIMAX/ETSX (a model with explanatory variables) with backcasting as the initialisation. These users then wanted to get a summary to analyse the uncertainty around the estimates of parameters for exogenous variables, but could not, because the previous implementation did not estimate them explicitly. This is now available. Here is an example:</p>
<pre class="decode">cbind(BJsales, BJsales.lead) |>
    adam(model="AAdN", initial="backcasting") |>
    summary()</pre>
<pre>Model estimated using adam() function: ETSX(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 255.1935
Coefficients:
             Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha          0.9724     0.1108     0.7534      1.0000 *
beta           0.2904     0.1368     0.0199      0.5607 *
phi            0.8798     0.0925     0.6970      1.0000 *
BJsales.lead   0.1662     0.2336    -0.2955      0.6276  

Error standard deviation: 1.3489
Sample size: 150
Number of estimated parameters: 5
Number of degrees of freedom: 145
Information criteria:
     AIC     AICc      BIC     BICc 
520.3870 520.8037 535.4402 536.4841</pre>
<p>As you can see in the output above, the initial level and trend of the model are not reported, because they were estimated via backcasting. However, we get the value of the parameter <code>BJsales.lead</code> and the uncertainty around it. The old backcasting approach is now called "complete", implying that all values of the state vector are produced via backcasting.</p>
<p>Second, <code>forecast.adam()</code> now has a parameter <code>scenarios</code>, which, when TRUE, makes the function return the simulated paths from the model. This only works when <code>interval="simulated"</code> and can be used for the analysis of possible forecast trajectories.</p>
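<p>For example, the following minimal sketch (assuming the defaults of <code>adam()</code> otherwise) produces forecasts together with the simulated paths used for the intervals:</p>
<pre class="decode">adam(BJsales, "AAdN") |>
    forecast(h=10, interval="simulated", scenarios=TRUE)</pre>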
<p>Third, the <code>plot()</code> method can now also produce ACF/PACF of the squared residuals for all <code>smooth</code> functions. This becomes useful if you suspect that your data contains ARCH elements and want to see whether they need to be modelled separately. This can also be done using <code>adam()</code> and <code>sm()</code> and is discussed in <a href="https://openforecast.org/adam/ADAMscaleModel.html">Chapter 17 of the monograph</a>.</p>
<p>Finally, the <code>sma()</code> function now has the <code>fast</code> parameter, which, when TRUE, uses a modified ternary search for the best order based on information criteria. It might not find the global minimum, but it works much faster than the exhaustive search.</p>
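<p>The idea behind such a search can be sketched in a few lines (a simplified illustration, not the actual implementation in <code>sma()</code>; it assumes that the information criterion is roughly unimodal in the order):</p>
<pre class="decode"># Ternary-style search for the SMA order minimising AIC
smaOrderSearch <- function(y, maxOrder){
    icOf <- function(k) AIC(sma(y, order=k))
    lo <- 1
    hi <- maxOrder
    while(hi - lo > 2){
        m1 <- lo + floor((hi - lo) / 3)
        m2 <- hi - floor((hi - lo) / 3)
        if(icOf(m1) < icOf(m2)){
            hi <- m2
        } else {
            lo <- m1
        }
    }
    (lo:hi)[which.min(sapply(lo:hi, icOf))]
}</pre>
<p>Each iteration discards roughly a third of the candidate orders, so the number of models fitted grows logarithmically rather than linearly in <code>maxOrder</code>.</p>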
<h3>Conclusions</h3>
<p>These are the main new features of the package. I feel that the main job in <code>smooth</code> is already done, and all I can do now is tune the functions and improve the existing code. I want to move all the functions to the new engine and ditch the old one, but this requires much more time than I have. So, I don't expect to finish this any time soon, but I hope I'll get there someday. On the other hand, I'm not sure that spending much time on developing an R package is a wise idea, given that nowadays people tend to use Python. I would develop a Python analogue of the <code>smooth</code> package, but currently I don't have the necessary expertise and time to do that. Besides, there already exist great libraries, such as <a href="https://github.com/Nixtla/nixtla/tree/main/tsforecast">tsforecast</a> from <a href="https://github.com/Nixtla/nixtla">nixtla</a> and <a href="https://www.sktime.org/">sktime</a>. I am not sure that another library implementing ETS and ARIMA is needed in Python. What do you think?</p>
<p>Message <a href="https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/">smooth v3.2.0: what&#8217;s new?</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2023/01/30/smooth-v3-2-0-what-s-new/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>ISF2022: How to make ETS work with ARIMA</title>
		<link>https://openforecast.org/2022/07/20/isf2022-how-to-make-ets-work-with-arima/</link>
					<comments>https://openforecast.org/2022/07/20/isf2022-how-to-make-ets-work-with-arima/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 20 Jul 2022 12:06:48 +0000</pubDate>
				<category><![CDATA[adam()]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[ISF]]></category>
		<category><![CDATA[presentations]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=2984</guid>

					<description><![CDATA[<p>This time ISF took place in Oxford. I acted as a programme chair of the event and was quite busy with schedule and some other minor organisational things, but I still found time to present something new. Specifically, I talked about one specific part of ADAM, the part implementing ETS+ARIMA. The idea is that the [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2022/07/20/isf2022-how-to-make-ets-work-with-arima/">ISF2022: How to make ETS work with ARIMA</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>This time ISF took place in Oxford. I acted as a programme chair of the event and was quite busy with schedule and some other minor organisational things, but I still found time to present something new. Specifically, I talked about one specific part of ADAM, the part implementing ETS+ARIMA. The idea is that the two models are considered as competing, belonging to different families. But we have known how to unite them at least since 1985. So, it is about time to make this brave step and implement ETS with ARIMA elements.</p>
<div id="attachment_2987" style="width: 235px" class="wp-caption aligncenter"><a href="/wp-content/uploads/2022/07/7971a13b-ad97-4473-8a8f-4c88ad2d7145.jpeg"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2987" src="/wp-content/uploads/2022/07/7971a13b-ad97-4473-8a8f-4c88ad2d7145-225x300.jpeg" alt="" width="225" height="300" class="size-medium wp-image-2987" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/7971a13b-ad97-4473-8a8f-4c88ad2d7145-225x300.jpeg&amp;nocache=1 225w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/7971a13b-ad97-4473-8a8f-4c88ad2d7145-768x1024.jpeg&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/7971a13b-ad97-4473-8a8f-4c88ad2d7145-1152x1536.jpeg&amp;nocache=1 1152w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/07/7971a13b-ad97-4473-8a8f-4c88ad2d7145.jpeg&amp;nocache=1 1536w" sizes="auto, (max-width: 225px) 100vw, 225px" /></a><p id="caption-attachment-2987" class="wp-caption-text">ETS+ARIMA love story with happy ending&#8230;</p></div>
<p>This talk was based on <a href="https://openforecast.org/adam/ADAMARIMA.html">Chapter 9</a> of <a href="https://openforecast.org/adam/">ADAM monograph</a>, and more specifically on <a href="https://openforecast.org/adam/ETSAndARIMA.html">Section 9.4</a>.</p>
<p>The slides of the presentation are available <a href="/wp-content/uploads/2022/07/2022-ISF2022-ADAM-ETSARIMA.pdf">here</a>.</p>
<p>Message <a href="https://openforecast.org/2022/07/20/isf2022-how-to-make-ets-work-with-arima/">ISF2022: How to make ETS work with ARIMA</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2022/07/20/isf2022-how-to-make-ets-work-with-arima/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The first draft of &#8220;Forecasting and Analytics with ADAM&#8221;</title>
		<link>https://openforecast.org/2022/04/11/the-first-draft-of-forecasting-and-analytics-with-adam/</link>
					<comments>https://openforecast.org/2022/04/11/the-first-draft-of-forecasting-and-analytics-with-adam/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 11 Apr 2022 15:30:26 +0000</pubDate>
				<category><![CDATA[adam()]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Regression]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[ADAM]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=2817</guid>

					<description><![CDATA[<p>After working on this for more than a year, I have finally prepared the first draft of my online monograph &#8220;Forecasting and Analytics with ADAM&#8221;. This is a monograph on the model that unites ETS, ARIMA and regression and introduces advanced features in univariate modelling, including: ETS in a new State Space form; ARIMA in [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2022/04/11/the-first-draft-of-forecasting-and-analytics-with-adam/">The first draft of &#8220;Forecasting and Analytics with ADAM&#8221;</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div id="attachment_2819" style="width: 222px" class="wp-caption aligncenter"><a href="/wp-content/uploads/2022/03/Adam-Title-web.jpg"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2819" src="/wp-content/uploads/2022/03/Adam-Title-web-212x300.jpg" alt="Forecasting and Analytics with ADAM" width="212" height="300" class="size-medium wp-image-2819" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/Adam-Title-web-212x300.jpg&amp;nocache=1 212w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/Adam-Title-web-724x1024.jpg&amp;nocache=1 724w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/Adam-Title-web-768x1087.jpg&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/Adam-Title-web.jpg&amp;nocache=1 1000w" sizes="auto, (max-width: 212px) 100vw, 212px" /></a><p id="caption-attachment-2819" class="wp-caption-text">Forecasting and Analytics with ADAM</p></div>
<p>After working on this for <a href="/en/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/" rel="noopener">more than</a> <a href="/en/2021/02/28/after-the-creation-of-adam-smooth-v3-1-0/">a year</a>, I have finally prepared the first draft of my online monograph &#8220;<a href="https://openforecast.org/adam/" rel="noopener" target="_blank">Forecasting and Analytics with ADAM</a>&#8220;. This is a monograph on the model that unites ETS, ARIMA and regression and introduces advanced features in univariate modelling, including:</p>
<ol>
<li>ETS in a new State Space form;</li>
<li>ARIMA in a new State Space form;</li>
<li>Regression;</li>
<li>TVP regression;</li>
<li>Combinations of (1), (2) and either (3), or (4);</li>
<li>Automatic selection/combination for ETS;</li>
<li>Automatic orders selection for ARIMA;</li>
<li>Variables selection for regression part;</li>
<li>Normal and non-normal distributions;</li>
<li>Automatic selection of most suitable distribution;</li>
<li>Multiple seasonality;</li>
<li>Occurrence part of the model to handle zeroes in data (intermittent demand);</li>
<li>Modelling scale of distribution (GARCH and beyond);</li>
<li>Handling uncertainty of estimates of parameters.</li>
</ol>
<p>The model and all its features are already implemented in the <code>adam()</code> function from the <code>smooth</code> package for R (you need v3.1.6 from CRAN for all the features listed above). The function supports many options that allow one to experiment with univariate forecasting and to build complex models combining elements from the list above. The monograph explaining the models underlying ADAM and how to work with them is <a href="https://openforecast.org/adam/" rel="noopener" target="_blank">available online</a>, and I plan to produce several physical copies of it after refining the text. Furthermore, I have already asked two well-known academics to act as reviewers and collect feedback on the monograph; if you would like to act as a reviewer as well, please let me know.</p>
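<p>A quick way to make sure you have the required version is the following sketch (base R only; the v3.1.6 requirement comes from the paragraph above):</p>

```r
# Check that the installed smooth version supports the features listed
# above (v3.1.6 or newer, as stated in the post), installing if needed
if (!requireNamespace("smooth", quietly = TRUE) ||
    packageVersion("smooth") < "3.1.6") {
  install.packages("smooth")
}
library(smooth)
```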
<h3>Examples in R</h3>
<p>Just to give you a flavour of ADAM, I decided to provide a couple of examples on the <code>AirPassengers</code> time series (included in the <code>datasets</code> package in R). The first one is ADAM ETS.</p>
<p>Building and selecting the most appropriate ADAM ETS comes down to running the following line of code:</p>
<pre class="decode">adamETSAir <- adam(AirPassengers, h=12, holdout=TRUE)</pre>
<p>In this case, ADAM will select the most appropriate ETS model for the data, creating a holdout of the last 12 observations. We can see the details of the model by printing the output:</p>
<pre class="decode">adamETSAir</pre>
<pre>Time elapsed: 0.75 seconds
Model estimated using adam() function: ETS(MAM)
Distribution assumed in the model: Gamma
Loss function type: likelihood; Loss function value: 467.2981
Persistence vector g:
 alpha   beta  gamma 
0.7691 0.0053 0.0000 

Sample size: 132
Number of estimated parameters: 17
Number of degrees of freedom: 115
Information criteria:
      AIC      AICc       BIC      BICc 
 968.5961  973.9646 1017.6038 1030.7102 

Forecast errors:
ME: 9.537; MAE: 20.784; RMSE: 26.106
sCE: 43.598%; Asymmetry: 64.8%; sMAE: 7.918%; sMSE: 0.989%
MASE: 0.863; RMSSE: 0.833; rMAE: 0.273; rRMSE: 0.254</pre>
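<p>A side note: the error measures printed above can also be accessed programmatically. This is a hypothetical sketch, assuming that the fitted <code>adam</code> object stores them in its <code>accuracy</code> element when <code>holdout=TRUE</code> (the same element is used in the experiment code later in this post series):</p>

```r
# Assumption: the error measures are stored in the accuracy element
# of the fitted object when holdout=TRUE
adamETSAir$accuracy[c("MASE","RMSSE")]
```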
<p>The output above provides plenty of detail on what was estimated and how. Some of these elements have been discussed in <a href="/en/2016/11/02/smooth-package-for-r-es-function-part-ii-pure-additive-models/">one of my previous posts</a> on the <code>es()</code> function. The new thing is the information about the assumed distribution for the response variable. By default, ADAM works with the Gamma distribution in the case of a multiplicative error model. This is done to make the model more robust on low-volume data, where the Normal distribution might produce negative numbers (see <a href="/en/2021/06/30/isf2021-how-to-make-multiplicative-ets-work-for-you/">my presentation</a> on this issue). On high-volume data, the Gamma distribution will perform similarly to the Normal one. The pure multiplicative ADAM ETS is discussed in <a href="https://openforecast.org/adam/ADAMETSPureMultiplicativeChapter.html" rel="noopener" target="_blank">Chapter 6 of the ADAM monograph</a>. If Gamma is not suitable, then another distribution can be selected via the <code>distribution</code> parameter. There is also an automated distribution selection approach in the <code>auto.adam()</code> function:</p>
<pre class="decode">adamETSAutoAir <- auto.adam(AirPassengers, h=12, holdout=TRUE)
adamETSAutoAir</pre>
<pre>Time elapsed: 3.86 seconds
Model estimated using auto.adam() function: ETS(MAM)
Distribution assumed in the model: Normal
Loss function type: likelihood; Loss function value: 466.0744
Persistence vector g:
 alpha   beta  gamma 
0.8054 0.0000 0.0000 

Sample size: 132
Number of estimated parameters: 17
Number of degrees of freedom: 115
Information criteria:
      AIC      AICc       BIC      BICc 
 966.1487  971.5172 1015.1564 1028.2628 

Forecast errors:
ME: 9.922; MAE: 21.128; RMSE: 26.246
sCE: 45.36%; Asymmetry: 65.4%; sMAE: 8.049%; sMSE: 1%
MASE: 0.877; RMSSE: 0.838; rMAE: 0.278; rRMSE: 0.255</pre>
<p>As we see from the output above, the Normal distribution is more appropriate for the data in terms of AICc than the other ones tried out by the function (by default the list includes the Normal, Laplace, S, Generalised Normal, Gamma, Inverse Gaussian and Log-Normal distributions, but this can be amended by providing a vector of names via the <code>distribution</code> parameter). The selection of ADAM ETS and distributions is discussed in <a href="https://openforecast.org/adam/ADAMSelection.html" rel="noopener" target="_blank">Chapter 15 of the monograph</a>.</p>
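<p>If you want to restrict the pool of candidate distributions, a vector of names can be supplied. A hedged sketch (the names follow the <code>smooth</code> convention of density function names, e.g. <code>"dnorm"</code> for Normal; check the function documentation for the exact spelling):</p>

```r
# Sketch: trying only Normal, Gamma and Log-Normal distributions;
# the names are an assumption based on the smooth package convention
adamETSAirRestricted <- auto.adam(AirPassengers, h=12, holdout=TRUE,
                                  distribution=c("dnorm","dgamma","dlnorm"))
```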
<p>Having obtained the model, we can diagnose it using <code>plot.adam()</code> function:</p>
<pre class="decode">par(mfcol=c(3,3))
plot(adamETSAutoAir,which=c(1,4,2,6,7,8,10,11,13))</pre>
<p>The <code>which</code> parameter specifies what type of plots to produce; you can find the list of plots in the documentation for <code>plot.adam()</code>. The code above will result in:<br />
<div id="attachment_2824" style="width: 310px" class="wp-caption aligncenter"><a href="/wp-content/uploads/2022/03/adamETSAirDiagnostics.png"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2824" src="/wp-content/uploads/2022/03/adamETSAirDiagnostics-300x175.png" alt="Diagnostics plots for ADAM ETS on AirPassengers data" width="300" height="175" class="size-medium wp-image-2824" /></a><p id="caption-attachment-2824" class="wp-caption-text">Diagnostics plots for ADAM ETS on AirPassengers data</p></div>
The diagnostic plots are discussed in <a href="https://openforecast.org/adam/diagnostics.html" rel="noopener" target="_blank">Chapter 14 of the ADAM monograph</a>. The plot above does not show any serious issues with the model.</p>
<p>Just for comparison, we could also try fitting the most appropriate ADAM ARIMA to the data (this model is discussed in <a href="https://openforecast.org/adam/ADAMARIMA.html" rel="noopener" target="_blank">Chapter 9</a>). The code in this case is slightly more complicated, because we need to switch off the ETS part of the model and define the maximum orders of ARIMA to try:</p>
<pre class="decode">adamARIMAAir <- adam(AirPassengers, model="NNN", h=12, holdout=TRUE,
                     orders=list(ar=c(3,2),i=c(2,1),ma=c(3,2),select=TRUE))</pre>
<p>This results in the following <a href="https://openforecast.org/adam/ARIMASelection.html" rel="noopener" target="_blank">automatically selected</a> ARIMA model:</p>
<pre>Time elapsed: 3.54 seconds
Model estimated using auto.adam() function: SARIMA(0,1,1)[1](0,1,1)[12]
Distribution assumed in the model: Normal
Loss function type: likelihood; Loss function value: 491.7117
ARMA parameters of the model:
MA:
 theta1[1] theta1[12] 
   -0.1952    -0.0720 

Sample size: 132
Number of estimated parameters: 16
Number of degrees of freedom: 116
Information criteria:
     AIC     AICc      BIC     BICc 
1015.423 1020.154 1061.548 1073.097 

Forecast errors:
ME: -13.795; MAE: 16.65; RMSE: 21.644
sCE: -63.064%; Asymmetry: -79.4%; sMAE: 6.343%; sMSE: 0.68%
MASE: 0.691; RMSSE: 0.691; rMAE: 0.219; rRMSE: 0.21</pre>
<p>Given that ADAM ETS and ADAM ARIMA are formulated in the same framework, they are directly comparable using information criteria. Comparing the AICc of the models <code>adamETSAutoAir</code> and <code>adamARIMAAir</code>, we can conclude that the former is more appropriate for the data than the latter. However, the default ARIMA works with the Normal distribution, which might not be appropriate for the data, so we can turn to <code>auto.adam()</code> to select a better one:</p>
<pre class="decode">adamAutoARIMAAir <- auto.adam(AirPassengers, model="NNN", h=12, holdout=TRUE,
                              orders=list(ar=c(3,2),i=c(2,1),ma=c(3,2),select=TRUE))</pre>
<p>This will take more computational time, but will result in a different model with a lower AICc (which is still higher than that of ADAM ETS):</p>
<pre>Time elapsed: 25.46 seconds
Model estimated using auto.adam() function: SARIMA(0,1,1)[1](0,1,1)[12]
Distribution assumed in the model: Log-Normal
Loss function type: likelihood; Loss function value: 472.923
ARMA parameters of the model:
MA:
 theta1[1] theta1[12] 
   -0.2785    -0.5530 

Sample size: 132
Number of estimated parameters: 16
Number of degrees of freedom: 116
Information criteria:
      AIC      AICc       BIC      BICc 
 977.8460  982.5764 1023.9708 1035.5197 

Forecast errors:
ME: -12.968; MAE: 13.971; RMSE: 19.143
sCE: -59.285%; Asymmetry: -91.7%; sMAE: 5.322%; sMSE: 0.532%
MASE: 0.58; RMSSE: 0.611; rMAE: 0.184; rRMSE: 0.186</pre>
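<p>The comparison of the models via information criteria can be done programmatically rather than by reading the printed outputs. A sketch, assuming that the <code>AICc()</code> method (exported by the <code>greybox</code> package) applies to the <code>adam</code> class:</p>

```r
# Assumption: AICc() from the greybox package has a method for adam objects
library(greybox)
AICc(adamETSAutoAir)   # ADAM ETS
AICc(adamAutoARIMAAir) # ADAM ARIMA with automatically selected distribution
```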
<p>Note that although the AICc is higher for ARIMA than for ETS, the former has lower error measures than the latter. So, a higher AICc does not necessarily mean that the model is poor. Still, if we rely on the information criteria, then we should stick with ADAM ETS, and we can then produce forecasts for the next 12 observations (see <a href="https://openforecast.org/adam/ADAMForecasting.html" rel="noopener" target="_blank">Chapter 18</a>):</p>
<pre class="decode">adamETSAutoAirForecast <- forecast(adamETSAutoAir, h=12, interval="prediction",
                                   level=c(0.9,0.95,0.99))
par(mfcol=c(1,1))
plot(adamETSAutoAirForecast)</pre>
<div id="attachment_2839" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/adamETSAirForecast.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2839" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/adamETSAirForecast-300x175.png&amp;nocache=1" alt="Forecast from ADAM ETS" width="300" height="175" class="size-medium wp-image-2839" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/adamETSAirForecast-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/adamETSAirForecast-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/adamETSAirForecast-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2022/03/adamETSAirForecast.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-2839" class="wp-caption-text">Forecast from ADAM ETS</p></div>
Finally, if we want to do a more in-depth analysis of the estimated parameters of ADAM, we can also produce the summary, which constructs confidence intervals for them:</p>
<pre class="decode">summary(adamETSAutoAir)</pre>
<pre>Model estimated using auto.adam() function: ETS(MAM)
Response variable: data
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 466.0744
Coefficients:
            Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha         0.8054     0.0864     0.6343      0.9761 *
beta          0.0000     0.0203     0.0000      0.0401  
gamma         0.0000     0.0382     0.0000      0.0755  
level        96.2372     6.8596    82.6496    109.7919 *
trend         2.0901     0.3955     1.3068      2.8716 *
seasonal_1    0.9145     0.0077     0.9003      0.9372 *
seasonal_2    0.8999     0.0081     0.8857      0.9227 *
seasonal_3    1.0308     0.0094     1.0165      1.0535 *
seasonal_4    0.9885     0.0077     0.9743      1.0112 *
seasonal_5    0.9856     0.0072     0.9713      1.0083 *
seasonal_6    1.1165     0.0093     1.1023      1.1392 *
seasonal_7    1.2340     0.0115     1.2198      1.2568 *
seasonal_8    1.2254     0.0105     1.2112      1.2481 *
seasonal_9    1.0668     0.0094     1.0526      1.0896 *
seasonal_10   0.9256     0.0087     0.9113      0.9483 *
seasonal_11   0.8040     0.0075     0.7898      0.8268 *

Error standard deviation: 0.0367
Sample size: 132
Number of estimated parameters: 17
Number of degrees of freedom: 115
Information criteria:
      AIC      AICc       BIC      BICc 
 966.1487  971.5172 1015.1564 1028.2628 </pre>
<p>Note that the <code>summary()</code> function might complain about the Observed Fisher Information. This is because the covariance matrix of parameters is calculated numerically, and sometimes the likelihood is not maximised properly. I have not been able to fully resolve this issue yet, but hopefully I will at some point. The summary above shows, for example, that the smoothing parameters \(\beta\) and \(\gamma\) are not significantly different from zero (at the 5% level), while \(\alpha\) is expected to vary between 0.6343 and 0.9761 in 95% of the cases. You can read more about the uncertainty of the parameters in ADAM in <a href="https://openforecast.org/adam/ADAMUncertainty.html" rel="noopener" target="_blank">Chapter 16</a> of the monograph.</p>
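<p>If only the intervals are needed, they can presumably be extracted without printing the whole summary. A hypothetical sketch, assuming the standard <code>confint()</code> and <code>vcov()</code> methods are implemented for the <code>adam</code> class:</p>

```r
# Assumption: the standard R methods are available for adam objects
confint(adamETSAutoAir, level=0.95) # confidence intervals for parameters
vcov(adamETSAutoAir)                # covariance matrix of the estimates
```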
<p>As for the other features of ADAM, here is a brief guide:</p>
<ul>
<li>If you work with multiple seasonal data, then you might need to specify the seasonality via the <code>lags</code> parameter, for example as <code>lags=c(24,7*24)</code> in case of hourly data. This is discussed in <a href="https://openforecast.org/adam/multiple-frequencies-in-adam.html" rel="noopener" target="_blank">Chapter 12</a>;</li>
<li>If you have intermittent data, then you should read <a href="https://openforecast.org/adam/ADAMIntermittent.html" rel="noopener" target="_blank">Chapter 13</a>, which explains how to work with the <code>occurrence</code> parameter of the function;</li>
<li>Explanatory variables are discussed in <a href="https://openforecast.org/adam/ADAMX.html" rel="noopener" target="_blank">Chapter 10</a> and are handled in the <code>adam()</code> function via the <code>formula</code> parameter;</li>
<li>In the case of heteroscedasticity (time-varying or induced by some explanatory variables), there is a scale model (discussed in <a href="https://openforecast.org/adam/ADAMscaleModel.html" rel="noopener" target="_blank">Chapter 17</a> and implemented as the <code>sm()</code> method for the <code>adam</code> class).</li>
</ul>
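<p>For instance, the multiple seasonality case from the first bullet point could look as follows. This is a hedged sketch on the half-hourly <code>taylor</code> electricity demand data from the <code>forecast</code> package (daily and weekly periods of 48 and 336 observations); the specific model form is an illustration, not a recommendation:</p>

```r
# Sketch: a double-seasonal model on half-hourly data, with the daily (48)
# and weekly (336) cycles specified via the lags parameter
library(smooth)
library(forecast) # only needed for the taylor data
adamTaylor <- adam(taylor, "MNM", lags=c(48,336), h=336, holdout=TRUE)
```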
<p>You can also experiment with advanced estimators (<a href="https://openforecast.org/adam/ADAMETSEstimation.html" rel="noopener" target="_blank">Chapter 11</a>, including custom loss functions) via the <code>loss</code> parameter and forecast combinations (<a href="https://openforecast.org/adam/ADAMCombinations.html" rel="noopener" target="_blank">Section 15.4</a>).</p>
<p>Long story short, if you are interested in univariate forecasting, then do give ADAM a try - it might have the flexibility you need for your experiments. If you are worried about its accuracy, check out <a href="/en/2021/02/28/after-the-creation-of-adam-smooth-v3-1-0/">this post</a>, where I compared ADAM with other models.</p>
<p>And, as a friend of mine says, "Happy forecasting!"</p>
<p>Message <a href="https://openforecast.org/2022/04/11/the-first-draft-of-forecasting-and-analytics-with-adam/">The first draft of &#8220;Forecasting and Analytics with ADAM&#8221;</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2022/04/11/the-first-draft-of-forecasting-and-analytics-with-adam/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The creation of ADAM &#8211; next step in statistical forecasting</title>
		<link>https://openforecast.org/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/</link>
					<comments>https://openforecast.org/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Wed, 13 Jan 2021 11:24:18 +0000</pubDate>
				<category><![CDATA[adam()]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Package smooth for R]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Regression]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[smooth]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=2552</guid>

					<description><![CDATA[<p>Good news everyone! The future of statistical forecasting is finally here :). Have you ever struggled with ETS and needed explanatory variables? Have you ever needed to unite ARIMA and ETS? Have you ever needed to deal with all those zeroes in the data? What about the data with multiple seasonalities? All of this and [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/">The creation of ADAM &#8211; next step in statistical forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
					<content:encoded><![CDATA[<p>Good news everyone! The future of statistical forecasting is finally here :). Have you ever struggled with ETS and needed explanatory variables? Have you ever needed to unite ARIMA and ETS? Have you ever needed to deal with all those zeroes in the data? What about data with multiple seasonalities? All of this and more can now be handled by the <code>adam()</code> function from the <code>smooth</code> v3.0.1 package for R (<a href="https://cran.r-project.org/package=smooth">on its way to CRAN now</a>). ADAM stands for &#8220;Augmented Dynamic Adaptive Model&#8221; (I will talk about it in the next <a href="https://cmaf-fft.lp151.com/" rel="noopener" target="_blank">CMAF Friday Forecasting Talk</a> on 15th January). Now, what is ADAM? Well, something like this:</p>
<div id="attachment_2557" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2020/12/Touched_by_His_Noodly_Appendage_HD-smooth.jpg&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2557" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2020/12/Touched_by_His_Noodly_Appendage_HD-smooth-300x142.jpg&amp;nocache=1" alt="ADAM, smooth and His Noodly Appendage Flying Spaghetti Monster" width="300" height="142" class="size-medium wp-image-2557" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2020/12/Touched_by_His_Noodly_Appendage_HD-smooth-300x142.jpg&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2020/12/Touched_by_His_Noodly_Appendage_HD-smooth-768x364.jpg&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2020/12/Touched_by_His_Noodly_Appendage_HD-smooth.jpg&amp;nocache=1 1000w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-2557" class="wp-caption-text">The Creation of ADAM by <a href="http://www.androidarts.com/">Arne Niklas Jansson</a> with my adaptation</p></div>
<p>ADAM is the next step in time series analysis and forecasting. Remember <a href="/en/2016/10/14/smooth-package-for-r-es-i/">exponential smoothing</a> and functions like <code>es()</code> and <code>ets()</code>? Remember ARIMA and functions like <code>arima()</code>, <code>ssarima()</code>, <code>msarima()</code>, etc.? Remember your favourite <a href="/en/2019/01/07/marketing-analytics-with-greybox/">linear regression function</a>, e.g. <code>lm()</code>, <code>glm()</code> or <code>alm()</code>? Well, now these three models are implemented in a unified framework. Now you can have exponential smoothing with ARIMA elements and explanatory variables in one box: <code>adam()</code>. You can do ETS components and ARIMA orders selection, together with explanatory variables selection, in one go. You can estimate ETS / ARIMA / regression using either the likelihood of a selected distribution, conventional losses like MSE, or even your own custom loss. You can tune the parameters of the optimiser and experiment with initialisation and estimation of the model. The function can deal with multiple seasonalities and with intermittent data in one place. In fact, there are so many features that it is just easier to list the major ones:</p>
<ol>
<li>ETS;</li>
<li>ARIMA;</li>
<li>Regression;</li>
<li>TVP regression;</li>
<li>Combination of (1), (2) and either (3), or (4);</li>
<li>Automatic selection / combination of states for ETS;</li>
<li>Automatic orders selection for ARIMA;</li>
<li>Variables selection for regression part;</li>
<li>Normal and non-normal distributions;</li>
<li>Automatic selection of most suitable distributions;</li>
<li>Advanced and custom loss functions;</li>
<li>Multiple seasonality;</li>
<li>Occurrence part of the model to handle zeroes in data (intermittent demand);</li>
<li>Model diagnostics using plot() and other methods;</li>
<li>Confidence intervals for parameters of models;</li>
<li>Automatic outliers detection;</li>
<li>Handling missing data;</li>
<li>Fine tuning of persistence vector (smoothing parameters);</li>
<li>Fine tuning of initial values of the state vector (e.g. level / trend / seasonality / ARIMA components / regression parameters);</li>
<li>Two initialisation options (optimal / backcasting);</li>
<li>Provided ARMA parameters;</li>
<li>Fine tuning of optimiser (select algorithm and convergence criteria);</li>
<li>&#8230;</li>
</ol>
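<p>As a minimal illustration of points (1), (2) and (5) above, ETS and ARIMA elements can be combined in a single call. This is a hypothetical sketch; see the online textbook for authoritative examples:</p>

```r
library(smooth)
# Sketch: ETS(A,A,N) with an additional ARMA(1,0,1) element on top,
# estimated via likelihood (the default loss)
adamCombined <- adam(AirPassengers, model="AAN", orders=c(1,0,1))
summary(adamCombined)
```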
<p>All of this is based on the Single Source of Error state space model, which makes ETS, ARIMA and regression directly comparable via information criteria and opens a variety of modelling and forecasting possibilities. In addition, the code is much more efficient than that of the already existing <code>smooth</code> functions, so hopefully this will be a convenient function to use. I do not promise that everything will work 100% efficiently from scratch, because this is a new function, which means that inevitably there are bugs and there is room for improvement. But I intend to continue working on it and improving it further, based on the feedback provided (you can submit <a href="https://github.com/config-i1/smooth/issues">an issue on GitHub</a> if you have ideas).</p>
<p>Keep in mind that starting from smooth v3.0.0 I will not be introducing new features in <code>es()</code>, <code>ssarima()</code> and the other conventional univariate functions in <code>smooth</code> &#8211; I will only fix bugs in them and possibly optimise some parts of the code, but there will be no innovations in them, given that the main focus from now on will be on <code>adam()</code>. To that end, I have removed some experimental and not fully developed parameters from those functions (e.g. occurrence, oesmodel, updateX, persistenceX and transitionX).</p>
<p>Now, I realise that ADAM is something completely new and contains just too much information to cover in one post. As a result, I have started work on an <a href="https://openforecast.org/adam/" rel="noopener" target="_blank">online textbook</a>. This is work in progress, missing some chapters, but it already covers many important elements of ADAM. If you find any mistakes in the text or formulae, please use the &#8220;Open Review&#8221; functionality in the textbook to give me feedback, or send me a message. This will be highly appreciated because, working on this alone, I am sure that I have made plenty of mistakes and typos.</p>
<h3>Example in R</h3>
<p>Finally, it would be boring just to announce things and leave it at that. So, I&#8217;ve decided to run an R experiment on the M, M3 and tourism competitions data, similar to what I <a href="/en/2018/01/01/smooth-functions-in-2017/" rel="noopener" target="_blank">did in 2017</a>, just to show how the function compares with the other conventional ones, measuring their accuracy and computational time:</p>
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Huge chunk of code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode"># Load the packages. If the packages are not available, install them from CRAN
library(Mcomp)
library(Tcomp)
library(smooth)
library(forecast)

# Load the packages for parallel calculation
# This package is available for Linux and MacOS only
# Comment out this line if you work on Windows
library(doMC)

# Set up the cluster on all cores / threads.
## Note that the code that follows might take around 500Mb per thread,
## so the issue is not in the number of threads, but rather in the RAM availability
## If you do not have enough RAM,
## you might need to reduce the number of threads manually.
## But this should not be greater than the number of threads your processor can do.
registerDoMC(detectCores())

##### Alternatively, if you work on Windows (why?), uncomment and run the following lines
# library(doParallel)
# cl <- detectCores()
# registerDoParallel(cl)
#####

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
    return(c(measures(holdout, object$mean, insample),
             mean(holdout < object$upper &#038; holdout > object$lower),
             mean(object$upper-object$lower)/mean(insample),
             pinball(holdout, object$upper, 0.975)/mean(insample),
             pinball(holdout, object$lower, 0.025)/mean(insample),
             sMIS(holdout, object$lower, object$upper, mean(insample),0.95),
             object$timeElapsed))
}

# Create the list of datasets
datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)
# Give names to competing forecasting methods
methodsNames <- c("ADAM-ETS(ZZZ)","ADAM-ETS(ZXZ)","ADAM-ARIMA",
                  "ETS(ZXZ)","ETSHyndman","AutoSSARIMA","AutoARIMA");
methodsNumber <- length(methodsNames);
# Run adam on one of time series from the competitions to get names of error measures
test <- adam(datasets[[125]]);
# The array with error measures for each method on each series.
## Here we calculate a lot of error measures, but we will use only a few of them
testResults <- array(NA,c(methodsNumber,datasetLength,length(test$accuracy)+6),
                             dimnames=list(methodsNames, NULL,
                                           c(names(test$accuracy),
                                             "Coverage","Range",
                                             "pinballUpper","pinballLower","sMIS",
                                             "Time")));

#### ADAM(ZZZ) ####
j <- 1;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- adam(datasets[[i]],"ZZZ");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ADAM(ZXZ) ####
j <- 2;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- adam(datasets[[i]],"ZXZ");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ADAMARIMA ####
j <- 3;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- adam(datasets[[i]], "NNN",
                 orders=list(ar=c(3,2),i=c(2,1),ma=c(3,2),select=TRUE));
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ES(ZXZ) ####
j <- 4;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- es(datasets[[i]],"ZXZ");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ETS from forecast package ####
j <- 5;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- ets(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### AUTO SSARIMA ####
j <- 6;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]]);
    testForecast <- forecast(test, h=datasets[[i]]$h, interval=TRUE);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### AUTOARIMA ####
j <- 7;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- auto.arima(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

# If you work on Windows, don't forget to shut down the cluster via the following command:
# stopCluster(cl)</pre>
</div></div>
<p>After running this code, we get a big array (7x5315x21) containing many different error measures for <a href="/en/2019/08/25/are-you-sure-youre-precise-measuring-accuracy-of-point-forecasts/">point forecasts</a> and <a href="/en/2019/10/18/how-confident-are-you-assessing-the-uncertainty-in-forecasting/">prediction intervals</a>. We will not use all of them; instead, we extract MASE and RMSSE for the point forecasts, and Coverage, Range and sMIS for the prediction intervals, together with the computational time. Although it might be more informative to look at the distributions of these variables, we calculate overall mean and median values, just to get a feeling for the performance:<br />
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>A much smaller chunk of code in R</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode">round(apply(testResults[,,c("MASE","RMSSE","Coverage","Range","sMIS","Time")],
            c(1,3),mean),3)
round(apply(testResults[,,c("MASE","RMSSE","Range","sMIS","Time")],
            c(1,3),median),3)</pre>
</div></div>
This will result in the following two tables (boldface shows the best-performing functions):</p>
<pre><strong>Means</strong>:
               MASE RMSSE Coverage Range  sMIS  Time
ADAM-ETS(ZZZ) 2.415 2.098    0.888 1.398 2.437 0.654
ADAM-ETS(ZXZ) <strong>2.250 1.961    0.895</strong> 1.225 <strong>2.092</strong> 0.497
ADAM-ARIMA    2.551 2.203    0.862 0.968 3.098 5.990
ETS(ZXZ)      2.279 1.977    0.862 1.372 2.490 1.128
ETSHyndman    2.263 1.970    0.882 1.200 2.258 <strong>0.404</strong>
AutoSSARIMA   2.482 2.134    0.801 <strong>0.780</strong> 3.335 1.700
AutoARIMA     2.303 1.989    0.834 0.805 3.013 1.385

<strong>Medians</strong>:
               MASE RMSSE Range  sMIS  Time
ADAM-ETS(ZZZ) 1.362 1.215 0.671 0.917 0.396
ADAM-ETS(ZXZ) 1.327 1.184 0.675 0.909 0.310
ADAM-ARIMA    1.476 1.300 0.769 1.006 3.525
ETS(ZXZ)      1.335 1.198 0.616 0.931 0.551
ETSHyndman    1.323 <strong>1.181</strong> 0.653 0.925 <strong>0.164</strong>
AutoSSARIMA   1.419 1.271 <strong>0.577</strong> 0.988 0.909
AutoARIMA     <strong>1.310</strong> 1.182 0.609 <strong>0.881</strong> 0.322</pre>
<p>Some things to note from this:</p>
<ul>
<li>ADAM ETS(ZXZ) is the most accurate model in terms of mean MASE and RMSSE; it has the coverage closest to 95% (although none of the models achieved the nominal value, because of the fundamental underestimation of uncertainty) and the lowest sMIS, implying that it did better than the other functions in terms of prediction intervals;</li>
<li>ETS(ZZZ) did worse than ETS(ZXZ) because the former considers the multiplicative trend, which sometimes becomes unstable, producing exploding trajectories;</li>
<li>ADAM ARIMA does not perform well yet, because of the implemented order selection algorithm, and it was the slowest function of all. I plan to improve it in future releases of the function;</li>
<li>While ADAM ETS(ZXZ) did not beat the ETS from the forecast package in terms of computational time, it was faster than the other functions;</li>
<li>When it comes to medians, <code>auto.arima()</code>, <code>ets()</code> and <code>auto.ssarima()</code> seem to do better than ADAM, but not by a large margin.</li>
</ul>
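<p>To make the interval-score numbers above more concrete, here is a sketch of the mean interval score (MIS) that underlies sMIS, based on the standard definition. This is an illustration only, not the exact implementation from the smooth or greybox packages; in particular, the scaling used for sMIS is assumed here to be the mean absolute value of the in-sample actuals:</p>
<pre># Mean Interval Score: interval width plus penalties for actuals
# falling outside the bounds (standard definition).
MIS &lt;- function(actuals, lower, upper, level = 0.95) {
    alpha &lt;- 1 - level
    mean((upper - lower) +
         2 / alpha * (lower - actuals) * (actuals &lt; lower) +
         2 / alpha * (actuals - upper) * (actuals &gt; upper))
}

# A hypothetical scaled version for illustration, dividing by the
# in-sample mean absolute value so that values can be compared and
# averaged across series (the "s" in sMIS):
sMISsketch &lt;- function(actuals, lower, upper, insample, level = 0.95) {
    MIS(actuals, lower, upper, level) / mean(abs(insample))
}</pre>
<p>Note that narrower intervals reduce the width term but inflate the penalty terms whenever actuals fall outside the bounds, which is why low Range and low Coverage tend to appear together in the tables above.</p>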
<p>To see whether the performance of the functions differs statistically, we run <a href="/en/2020/08/17/accuracy-of-forecasting-methods-can-you-tell-the-difference/">the RMCB test</a> for MASE, RMSSE and sMIS. Note that RMCB compares the median performance of the functions. Here is the R code:<br />
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-plus su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>A smaller chunk of code in R for the MCB test</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<pre class="decode"># Load the package with the function
library(greybox)
# Run it for each separate measure, automatically producing plots
rmcbResultMASE <- rmcb(t(testResults[,,"MASE"]))
rmcbResultRMSSE <- rmcb(t(testResults[,,"RMSSE"]))
rmcbResultsMIS <- rmcb(t(testResults[,,"sMIS"]))</pre>
</div></div>
<p>And here are the figures that we get by running that code:</p>
<div id="attachment_2599" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBMASE.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2599" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBMASE-300x175.png&amp;nocache=1" alt="RMCB test for MASE" width="300" height="175" class="size-medium wp-image-2599" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBMASE-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBMASE-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBMASE-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBMASE.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-2599" class="wp-caption-text">RMCB test for MASE</p></div>
<div id="attachment_2598" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBRMSSE.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2598" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBRMSSE-300x175.png&amp;nocache=1" alt="RMCB test for RMSSE" width="300" height="175" class="size-medium wp-image-2598" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBRMSSE-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBRMSSE-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBRMSSE-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBRMSSE.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-2598" class="wp-caption-text">RMCB test for RMSSE</p></div>
<p>As we can see from the two figures above, ADAM-ETS(ZXZ) performs better than the other functions, although it is not statistically different from the ETS implemented in the <code>es()</code> and <code>ets()</code> functions. ADAM-ARIMA is the worst-performing function for the moment, as we already noticed in the previous analysis. The ranking is similar for both MASE and RMSSE.</p>
<p>And here is the sMIS plot:</p>
<div id="attachment_2597" style="width: 310px" class="wp-caption aligncenter"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBsMIS.png&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-2597" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBsMIS-300x175.png&amp;nocache=1" alt="RMCB test for sMIS" width="300" height="175" class="size-medium wp-image-2597" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBsMIS-300x175.png&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBsMIS-1024x597.png&amp;nocache=1 1024w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBsMIS-768x448.png&amp;nocache=1 768w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2021/01/adamTestsRMCBsMIS.png&amp;nocache=1 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-2597" class="wp-caption-text">RMCB test for sMIS</p></div>
<p>When it comes to sMIS, the leader in terms of medians is <code>auto.arima()</code>, performing quite similarly to <code>ets()</code>, but this is mainly because the two produce narrower ranges, which incidentally results in lower-than-nominal coverage (as seen in the summary performance above). ADAM-ETS performs similarly to <code>ets()</code> and <code>es()</code> in this respect (the intervals of the three intersect).</p>
<p>Obviously, we could provide a more detailed analysis of the performance of the functions on different types of data and see how they compare in each category, but the aim of this post is simply to demonstrate how the new function works; I do not intend to investigate this in detail.</p>
<p>Finally, I will present ADAM, together with several case studies, at the <a href="https://cmaf-fft.lp151.com/" rel="noopener" target="_blank">CMAF Friday Forecasting Talk</a> on 15th January. If you are interested in hearing more or have questions, please <a href="https://www.meetup.com/cmaf-friday-forecasting-talks/" rel="noopener" target="_blank">register on MeetUp</a> or <a href="https://www.linkedin.com/events/cmaffft-toinfinityandbeyond-for6751883043834773504/" rel="noopener" target="_blank">via LinkedIn</a> and join us online.</p>
<p>Message <a href="https://openforecast.org/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/">The creation of ADAM &#8211; next step in statistical forecasting</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2021/01/13/the-creation-of-adam-next-step-in-statistical-forecasting/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>SMUG2019</title>
		<link>https://openforecast.org/2019/04/19/smug2019/</link>
					<comments>https://openforecast.org/2019/04/19/smug2019/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Fri, 19 Apr 2019 15:47:25 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Artificial Intelligence and Machine Learning]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[AI and ML]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[presentations]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=1953</guid>

					<description><![CDATA[<p>I was recently invited to attend the SMUG2019 conference (SMoothie Users Group), organised by the Demand Works company in New York. They asked me to present two topics: State space ARIMA for Supply Chain Forecasting, based on which I developed a module for Smoothie a couple of years ago, Artificial Intelligence in Business, one of [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2019/04/19/smug2019/">SMUG2019</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I was recently invited to attend the SMUG2019 conference (SMoothie Users Group), organised by the Demand Works company in New York. They asked me to present two topics:</p>
<ol>
<li><a href="/en/2019/04/05/state-space-arima-for-supply-chain-forecasting/">State space ARIMA for Supply Chain Forecasting</a>, based on which I developed a module for Smoothie a couple of years ago,</li>
<li>Artificial Intelligence in Business, one of the current hot topics that the company wanted to learn a bit more about.</li>
</ol>
<div id="attachment_1957" style="width: 310px" class="wp-caption alignnone"><a href="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2019/04/2019-04-18-NY-SMUG.jpeg&amp;nocache=1"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-1957" class="size-medium wp-image-1957" src="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2019/04/2019-04-18-NY-SMUG-300x219.jpeg&amp;nocache=1" alt="" width="300" height="219" srcset="https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2019/04/2019-04-18-NY-SMUG-300x219.jpeg&amp;nocache=1 300w, https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2019/04/2019-04-18-NY-SMUG.jpeg&amp;nocache=1 768w" sizes="auto, (max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-1957" class="wp-caption-text">Presentation at SMUG2019</p></div>
<p>The conference was interesting, showing what the company does and what it stands for. They are doing a good job developing their forecasting and inventory control software and supporting their users. Plus, I finally had a chance to meet both founders of the company (Bill Tonetti and Eric Townson) in person, as well as the other members of their team. Overall, it was a pleasant experience and an interesting event.</p>
<p>As for the presentations, they seemed to go well, and the participants of the conference looked satisfied. Here are the slides:</p>
<ol>
<li><a href="https://openforecast.org/wp-content/uploads/2019/04/SMUG2019-Svetunkov-ARIMA.pdf">SMUG2019 &#8211; Svetunkov &#8211; ARIMA</a></li>
<li><a href="https://openforecast.org/wp-content/uploads/2019/04/SMUG2019-Svetunkov-AI-in-Business.pdf">SMUG2019 &#8211; Svetunkov &#8211; AI in Business</a></li>
</ol>
<p>Message <a href="https://openforecast.org/2019/04/19/smug2019/">SMUG2019</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2019/04/19/smug2019/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>A simple combination of univariate models</title>
		<link>https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/</link>
					<comments>https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/#comments</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Thu, 18 Apr 2019 08:39:17 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[ETS]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[papers]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=1949</guid>

					<description><![CDATA[<p>Fotios Petropoulos and I participated in the M4 competition last year. Our approach performed well, finishing 6th in the competition. This paper in the International Journal of Forecasting explains what we used in our approach and why. Here&#8217;s the abstract: This paper describes the approach that we implemented for producing the point forecasts and prediction [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/">A simple combination of univariate models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><a href="https://researchportal.bath.ac.uk/en/persons/fotios-petropoulos" rel="noopener noreferrer" target="_blank">Fotios Petropoulos</a> and I participated in the M4 competition last year. Our approach performed well, finishing 6th in the competition. <a href="https://doi.org/10.1016/j.ijforecast.2019.01.006" rel="noopener noreferrer" target="_blank">This paper in the International Journal of Forecasting</a> explains what we used in our approach and why. Here&#8217;s the abstract:</p>
<p>This paper describes the approach that we implemented for producing the point forecasts and prediction intervals for our M4-competition submission. The proposed simple combination of univariate models (SCUM) is a median combination of the point forecasts and prediction intervals of four models, namely exponential smoothing, complex exponential smoothing, automatic autoregressive integrated moving average and dynamic optimised theta. Our submission performed very well in the M4-competition, being ranked 6th for the point forecasts (with a small difference compared to the 2nd submission) and prediction intervals and 2nd and 3rd for the point forecasts of the weekly and quarterly data respectively.</p>
<p><a href="https://doi.org/10.1016/j.ijforecast.2019.01.006" rel="noopener noreferrer" target="_blank">Paper in IJF</a>.<br />
<a href="https://openforecast.org/wp-content/uploads/2019/04/IJF-2019-SCUM-post-print.pdf">Postprint of the paper</a>.</p>
<p>Message <a href="https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/">A simple combination of univariate models</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2019/04/18/a-simple-combination-of-univariate-models/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
	</channel>
</rss>
