3.3 Rolling origin
The text in this section is based on the vignette for the greybox package, written by the author of this textbook.
When there is a need to select the most appropriate forecasting model or method for the data, forecasters usually split the available sample into two parts: the in-sample (aka "training set") and the holdout sample (or out-sample, or "test set"). The model is then estimated on the in-sample, and its forecasting performance is evaluated using some error measure on the holdout sample.
If such a procedure is done only once, then this is called "fixed origin" evaluation. However, the time series might contain outliers or level shifts, and a poor model might perform better than a more appropriate one only because of that. In order to robustify the evaluation of models, something called "rolling origin" is used.
Rolling origin is an evaluation technique according to which the forecasting origin is updated successively and forecasts are produced from each origin (Tashman 2000). This technique allows obtaining several forecast errors for the time series, which gives a better understanding of how the models perform. It can be considered a time series analogue of cross-validation techniques (Wikipedia 2020b). Here is a simple graphical representation of it, provided to me by Nikos Kourentzes.
There are different options for how this can be done.
3.3.1 Principles of Rolling origin
Figure 3.2 (Svetunkov and Petropoulos 2018) depicts the basic idea of rolling origin. White cells correspond to the in-sample data, while the light grey cells correspond to the three-steps-ahead forecasts. The time series in that figure has 25 observations, and the forecasts are produced from 8 origins, starting from origin 15. The model is re-estimated on each iteration and the forecasts are produced. After that, a new observation is added at the end of the series and the procedure continues. The process stops when there is no more data to add. This can be considered as a rolling origin with a constant holdout sample size. As a result of this procedure, 8 one to three steps ahead forecasts are produced. Based on them, we can calculate the preferred error measures and choose the best performing model.
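To make the mechanics concrete, here is a minimal sketch of this procedure written by hand in R, without any packages. The series y, the ARIMA(0,1,1) model and the specific indices are assumptions chosen to mimic the setup of Figure 3.2; the ro() function discussed in the next subsection automates exactly this kind of loop.
# Rolling origin with a constant holdout size and an expanding in-sample:
# 25 observations, 3 steps ahead forecasts from 8 origins (15 to 22)
set.seed(41)
y <- rnorm(25, 100, 10)
h <- 3
origins <- 15:22
errors <- matrix(NA, length(origins), h,
                 dimnames=list(paste0("origin",origins), paste0("h",1:h)))
for(i in seq_along(origins)){
    # Re-estimate the model on the data up to the current origin
    fit <- arima(y[1:origins[i]], order=c(0,1,1))
    # Produce the forecast and record the error for each horizon
    errors[i,] <- y[origins[i]+(1:h)] - predict(fit, n.ahead=h)$pred
}
errors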
Another option of producing forecasts from 8 origins would be to start from origin 17 instead of 15, as shown in Figure 3.3. In this case, the procedure continues until origin 22, when the last three-steps-ahead forecast is produced, and then continues with a decreasing forecasting horizon. So the two-steps-ahead forecast is produced from origin 23, and only a one-step-ahead forecast is produced from origin 24. As a result, we obtain 8 one-step-ahead forecasts, 7 two-steps-ahead forecasts and 6 three-steps-ahead forecasts. This can be considered as a rolling origin with a non-constant holdout sample size. It can be useful in cases of small samples, when we do not have any observations to spare.
Finally, in both of the cases above the in-sample size was increasing. However, for some research purposes we might need a constant in-sample size. Figure 3.4 demonstrates such a situation. In this case, on each iteration we add an observation at the end of the series and remove one from the beginning of the series (dark grey cells).
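In terms of the sketch above, this constant in-sample (sliding window) scheme only changes which observations are passed to the model on each iteration. The loop below reuses y, h, origins and errors from the earlier sketch; the window length of 15 observations is again an assumption for illustration.
# Constant in-sample size: drop one observation from the beginning
# every time one is added at the end (sliding window of 15 observations)
inSampleSize <- 15
for(i in seq_along(origins)){
    fit <- arima(y[(origins[i]-inSampleSize+1):origins[i]], order=c(0,1,1))
    errors[i,] <- y[origins[i]+(1:h)] - predict(fit, n.ahead=h)$pred
}
errors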
3.3.2 Rolling origin in R
The ro() function from the greybox package (written by Yves Sagaert and Ivan Svetunkov in 2016 on the way to the International Symposium on Forecasting) implements the rolling origin evaluation for any function you like with a predefined call and returns the desired value. It heavily relies on the two variables, call and value, so it is quite important to understand how to formulate them in order to get the desired results. Overall, ro() is a very flexible function but, as a result, it is not very simple. In this subsection we will see how it works on a couple of examples.
We start with a simple example, generating a series from the normal distribution:
x <- rnorm(100,100,10)
We use the ARIMA(0,1,1) model implemented in the stats package for this example:
ourCall <- "predict(arima(x=data,order=c(0,1,1)),n.ahead=h)"
The call that we specify includes two important elements:
1. data specifies where the in-sample values are located in the function that we want to use, and it needs to be called "data" in the call;
2. h tells the function where the forecasting horizon is specified in the selected function.
Note that in this example we use arima(x=data, order=c(0,1,1)), which produces the desired ARIMA(0,1,1) model, and then predict(..., n.ahead=h), which produces an h steps ahead forecast from that model.
Having the call, we also need to specify what the function should return. This can be the conditional mean (point forecasts), prediction intervals, the parameters of the model or, in fact, anything that the model returns (e.g. the name of the fitted model and its likelihood). However, there are some differences in what ro() returns depending on what the function returns. If it is a vector, then ro() will produce a matrix (with values for each origin in columns). If it is a matrix, then an array is returned. Finally, if it is a list, then a list of lists is returned.
In order not to overcomplicate things, we will only collect the conditional mean from predict() (returned in its pred element):
ourValue <- c("pred")
NOTE: If you do not specify the value to return, the function will try to return everything, but it might fail, especially if a lot of values are returned. So, in order to be on the safe side, always provide the value, when possible.
Now that we have specified ourCall and ourValue, we can produce forecasts from the model using rolling origin. Let's say that we want three-steps-ahead forecasts and 8 origins with the default values of all the other parameters:
returnedValues1 <- ro(x, h=3, origins=8, call=ourCall, value=ourValue)
The function returns a list with all the values that we asked for plus the actual values from the holdout sample. We can calculate a basic error measure based on those values, for example, the scaled Mean Absolute Error (Petropoulos and Kourentzes 2015):
apply(abs(returnedValues1$holdout - returnedValues1$pred), 1, mean, na.rm=TRUE) /
    mean(returnedValues1$actuals)
##         h1         h2         h3 
## 0.04669208 0.04684055 0.03913527
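Other error measures can be calculated from the same returned values in a similar way. For example, here is a sketch of a scaled Root Mean Squared Error for each horizon (this particular aggregation is chosen only for illustration, it is not something prescribed by the package):
# Scaled RMSE per forecasting horizon, based on the same holdout and forecasts
sqrt(apply((returnedValues1$holdout - returnedValues1$pred)^2, 1, mean, na.rm=TRUE)) /
    mean(returnedValues1$actuals)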
In this example we use the apply() function in order to distinguish between the different forecasting horizons and get an idea of how the model performs for each of them. These numbers do not tell us much on their own, but if we compared the performance of this model with another one, then we could infer whether one model is more appropriate for the data than the other. For example, applying ARIMA(1,1,2) to the same data, we get:
ourCall <- "predict(arima(x=data,order=c(1,1,2)),n.ahead=h)" returnedValues2 <- ro(x, h=3, origins=8, call=ourCall, value=ourValue)
## Warning in arima(x = data, order = c(1, 1, 2)): possible convergence problem:
## optim gave code = 1
## Warning in arima(x = data, order = c(1, 1, 2)): possible convergence problem:
## optim gave code = 1
## Warning in arima(x = data, order = c(1, 1, 2)): possible convergence problem:
## optim gave code = 1
## Warning in arima(x = data, order = c(1, 1, 2)): possible convergence problem:
## optim gave code = 1
## Warning in arima(x = data, order = c(1, 1, 2)): possible convergence problem:
## optim gave code = 1
apply(abs(returnedValues2$holdout - returnedValues2$pred), 1, mean, na.rm=TRUE) /
    mean(returnedValues2$actuals)
##         h1         h2         h3 
## 0.04532243 0.04547505 0.03962289
Comparing these errors with the ones from the previous model, we can conclude which of the two approaches is more adequate for the data.
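For instance, one simple way of summarising such a comparison is to put the two sets of scaled errors side by side. This is a sketch reusing the objects returned above; the row labels are just the names of the two models:
# Scaled MAE of the two models for each forecasting horizon, side by side
sMAE1 <- apply(abs(returnedValues1$holdout - returnedValues1$pred), 1, mean, na.rm=TRUE) /
    mean(returnedValues1$actuals)
sMAE2 <- apply(abs(returnedValues2$holdout - returnedValues2$pred), 1, mean, na.rm=TRUE) /
    mean(returnedValues2$actuals)
round(rbind("ARIMA(0,1,1)"=sMAE1, "ARIMA(1,1,2)"=sMAE2), 5)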
We can also plot the forecasts from the rolling origin, which shows how the selected model behaves:
par(mfcol=c(2,1))
plot(returnedValues1)
plot(returnedValues2)
In this example the forecasts from different origins are close to each other. This is because the data is stationary and the model is quite stable.
The ro() function from the greybox package also allows working with explanatory variables and returning prediction intervals, if needed. Some further examples are discussed in the vignette of the package.
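As an illustration of the latter point, predict() for an ARIMA model returns not only the point forecasts (pred) but also their standard errors (se), so we can ask ro() to collect both and construct approximate prediction intervals ourselves. This is only a sketch: the 95% bounds below use the usual normal approximation and are not something produced by ro() itself.
# Collect both point forecasts and their standard errors from each origin
ourCall <- "predict(arima(x=data,order=c(0,1,1)),n.ahead=h)"
ourValue <- c("pred","se")
returnedValues3 <- ro(x, h=3, origins=8, call=ourCall, value=ourValue)
# Approximate 95% prediction intervals based on the normal approximation
lower <- returnedValues3$pred - qnorm(0.975) * returnedValues3$se
upper <- returnedValues3$pred + qnorm(0.975) * returnedValues3$se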
Petropoulos, Fotios, and Nikolaos Kourentzes. 2015. “Forecast combinations for intermittent demand.” Journal of the Operational Research Society 66 (6): 914–24. doi:10.1057/jors.2014.62.
Svetunkov, Ivan, and Fotios Petropoulos. 2018. “Old dog, new tricks: a modelling view of simple moving averages.” International Journal of Production Research 56 (18). Taylor & Francis: 6034–47. doi:10.1080/00207543.2017.1380326.
Tashman, Leonard J. 2000. “Out-of-sample tests of forecasting accuracy: An analysis and review.” International Journal of Forecasting 16 (4): 437–50. doi:10.1016/S0169-2070(00)00065-0.
Wikipedia. 2020b. “Cross-Validation (Statistics).” Wikipedia. https://en.wikipedia.org/wiki/Cross-validation_(statistics).