<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Archives Information criteria - Open Forecasting</title>
	<atom:link href="https://openforecast.org/tag/information-criteria/feed/" rel="self" type="application/rss+xml" />
	<link>https://openforecast.org/tag/information-criteria/</link>
	<description>How to look into the future</description>
	<lastBuildDate>Mon, 19 Jan 2026 11:31:41 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://openforecast.org/wp-content/webpc-passthru.php?src=https://openforecast.org/wp-content/uploads/2015/08/cropped-usd-05-32x32.png&amp;nocache=1</url>
	<title>Archives Information criteria - Open Forecasting</title>
	<link>https://openforecast.org/tag/information-criteria/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Risky business: how to select your model based on risk preferences</title>
		<link>https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/</link>
					<comments>https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Mon, 19 Jan 2026 11:28:04 +0000</pubDate>
				<category><![CDATA[Applied forecasting]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Theory of forecasting]]></category>
		<category><![CDATA[error measures]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[Information criteria]]></category>
		<category><![CDATA[model combination]]></category>
		<category><![CDATA[model selection]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[theory]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3950</guid>

					<description><![CDATA[<p>What do you use for model selection? Do you select the best model based on its cross-validated performance, or do you use in-sample measures like AIC? If so, there is a way to improve your selection process further. JORS recently published a paper by Nikos Kourentzes and me based on a simple but powerful idea: [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/">Risky business: how to select your model based on risk preferences</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>What do you use for model selection? Do you select the best model based on its cross-validated performance, or do you use in-sample measures like AIC? If so, there is a way to improve your selection process further.</p>
<p>JORS recently published a paper by Nikos Kourentzes and me based on a simple but powerful idea: instead of using summary statistics (like the mean RMSE of cross-validated errors), you should consider the entire distribution and choose a specific quantile. This aligns with <a href="https://openforecast.org/2024/03/27/what-does-lower-error-measure-really-mean/">my previous post on error measures</a>, but here is the core intuition:</p>
<p>The distribution of error measures is almost always asymmetric. If you only look at the average, you end up with a &#8220;mean temperature in the hospital&#8221; statistic, which doesn&#8217;t reflect how models actually behave. Some models perform great on most series but fail miserably on a few.</p>
<p>What can we do in this case? We can look at the quantiles of the distribution.</p>
<p>For example, if we use the 84th quantile, we compare the models based on their &#8220;bad&#8221; performance: the situations where they fail and produce less accurate forecasts. If you choose the best-performing model there, you end up with one that does not fail as badly, so your preferences become risk-averse.</p>
<p>If you focus on a lower quantile (e.g. the 16th), you are looking at how models do on the well-behaved series and ignoring how they do on the difficult ones. Your selection preferences can then be described as risk-tolerant, because you accept that the best-performing model might fail on a difficult time series.</p>
<p>Furthermore, the median (the 50th quantile, the middle of the sample) corresponds to the risk-neutral situation, because it ignores the tails of the distribution.</p>
<p>What about the mean? This is a risk-agnostic strategy: it says nothing specific about the performance on either the difficult or the easy time series. It averages over everything, hiding the true risk profile.</p>
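<p>To make this concrete, here is a small Python sketch with a hypothetical, hand-made table of cross-validated RMSEs (not data from the paper), showing how the choice of quantile changes which model wins:</p>

```python
import numpy as np

# Hypothetical cross-validated RMSEs over five series for two candidate models.
# Model A is accurate on most series but fails badly on one;
# model B is mediocre everywhere but never fails.
errors = {
    "A": np.array([0.8, 0.9, 1.0, 1.1, 9.0]),
    "B": np.array([1.5, 1.6, 1.7, 1.8, 1.9]),
}

def select_model(errors, q):
    """Pick the model with the lowest q-th quantile of its error distribution.

    q=0.84 is risk-averse, q=0.50 risk-neutral, q=0.16 risk-tolerant,
    while the mean would correspond to the risk-agnostic strategy."""
    return min(errors, key=lambda m: np.quantile(errors[m], q))

print(select_model(errors, 0.84))  # risk-averse choice: "B"
print(select_model(errors, 0.16))  # risk-tolerant choice: "A"
```

<p>The risk-averse 84th quantile penalises model A for its one disastrous series, while the risk-tolerant 16th quantile rewards its good behaviour on the easy ones.</p>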
<p>So what?</p>
<p>In the paper, we show that using a risk-averse strategy tends to improve overall forecasting accuracy in day-to-day situations. Conversely, a risk-tolerant strategy can be beneficial when disruptions are anticipated, as standard models are likely to fail anyway.</p>
<p>So, next time you select a model, think about the measure you are using. If it’s just the mean RMSE, keep in mind that you might be ignoring the inherent risks of that selection.</p>
<p>P.S. While the discussion above applies to the distribution of error measures, our paper specifically focused on the point AIC value (in-sample performance). But AIC is a distance measure as well, so the logic explained above holds.</p>
<p>P.P.S. Nikos wrote a <a href="https://www.linkedin.com/posts/nikos-kourentzes-3660515_forecasting-datascience-analytics-activity-7414687127269007360-pLAh">post about this paper here</a>.</p>
<p>P.P.P.S. And here is <a href="https://github.com/trnnick/working_papers/blob/fd1973624e97fc755a9c2401f05c78b056780e34/Kourentzes_2026_Incorporating%20risk%20preferences%20in%20forecast%20selectionk.pdf">the link to the paper</a>.</p>
<p>Message <a href="https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/">Risky business: how to select your model based on risk preferences</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2026/01/19/risky-business-how-to-select-your-model-based-on-risk-preferences/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Fundamental Flaw of the Box-Jenkins Methodology</title>
		<link>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/</link>
					<comments>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/#respond</comments>
		
		<dc:creator><![CDATA[Ivan Svetunkov]]></dc:creator>
		<pubDate>Tue, 13 May 2025 11:57:07 +0000</pubDate>
				<category><![CDATA[ARIMA]]></category>
		<category><![CDATA[Social media]]></category>
		<category><![CDATA[Univariate models]]></category>
		<category><![CDATA[extrapolation methods]]></category>
		<category><![CDATA[Information criteria]]></category>
		<guid isPermaLink="false">https://openforecast.org/?p=3838</guid>

					<description><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, this methodology has a fundamental flaw and should not be used in practice. Here&#8217;s why. When Box and Jenkins wrote their book back in the 1960s, it was a very different [&#8230;]</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, <strong>this methodology has a fundamental flaw</strong> and should not be used in practice. Here&#8217;s why.</p>
<p>When Box and Jenkins wrote their book back in the 1960s, it was a very different era: computers were massive, and people worked with punch cards. To make their approach viable, Box and Jenkins developed a methodology for selecting the appropriate orders of AR and MA based on the values of the autocorrelation and partial autocorrelation functions (ACF and PACF, respectively). Their idea was that if an ARMA process generates a specific ACF/PACF pattern, then it could be identified by analysing those functions in the data. At the time, it wasn’t feasible to do cross-validation or rolling origin evaluation, and even using information criteria for model selection was a challenge. So, the Box–Jenkins approach was a sensible option, producing adequate results with limited computational resources, and was considered state of the art.</p>
<p>Unfortunately, as the M1 competition later showed (see my <a href="/2024/03/14/the-role-of-m-competitions-in-forecasting/">earlier post</a>), the methodology didn’t work well in practice: simpler methods that didn’t rely on rigorous model selection performed better. In fact, the winning model in the competition was ARARMA by Emanuel Parzen (<a href="https://doi.org/10.1002/for.3980010108">doi:10.1002/for.3980010108</a>). His idea was to make the series stationary by applying a low-order, non-stationary AR to the data, then to extract the residuals and select appropriate ARMA orders using AIC. Parzen ignored the Box–Jenkins methodology entirely &#8211; he didn’t analyse ACF or PACF and instead relied fully on automated selection. And it worked!</p>
<p>So why didn’t the Box–Jenkins methodology perform as expected? In my monograph <a href="/adam">Forecasting and Analytics with ADAM</a>, I use the following example to explain the main issue: “All birds have wings. Sarah has wings. Thus, Sarah is a bird.” But Sarah, as shown in the image attached to this post, is a butterfly.</p>
<p>The fundamental issue with the Box–Jenkins methodology lies in its logic: if a process generates a specific ACF/PACF, that doesn’t mean that an observed ACF/PACF must come from that process. Many ARMA and even non-ARMA processes can generate exactly the same autocorrelation structure.</p>
<p>Further developments in ARIMA modelling have shown that ACF and PACF can only be used as general guidelines for order selection. To assess model performance properly, we need other tools. All modern approaches rely on information criteria for ARIMA order selection, and they consistently perform well in forecasting competitions. For example, <a href="https://doi.org/10.18637/jss.v027.i03">Hyndman &#038; Khandakar (2008)</a> use AIC for ARMA order selection, while <a href="https://doi.org/10.1080/00207543.2019.1600764">Svetunkov &#038; Boylan (2020)</a> apply AIC after reformulating ARIMA in a state space form. The former is implemented in the forecast package in R and the StatsForecast library in Python (thanks to Nixtla and Azul Garza); the latter is available in the smooth package in R. I also discuss another ARIMA order selection approach in <a href="/adam/ARIMASelection.html">Section 15.2 of my book</a>.</p>
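<p>For intuition, the AIC-based selection that these packages automate boils down to fitting each candidate order and keeping the one with the lowest criterion value. Here is a minimal, numpy-only Python sketch for pure AR models, using conditional least squares on a simulated series; the function and the data are mine, and this simplifies what the packages above actually do (they work with full ARIMA likelihoods):</p>

```python
import numpy as np

def ar_aic(y, p):
    """Fit AR(p) by conditional least squares and return its AIC.

    AIC = 2k - 2*logLik, with k = p + 2 parameters
    (p AR coefficients, an intercept and the error variance).
    Conditioning on the first p observations is a simplification;
    proper implementations use the full or state space likelihood."""
    T = len(y)
    Y = y[p:]
    # Regressors: lags 1..p of the series, plus an intercept column.
    X = np.column_stack(
        [y[p - j:T - j] for j in range(1, p + 1)] + [np.ones(T - p)]
    )
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    sigma2 = resid @ resid / len(Y)
    loglik = -0.5 * len(Y) * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (p + 2) - 2 * loglik

# Simulate an AR(2) process and let AIC choose the order automatically,
# with no ACF/PACF inspection involved.
rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

best_p = min(range(6), key=lambda p: ar_aic(y, p))
print(best_p)
```

<p>In real applications you would rely on the implementations mentioned above rather than this toy function, but the principle is the same: the criterion, not a visual ACF/PACF reading, decides the order.</p>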
<p>Long story short: don’t use the Box–Jenkins methodology for order selection. Use more modern tools, such as information criteria.</p>
<p>P.S. <a href="/2024/03/21/what-s-wrong-with-arima/">See also my early post on ARIMA</a>, discussing what is wrong with it.</p>
<p>Message <a href="https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/">Fundamental Flaw of the Box-Jenkins Methodology</a> first appeared on <a href="https://openforecast.org">Open Forecasting</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://openforecast.org/2025/05/13/fundamental-flaw-of-the-box-jenkins-methodology/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
