Fundamental Flaw of the Box-Jenkins Methodology

If you have taken a course on forecasting or time series analysis, you’ve probably heard of ARIMA and the Box–Jenkins methodology. In my opinion, this methodology has a fundamental flaw and should not be used in practice. Here’s why.

When Box and Jenkins wrote their book back in the 1960s, it was a very different era: computers were massive, and people worked with punch cards. To make their approach viable, Box and Jenkins developed a methodology for selecting the appropriate orders of AR and MA based on the values of the autocorrelation and partial autocorrelation functions (ACF and PACF, respectively). Their idea was that if an ARMA process generates a specific ACF/PACF pattern, then it could be identified by analysing those functions in the data. At the time, it wasn’t feasible to do cross-validation or rolling origin evaluation, and even using information criteria for model selection was a challenge. So, the Box–Jenkins approach was a sensible option, producing adequate results with limited computational resources, and was considered state of the art.

Unfortunately, as the M1 competition later showed (see my earlier post), the methodology didn’t work well in practice: simpler methods that didn’t rely on rigorous model selection performed better. Interestingly, though, the winning model in the competition was still an ARMA-type one: ARARMA by Emanuel Parzen (https://doi.org/10.1002/for.3980010108). His idea was to make the series stationary by applying a low-order, non-stationary AR to the data, then extract residuals and select appropriate ARMA orders using AIC. Parzen ignored the Box–Jenkins methodology entirely: he didn’t analyse ACF or PACF and instead relied fully on automated selection. And it worked!

So why didn’t the Box–Jenkins methodology perform as expected? In my monograph Forecasting and Analytics with ADAM, I use the following example to explain the main issue: “All birds have wings. Sarah has wings. Thus, Sarah is a bird.” But Sarah, as shown in the image attached to this post, is a butterfly.

The fundamental issue with the Box–Jenkins methodology lies in its logic: if a process generates a specific ACF/PACF, that doesn’t mean that an observed ACF/PACF must come from that process. Many ARMA and even non-ARMA processes can generate exactly the same autocorrelation structure.
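To make this concrete, here is a minimal sketch of one textbook case: two different MA(1) processes, y_t = e_t + theta * e_{t-1}, with theta = 0.5 and theta = 2, have identical theoretical ACFs, because the lag-1 autocorrelation theta / (1 + theta^2) takes the same value for theta and 1 / theta. The function name `ma1_acf` and the parameter values are my own illustrative choices, not something taken from the Box–Jenkins book.

```python
import numpy as np


def ma1_acf(theta, max_lag=5):
    """Theoretical ACF of the MA(1) process y_t = e_t + theta * e_{t-1}.

    Returns autocorrelations for lags 0..max_lag (max_lag must be >= 1).
    """
    acf = np.zeros(max_lag + 1)
    acf[0] = 1.0
    # Only lag 1 is non-zero for MA(1): rho_1 = theta / (1 + theta^2)
    acf[1] = theta / (1.0 + theta**2)
    return acf


# Illustrative parameters: theta and 1/theta give the same ACF
acf_invertible = ma1_acf(0.5)
acf_noninvertible = ma1_acf(2.0)

print("theta = 0.5:", acf_invertible)
print("theta = 2.0:", acf_noninvertible)
print("identical ACF:", np.allclose(acf_invertible, acf_noninvertible))
```

Looking at the ACF alone, there is no way to tell which of the two processes generated the data, and this is the simplest possible case with a known model family.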

Further developments in ARIMA modelling have shown that ACF and PACF can only be used as general guidelines for order selection. To assess model performance properly, we need other tools. All modern approaches rely on information criteria for ARIMA order selection, and they consistently perform well in forecasting competitions. For example, Hyndman & Khandakar (2008) use AIC for ARMA order selection, while Svetunkov & Boylan (2020) apply AIC after reformulating ARIMA in a state space form. The former is implemented in the forecast package in R and the StatsForecast library in Python (thanks to Nixtla and Azul Garza); the latter is available in the smooth package in R. I also discuss another ARIMA order selection approach in Section 15.2 of my book.
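The general idea of selection via information criteria can be sketched in a few lines. The example below is deliberately simplified and is not the algorithm of Hyndman & Khandakar (2008) or Svetunkov & Boylan (2020): it considers pure AR(p) models only, fits them by conditional least squares on a common sample, assumes Gaussian errors for the likelihood, and picks the order with the lowest AIC. The function names (`fit_ar_aic`, `select_ar_order`) and the simulated AR(2) series are hypothetical, introduced purely for illustration.

```python
import numpy as np


def fit_ar_aic(y, p, max_p):
    """Fit AR(p) with an intercept by least squares and return its AIC.

    The first max_p observations are dropped for every p, so that all
    candidate models are compared on the same sample.
    """
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    n = target.size
    # Design matrix: intercept plus lags 1..p of the series
    lags = [y[max_p - k: len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = np.mean(resid**2)
    # Concentrated Gaussian log-likelihood (an assumption of this sketch)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k_params = p + 2  # intercept, p AR coefficients, error variance
    return 2 * k_params - 2 * loglik


def select_ar_order(y, max_p=5):
    """Return the AR order with the lowest AIC and all AIC values."""
    aics = {p: fit_ar_aic(y, p, max_p) for p in range(max_p + 1)}
    return min(aics, key=aics.get), aics


# Simulated AR(2) data with illustrative coefficients
rng = np.random.default_rng(42)
n_obs = 500
errors = rng.normal(size=n_obs)
series = np.zeros(n_obs)
for t in range(2, n_obs):
    series[t] = 0.6 * series[t - 1] - 0.3 * series[t - 2] + errors[t]

best_p, aics = select_ar_order(series)
print("Selected AR order:", best_p)
for p, aic in aics.items():
    print(f"AR({p}): AIC = {aic:.2f}")
```

Note that no ACF or PACF plot is inspected at any point: the candidate models are simply fitted and compared. The real implementations in forecast, StatsForecast and smooth do much more than this (MA terms, differencing, seasonality, smarter search over orders), so treat the sketch as a demonstration of the principle only.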

Long story short: don’t use the Box–Jenkins methodology for order selection. Use more modern tools, such as information criteria.

P.S. See also my earlier post on ARIMA, discussing what is wrong with it.

