14.5 Residuals are i.i.d.: autocorrelation

One of the typical characteristics of time series models is the dynamic relation between variables. Even if fundamentally, the sales of ice cream on Monday do not impact sales of the same ice cream on Tuesday, they might impact advertising expenses or sales of a competing product on Tuesday, Wednesday or next week. Missing this structure might lead to the autocorrelation of residuals, influencing the estimates of parameters and final forecasts. This can influence the prediction intervals (causing miscalibration) and in serious cases leads to biased forecasts. Autocorrelations might also arise due to wrong transformations of variables, where the model would systematically underforecast the actuals, producing autocorrelated residuals. In this section, we will see one of the potential ways for the regression diagnostics and try to improve the model stepwise, trying out different orders of the ARIMA model (Section 9).

As an example, we continue with the same seatbelts data, dropping the dynamic part to see what would happen in this case:

adamSeat09 <- adam(Seatbelts, "NNN", lags=12,
## [1] 2651.191

There are different ways to diagnose this model. We start with a basic plot of residuals over time (Figure 14.16):

plot(adamSeat09, 8, main="")
Standardised residuals vs time for the model 9.

Figure 14.16: Standardised residuals vs time for the model 9.

We see in Figure 14.16 that on one hand the residuals still contain seasonality and on the other, they do not look stationary. We could conduct ADF and / or KPSS tests to get a formal answer to the stationarity question:

## Warning in tseries::kpss.test(resid(adamSeat09)): p-value greater than printed
## p-value
##  KPSS Test for Level Stationarity
## data:  resid(adamSeat09)
## KPSS Level = 0.12844, Truncation lag parameter = 4, p-value = 0.1
## Warning in tseries::adf.test(resid(adamSeat09)): p-value smaller than printed p-
## value
##  Augmented Dickey-Fuller Test
## data:  resid(adamSeat09)
## Dickey-Fuller = -6.8851, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

The tests have opposite null hypotheses, and in our case, we would fail to reject H\(_0\) on 1% for the KPSS test and reject H\(_0\) on 1% for the ADF, which means that the residuals look stationary. The main problem in the residuals is the seasonality, which formally makes the residuals non-stationary (their mean changes from month to month), but which cannot be detected by these tests. Yes, there is a Canova-Hansen test, which is implemented in ch.test function in uroot package in R, which tests the seasonal unit root, but I instead of trying it out and coming to conclusions, I will try the model in seasonal differences and see if it is better than the one without it:

# SARIMAX(0,0,0)(0,1,0)_12
adamSeat10 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2547.274

Remark. While in general models in differences are not comparable with the models applied to the original data, adam() allows such comparison, because ARIMA model implemented in it is initialised before the start of the sample and does not loose any observations.

This leads to an improvement in AICc in comparison with the previous model. The residuals of the model are now also better behaved (Figure 14.17):

plot(adamSeat10, 8, main="")
Standardised residuals vs time for the model 10.

Figure 14.17: Standardised residuals vs time for the model 10.

In order to see whether there are any other dynamic elements left, we will plot ACF and PACF (discussed in Subsections 8.3.2 and 8.3.3) of residuals (Figure 14.18):

par(mfcol=c(1,2), mar=c(4,4,0,1))
plot(adamSeat10, 10:11, level=0.99, main="")
ACF and PACF of residuals of the model 10.

Figure 14.18: ACF and PACF of residuals of the model 10.

In the case of the adam() objects, these plots, by default, will always have the range for the y-axis from -1 to 1 and will start from lag one on the x-axis. The red horizontal lines represent the “non-rejection” region. If a point lies inside the region, it is not statistically different from zero on the selected confidence level (the uncertainty around it is so high that it covers zero). The points with numbers are those that are statistically significantly different from zero. So, the ACF / PACF analysis might show the statistically significant lags on the selected level (the default one is 0.95). Given that this is a statistical instrument, we expect that approximately (1-level)% (e.g. 1%) of lags lie outside these bounds just due to randomness, even if the null hypothesis is correct. So it is okay if we do not see all points lying inside them. However, we should not see any patterns there, and we might need to investigate the suspicious lags (low orders of up to 3 – 5 and the seasonal lags if they appear). In our example in Figure 14.18, we see that there are spikes in lag 12 for both ACF and PACF, which means that we have missed the seasonal element in the data. There is also a suspicious lag 1 on PACF and lags 1 and 2 on ACF, which could potentially indicate that MA(1) and/or AR(1), AR(2) elements are needed in the model. While it is not clear what specifically is needed here, we can try out several models and see which one is better to determine the appropriate order of ARIMA. We should start with the seasonal part of the model, as it might obscure the non-seasonal one.

# SARIMAX(0,0,0)(1,1,0)_12
adamSeat11 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2534.39
# SARIMAX(0,0,0)(0,1,1)_12
adamSeat12 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2475.387
# SARIMAX(0,0,0)(1,1,1)_12
adamSeat13 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2542.092

Based on this analysis, we would be inclined to include in the model seasonal MA(1) only. Next step in our iterative process – another ACF / PACF plot of the residuals:

ACF and PACF of residuals of the model 12.

Figure 14.19: ACF and PACF of residuals of the model 12.

In this case, there is a spike on PACF for lag 1 and a textbook exponential decrease in ACF starting from lag 1, which might mean that we need to include AR(1) component in the model:

# ARIMAX(1,0,0)(0,1,1)_12
adamSeat14 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2432.044

Choosing between the new model and the old one, we should give preference to the model 14, which has a lower AICc than model 12. So, adding MA(1) to the model leads to further improvement in AICc compared to the previous models. Using this iterative procedure, we could continue our investigation to find the most suitable SARIMAX model for the data. Nevertheless, this example should suffice in providing a general idea of how it can be done. What we could do else to simplify the process is to use the automated ARIMA selection algorithm (see discussion in Section 15.2) in adam(), which is built on the principles discussed in this section:

adamSeat15 <-
## [1] "SARIMAX(0,1,1)[1](0,1,1)[12]"
## [1] 2426.211

This new constructed model is SARIMAX(1,0,0)(0,1,1)\(_{12}\), which is the model we achieved manually. Note that it is even better than the ETSX(MNM) (model 5) from the previous section in terms of AICc. Its residuals are much better behaved than the ones of model 5 (although we might need to analyse the residuals for the potential outliers, as discussed in Subsection 14.4):

par(mfcol=c(3,1), mar=c(4,4,2,1))
plot(adamSeat15, c(8,10:11), level=0.99)
Residuals of the model 16.

Figure 14.20: Residuals of the model 16.

So for the purposes of analytics and forecasting, we would be inclined to use SARIMAX(2,0,0)(0,1,1)\(_{12}\) rather than ETSX(MNM).

As a final word for this section, we have focused our discussion on the visual analysis of time series, ignoring the statistical tests (we only used ADF and KPSS). Yes, there is Durbin-Watson (Wikipedia, 2021f) test for AR(1) in residuals, and yes, there are Ljung-Box (Wikipedia, 2021g), Box-Pierce and Breusch–Godfrey (Wikipedia, 2021h) tests for multiple AR elements. But visual inspection of time series is not less powerful than hypothesis testing. It makes you think and analyse the model and its assumptions, while the tests are the lazy way out that might lead to wrong conclusions because they have the standard limitations of any hypothesis tests (as discussed in Section 7.1 of Svetunkov, 2022a). After all, if you fail to reject H\(_0\), it does not mean that the effect does not exist. Having said that, the statistical tests become extremely useful when you need to process many time series simultaneously and cannot inspect them manually. So, if you are in that situation, I would recommend reading more about them, but I do not aim to retell the content of Wikipedia in this monograph.


• Svetunkov, I., 2022a. Statistics for business analytics. https://openforecast.org/sba/ (version: 31.10.2022)
• Wikipedia, 2021f. Durbin–Watson statistic. https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic (version: 2021-05-13)
• Wikipedia, 2021g. Ljung–Box test. https://en.wikipedia.org/wiki/Ljung%E2%80%93Box_test (version: 2021-05-13)
• Wikipedia, 2021h. Breusch–Godfrey test. https://en.wikipedia.org/wiki/Breusch%E2%80%93Godfrey_test (version: 2021-05-13)