This book is in Open Review. I want your feedback to make the book better for you and other readers. To add your annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the button in the upper right hand corner of the page

14.5 Residuals are i.i.d.: autocorrelation

One of the typical characteristics of time series models is the dynamic relation between variables. Even if fundamentally the sales of ice cream on Monday do not impact sales of the same ice cream on Tuesday, they might impact advertising expenses or sales of a competing product on Tuesday, Wednesday or next week. Missing this sort of structure might lead to autocorrelation of residuals, which then would impact the estimates of parameters and final forecasts. Autocorrelations might also arise due to wrong transformations of variables, where the model would systematically underforecast the actuals, producing autocorrelated residuals. In this section, we will see one of the potential ways for the regression diagnostics and try to improve the model in a stepwise fashion, trying out different order of ARIMA model.

As an example, we continue with the same seatbelts data, dropping the dynamic part to see what would happen in this case:

adamModelSeat09 <- adam(Seatbelts, "NNN", lags=12,
## [1] 2487.714

There are different ways to diagnose this model. We start with a basic plot of residuals over time:


We see that on one hand the residuals still contain seasonality and on the other that they do not look stationary. We could conduct ADF and / or KPSS test to get a formal answer to the stationarity question:

##  KPSS Test for Level Stationarity
## data:  resid(adamModelSeat09)
## KPSS Level = 0.69641, Truncation lag parameter = 4, p-value = 0.01387
## Warning in tseries::adf.test(resid(adamModelSeat09)): p-value smaller than
## printed p-value
##  Augmented Dickey-Fuller Test
## data:  resid(adamModelSeat09)
## Dickey-Fuller = -5.6449, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

The tests have opposite null hypothesis, and in our case we would reject H\(_0\) for both of them on 5% significance level. This means that they contradict each other and we need to use our judgment. First I will see what happens with the model, when we do take differences:

# ARIMAX(0,1,0)(0,0,0)_12
adamModelSeat10 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2390.028

This leads to an improvement in AICc in comparison with the previous model. The residuals of the model are now also much better behaved:


In order to see whether there are any other dynamic elements left, we will plot ACF and PACF of residuals:


In case of adam() objects, these plots will always have the range for y-axis from -1 to 1 and will start from lag 1 on x-axis. The red horizontal lines represent the “non-rejection” region: if the point lie inside the region, then they are not statistically different from zero on the selected level (the uncertainty around them is so high that it covers zero). The points with numbers are those that are statistically significantly different from zero. So, the ACF / PACF analysis might show the lags that are statistically significant on the selected level (the default one is 0.95). Given that this is a statistical instrument, we would expect for approximately (1-level)% (e.g. 5%) of lags lie outside these bounds, so it is fine if we don’t see all point lying inside them. However, we should not see any patterns there and we might need to investigate the suspicious lags (low orders of up to 3 - 5 and the seasonal lags if they appear). In our example we see that there is a suspicious lag 1 on ACF and a suspicious lag 2 on the PACF, which might indicate that some dynamic elements are missing (e.g. MA(1) or AR(2)). Furthermore, there are spikes on lag 12 for both ACF and PACF. While it is not clear, what specifically is needed here, we can try out several models and see which one of them is better in order to deremine the appropriate element:

# ARIMAX(0,1,0)(1,0,0)_12
adamModelSeat11 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2390.919
# ARIMAX(0,1,0)(0,0,1)_12
adamModelSeat12 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2405.051
# ARIMAX(0,1,0)(1,0,1)_12
adamModelSeat13 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2349.809

Based on this analysis, we would be inclined to include both seasonal AR(1) and seasonal MA(1) in the model. Next step in our iterative process - another ACF / PACF plot of the residuals:


In this case, there is a big spike on ACF for lag 1, so we can try adding MA(1) component in the model:

# ARIMAX(0,1,1)(1,0,1)_12
adamModelSeat14 <- adam(Seatbelts,"NNN",lags=12,
## [1] 2327.42

Which leads to further improvement in AICc. We could continue our investigations in order to find the most suitable ARIMAX model for the data using this iterative procedure, but this example should suffice in providing the general idea of how it can be done. What we could do else to simplify the process is to use the automated ARIMA selection algorithm in adam(), which is built on the principles discussed in this section:

adamModelSeat15 <-
## [1] 2200.467

This new constructed SARIMAX(0,1,1)(0,1,1)\(_{12}\) model has lower AICc than the previous one and should be used instead. In fact, it is even better than the ETSX(MNM) (model 5) from the previous section in terms of AICc, and its residuals are much better behaved than the ones of model 5 (we might need to analyse the residuals for the potential outliers in this model though):


So for the purposes of analytics and forecasting, we would be inclined to use SARIMAX(0,1,1)(0,1,1)\(_{12}\) rather than ETSX(MNM).

As a final word for this section, we have focused our discussion on the visual analysis of time series, ignoring the statistical tests (we only used ADF and KPSS). Yes, there is Durbin-Watson (Wikipedia, 2021f) test for AR(1) in residuals, and yes there are Ljung-Box (Wikipedia, 2021g), Box-Pierce and Breusch–Godfrey (Wikipedia, 2021h) tests for multiple AR elements. But visual inspection of time series is not less powerful than hypothesis testing. In fact, it makes you think and analyse the model and its assumptions, while the tests are the lazy way out that might lead to wrong conclusions because they have the standard limitations of any hypothesis tests. After all, if you fail to reject H\(_0\) it does not mean that the effect does not exist. Having said that, the statistical tests become extremely useful when you need to process many time series at the same time and cannot physically do visual inspection of them. So, if you are in that situation, I would recommend reading more about them, but I do not aim to retell the content of Wikipedia in this textbook.


• Wikipedia, 2021f. Durbin–Watson statistic. (version: 2021-05-13)
• Wikipedia, 2021g. Ljung–Box test. (version: 2021-05-13)
• Wikipedia, 2021h. Breusch–Godfrey test. (version: 2021-05-13)