14.9 Multicollinearity
While multicollinearity is not an assumption about a model and can be considered a natural phenomenon, it is one of the common issues in regression analysis. An extensive discussion of this topic in that setting is provided in Section 15.3 of the Svetunkov (2022) textbook.
When it comes to dynamic models, the implications of multicollinearity might differ from those in the regression context. So, in this section, we will focus our discussion on several aspects that might not be relevant to regression.
In the conventional ARIMA model (Chapter 8), multicollinearity is inevitable by construction because of the autocorrelations between actual values. This is why heteroskedasticity- and autocorrelation-consistent (HAC) estimators of the covariance matrix of parameters are sometimes used instead of the standard ones (see Section 15.4 of Hanck et al., 2020). They are designed to mitigate the issue and produce standard errors of parameters close to those that would be obtained without the problem. However, this typically does not impact the forecasting itself, because the covariance matrix of parameters plays no role in the generation of forecasts from ARIMA.
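To make this more tangible, here is a minimal sketch of how HAC standard errors can be obtained in R, assuming that the sandwich and lmtest packages are installed. The regression below is purely illustrative: it has autocorrelated residuals, which is the situation that HAC estimators are meant to handle:
# An illustrative regression with autocorrelated residuals
lmModel <- lm(log(AirPassengers) ~ time(AirPassengers))
# Conventional standard errors of parameters
summary(lmModel)$coefficients
# HAC-consistent standard errors
lmtest::coeftest(lmModel, vcov.=sandwich::vcovHAC(lmModel))
The point estimates of parameters stay the same in both cases; only their standard errors change.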
Furthermore, while multicollinearity can be a serious issue in static models estimated using conventional estimators, such as OLS, in state space models, and specifically in ETS, it might not cause as severe problems as in the case of regression. For example, it is possible to use all the levels of a categorical variable (Section 10.5) and still avoid the dummy variables trap. The levels of the categorical variable are, in this case, treated as changes relative to a baseline. The classic example of this is the seasonal model, for example, ETS(A,A,A), where the seasonal components can be considered as a set of parameters for dummy variables, expanded from the seasonal categorical variable (e.g. the months of year variable). If we set \(\gamma=0\), thus making the seasonality deterministic, the ETS model can still be estimated even though all the levels of the variable are used, a situation in which the conventional regression could not be estimated. This becomes apparent with the conventional ETS model, for example, from the forecast package for R:
etsModel <- forecast::ets(AirPassengers, "AAA")
# Calculate determination coefficients for seasonal states
# These correspond to squares of multiple correlation coefficients
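# determ() comes from the greybox package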
determ(etsModel$states[,-c(1:2)])
## s1 s2 s3 s4 s5 s6 s7 s8
## 0.9999992 0.9999992 0.9999991 0.9999991 0.9999992 0.9999992 0.9999992 0.9999991
## s9 s10 s11 s12
## 0.9999991 0.9999991 0.9999992 0.9999992
As we can see, the states of the model are almost perfectly correlated, but the model still works and does not suffer from the issue that the classical linear regression would have. This is because state space models are constructed and estimated differently from the conventional regression (see the discussion in Chapter 10).
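For comparison, here is a minimal illustration of what happens in the conventional regression in a similar situation, when all 12 monthly dummy variables are included together with the intercept (the construction of the dummies below is purely illustrative):
# Expand the months of year variable into 12 dummy variables
monthDummies <- model.matrix(~factor(cycle(AirPassengers))-1)
colnames(monthDummies) <- paste0("m",1:12)
# Regression with an intercept and all 12 dummies: perfect collinearity
lmSeasonal <- lm(log(AirPassengers)~monthDummies)
# lm() cannot estimate one of the parameters and returns NA for it
coef(lmSeasonal)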
Note, however, that this does not mean that ADAM will necessarily work well in cases of strong multicollinearity of explanatory variables. The example above only demonstrates that the issue is not necessarily as severe as in the regression context. Still, in some cases, dimensionality reduction techniques (such as Principal Components Analysis) might be required for ADAMX to work properly.
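As a rough sketch of this idea, the principal components of the explanatory variables can be used instead of the original ones before constructing ADAMX. The data below is simulated purely for illustration, and the adam() call relies on the formula and regressors parameters of the smooth package:
set.seed(41)
# Simulated collinear explanatory variables (purely illustrative)
x1 <- rnorm(120)
x2 <- x1 + rnorm(120,0,0.01)
x3 <- rnorm(120)
y <- 100 + 2*x1 + 3*x3 + rnorm(120)
# Reduce dimensionality via Principal Components Analysis
xregPCA <- prcomp(cbind(x1,x2,x3), scale.=TRUE)
# Use the first two components as explanatory variables in ADAMX
adamData <- data.frame(y=y, xregPCA$x[,1:2])
adamModel <- smooth::adam(adamData, "ANN", formula=y~PC1+PC2, regressors="use")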