14.6 Residuals are i.i.d.: heteroscedasticity

Another important assumption for conventional models is that the residuals are homoscedastic, meaning that their variance stays constant across all observations. If it does not, then the prediction intervals from our model might be miscalibrated (either narrower or wider than needed, depending on circumstances). In this section, we will see how the issue can be detected and, in some cases, resolved.
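To illustrate the point about miscalibrated intervals, here is a minimal sketch in base R on simulated data. It is not part of the Seatbelts example: the data, the lm() model and all the variable names are assumptions made for illustration only. The standard deviation of the error grows with x, while the model assumes one common variance, so the 95% prediction interval should be too wide for low values of x and too narrow for high ones.

```r
# Simulated illustration: every name here is hypothetical
set.seed(41)
n <- 2000
x <- runif(n, 1, 10)
# The error standard deviation grows with x, so the data is
# heteroscedastic by construction
y <- 10 + 2 * x + rnorm(n, mean = 0, sd = 0.5 * x)

# Fit on the first half, check the intervals on the second half
train <- data.frame(x = x[1:1000], y = y[1:1000])
holdout <- data.frame(x = x[1001:n], y = y[1001:n])
fit <- lm(y ~ x, data = train)
pi95 <- predict(fit, newdata = holdout, interval = "prediction", level = 0.95)

# Share of holdout observations inside the nominal 95% interval
inside <- holdout$y >= pi95[, "lwr"] & holdout$y <= pi95[, "upr"]
coverageLow <- mean(inside[holdout$x < 4])
coverageHigh <- mean(inside[holdout$x > 7])
print(c(low = coverageLow, high = coverageHigh))
```

In this setup, the empirical coverage should be above the nominal 95% where the variance is low and below it where the variance is high, even though the model is correct on average.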

14.6.1 Detecting heteroscedasticity

Building upon our previous example, we will use the ETSX(A,N,A) model, which has some issues, as we remember from Section 14.3. One of those is the wrong type of model – additive instead of multiplicative. This is also related to the variance of residuals. To see if they are homoscedastic, we can plot them against the fitted values (Figure 14.21):

# Two panels side by side
par(mfcol=c(1,2), mar=c(4,4,2,1))
# Plots 4 and 5: absolute and squared residuals vs fitted values
plot(adamSeat03,4:5)

The two plots in Figure 14.21 allow detecting a specific type of heteroscedasticity, where the variability of residuals changes with the increase of fitted values. The plot of absolute residuals vs fitted is more appropriate for models where the scale parameter is calculated based on absolute values of residuals (e.g. the model with Laplace distribution) and relates to MAE (Subsection 11.2.1), while the plot of squared residuals vs fitted shows whether the variance of residuals is stable or not (thus making it more suitable for models with Normal and related distributions). Furthermore, the squared residuals plot might be challenging to read due to outliers, so the first one might help detect heteroscedasticity even when the scale is supposed to rely on squared errors.

What we want to see on these plots is for all the points to lie in the same corridor for both the lower and the higher fitted values, and for the red line to be parallel to the x-axis. In our case, there is a slight increase in the line. Furthermore, the variability of residuals around the fitted value of 1000 is lower than the one around 2000, indicating that we have heteroscedasticity in residuals. In our case, this is caused by the wrong transformations in the model (see Section 14.3), so to fix the issue, we should switch to a multiplicative model.
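To make the reading of such plots more concrete, here is a minimal base R sketch that mimics the two diagnostics on simulated data. It is only an approximation of the idea, not the code used by the plot() method above: the data, the lm() model and the variable names are assumptions made for illustration. The error is multiplicative, but the fitted model is additive, so the red lowess line is expected to go up with the fitted values.

```r
# Simulated data with a multiplicative error (hypothetical example)
set.seed(42)
n <- 500
x <- runif(n, 1, 10)
y <- 100 * x * exp(rnorm(n, mean = 0, sd = 0.2))

# Additive (and thus misspecified) model
fit <- lm(y ~ x)
fittedValues <- fitted(fit)
absResid <- abs(resid(fit))
sqResid <- resid(fit)^2

# Absolute and squared residuals vs fitted, with lowess lines in red
par(mfcol = c(1, 2), mar = c(4, 4, 2, 1))
plot(fittedValues, absResid, xlab = "Fitted", ylab = "|Residuals|",
     main = "|Residuals| vs Fitted")
lines(lowess(fittedValues, absResid), col = "red")
plot(fittedValues, sqResid, xlab = "Fitted", ylab = "Residuals^2",
     main = "Residuals^2 vs Fitted")
lines(lowess(fittedValues, sqResid), col = "red")

# The same message in numbers: lowess level at the lowest and
# the highest fitted values
trend <- lowess(fittedValues, absResid)
print(c(start = trend$y[1], end = trend$y[n]))
```

If the residuals were homoscedastic, the two printed values would be close to each other and the red lines would be roughly flat.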

Another diagnostic tool that might become useful in some situations is the plot of absolute and squared standardised residuals versus fitted values. These plots follow the same idea as the previous ones, but they might look slightly different because of the standardisation (the mean is equal to 0 and the scale is equal to 1). They become especially useful if the changing variance is modelled explicitly (e.g. via a regression model or a GARCH-type model, see discussion in Chapter 17):

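# Two panels, one above the other
par(mfcol=c(2,1), mar=c(4,4,2,0))
# Plots 13 and 14: absolute and squared standardised residuals vs fitted values
plot(adamSeat03,13:14)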

In our case, the plots in Figure 14.22 do not provide any additional information. We already know that there is slight heteroscedasticity and that we need to transform the response variable.

If we suspect that some specific variables might cause heteroscedasticity, we can plot absolute or squared residuals vs these variables to see if they indeed cause it. For example, here is how we can produce a basic plot of residuals vs all explanatory variables included in the model:

# Absolute residuals in the first column, explanatory variables after it
spread(cbind(as.data.frame(abs(resid(adamSeat03))),
             adamSeat03$data[,all.vars(formula(adamSeat03))[-1]]),
       lowess=TRUE)

The plot in Figure 14.23 can be read similarly to the plots discussed above: if we notice a change in variability of residuals or a change (increase or decrease) in the lowess lines with the change of a variable, then this might indicate that the respective variable causes heteroscedasticity. In our example, it looks like the variable law causes the most pronounced issue: none of the other variables causes as substantial a change in the variance as this one.

We already know that we need to use a multiplicative model instead of the additive one in our example, so we will see how the residuals look for the correctly specified model in Figure 14.24.

The plots in Figure 14.24 do not demonstrate any substantial issues: the residuals look homoscedastic, and given the scale of residuals, the change of the lowess line does not reflect significant changes in the residuals. An additional plot of absolute residuals vs explanatory variables does not show any severe issues either (Figure 14.25):

# Absolute log-residuals of the multiplicative model vs explanatory variables
spread(cbind(as.data.frame(abs(log(resid(adamSeat05)))),
             adamSeat05$data[,all.vars(formula(adamSeat05))[-1]]),
       lowess=TRUE)

So, we can conclude that the multiplicative model resolves the issue with heteroscedasticity.
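The effect of switching from the additive to the multiplicative form can also be sketched on simulated data in base R. This is a simplified stand-in, not the ADAM models from the example: regression in logarithms is used here in place of the multiplicative model, and the data, the variable names and the use of Spearman correlation as a rough numeric summary are all assumptions made for illustration. The correlation between absolute residuals and fitted values should be noticeably positive for the additive model and close to zero for the model in logarithms.

```r
# Simulated data with a multiplicative error (hypothetical example)
set.seed(43)
n <- 500
x <- runif(n, 1, 10)
y <- 100 * x * exp(rnorm(n, mean = 0, sd = 0.2))

# Additive model vs model in logarithms (stand-in for a multiplicative one)
additiveFit <- lm(y ~ x)
logFit <- lm(log(y) ~ log(x))

# Rough numeric companion to the plots: rank correlation between
# absolute residuals and fitted values
corAdditive <- cor(abs(resid(additiveFit)), fitted(additiveFit),
                   method = "spearman")
corLog <- cor(abs(resid(logFit)), fitted(logFit),
              method = "spearman")
print(c(additive = corAdditive, logarithms = corLog))
```

This is not a formal test of heteroscedasticity, just a compact way of expressing what the lowess lines show visually.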