## 14.8 Residuals are i.i.d.: distributional assumptions

Finally, we come to the distributional assumptions of ADAM. As discussed earlier (for example, in Section 11.1), the ADAM framework supports several distributions, and the specific assumptions change depending on the type of error term in the model. Whatever the assumed distribution, it is relatively straightforward to check whether the residuals of the model follow it. There are several tools for that.

The simplest one is the Quantile-Quantile (QQ) plot. It plots theoretical quantiles against the actual ones and shows whether they are close to each other. Here is, for example, how the QQ plot looks for one of the previous models, assuming the Normal distribution (Figure 14.27):

plot(adamSeat03, 6)
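
To make the construction of such a plot explicit, here is a minimal base R sketch. It does not use the fitted model: the vector `e` is simulated as a stand-in for residuals (an assumption made so that the example runs on its own), and the names `e`, `pp`, `theoretical` and `actual` are illustrative. With a fitted model, `e` would instead be taken from `residuals()`.

```r
set.seed(41)
# Stand-in for model residuals (assumption: simulated, not from adamSeat03)
e <- rnorm(192, mean=0, sd=120)
n <- length(e)
# Plotting positions: probabilities at which the quantiles are compared
pp <- ppoints(n)
# Theoretical quantiles of the assumed Normal distribution
theoretical <- qnorm(pp, mean=0, sd=sd(e))
# Actual (empirical) quantiles are the sorted residuals
actual <- sort(e)
plot(theoretical, actual,
     xlab="Theoretical quantiles", ylab="Actual quantiles")
# Points lying near this line do not contradict the assumed distribution
abline(a=0, b=1, col="red")
```

Replacing `qnorm()` with the quantile function of another distribution gives the same kind of plot for a different distributional assumption.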

If the residuals do not contradict the assumed distribution, all the points should lie either very close to or on the line. In our case, in Figure 14.27, most points are close to the line, but the tails (especially the right one) are slightly off. This might mean that we should use either a different error type or a different distribution. Just for the sake of argument, we can try the ETSX(M,N,M) model, with the same set of explanatory variables as in the model adamSeat03, and with the same Normal distribution:

adamSeat16 <- adam(Seatbelts, "MNM",
                   formula=drivers~log(PetrolPrice)+log(kms)+law,
                   distribution="dnorm")
plot(adamSeat16, 6)

According to the QQ plot in Figure 14.28, the residuals of the new model are much closer to the theoretical ones. Only the right tail deviates slightly from normality: the actual values are a bit further away than expected. This can be addressed by using a skewed distribution, for example, Inverse Gaussian:

adamSeat17 <- adam(Seatbelts, "MNM",
                   formula=drivers~log(PetrolPrice)+log(kms)+law,
                   distribution="dinvgauss")
plot(adamSeat17, 6)

The QQ plot in Figure 14.29 does not demonstrate any significant improvement in comparison with the previous model. If we are not sure which of the two models to prefer, we can use AICc to select between them:

AICc(adamSeat16)
## [1] 2405.49
AICc(adamSeat17)
## [1] 2406.554

Based on these results, we can conclude that the Normal distribution is more suitable for this situation than the Inverse Gaussian one.
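
For reference, the standard definition of AICc is $\mathrm{AICc} = 2k - 2\ell + \frac{2k(k+1)}{n-k-1}$, where $\ell$ is the log-likelihood, $k$ is the number of estimated parameters and $n$ is the sample size. The following sketch computes it by hand for two candidate distributions. Note that `aicc_sketch()` is a hypothetical helper written for this illustration, not the function called above, and the data is simulated rather than taken from the Seatbelts example.

```r
# Hypothetical helper: AICc from the log-likelihood, the number of
# estimated parameters k and the sample size n
aicc_sketch <- function(loglik, k, n) {
    2*k - 2*loglik + 2*k*(k+1)/(n-k-1)
}

set.seed(41)
# Simulated positive, right-skewed data (assumption: not the Seatbelts data)
y <- rlnorm(100, meanlog=7, sdlog=0.3)
n <- length(y)

# Normal distribution fitted by maximum likelihood
ll_norm <- sum(dnorm(y, mean(y), sqrt(mean((y-mean(y))^2)), log=TRUE))
# Log-Normal distribution fitted by maximum likelihood
ly <- log(y)
ll_lnorm <- sum(dlnorm(y, mean(ly), sqrt(mean((ly-mean(ly))^2)), log=TRUE))

# Both candidates have two estimated parameters (location and scale)
c(normal=aicc_sketch(ll_norm, 2, n), lognormal=aicc_sketch(ll_lnorm, 2, n))
```

The candidate with the lower value is preferred, which is the rule applied to adamSeat16 and adamSeat17 above.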

Another way to analyse the distribution of residuals is to plot a histogram together with the theoretical density function. Here is an example:

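# Plot histogram of residuals
# (assumed here: residuals of adamSeat03, the additive-error Normal model)
hist(residuals(adamSeat03), probability=TRUE,
     xlab="Residuals", main="", ylim=c(0,0.0035))
# Add density line of the theoretical distribution
# (assumed here: Normal with zero mean and the sample sd of the residuals)
lines(seq(-400,400,1),
      dnorm(seq(-400,400,1), 0, sd(residuals(adamSeat03))),
      col="red")
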
There are also formal tests for the distribution of residuals, such as Shapiro-Wilk and Kolmogorov-Smirnov. The former tests the hypothesis that the residuals follow the Normal distribution. The latter is much more flexible and allows comparing the empirical distribution with any other one (theoretical or empirical). However, I prefer visual inspection to these tests when possible because, as discussed in Section 5.3 of Svetunkov (2022a), the null hypothesis is always wrong: it will inevitably be rejected as the sample size increases, so a rejection on its own says little about how suitable the assumed distribution is. Besides, if you fail to reject H$_0$, it does not mean that your variable follows the assumed distribution. It only means that you have not found enough evidence to reject the null hypothesis.
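
Both tests are available in base R as `shapiro.test()` and `ks.test()`. The sketch below applies them to simulated data (an assumption, used instead of model residuals so that the example is self-contained) drawn from a Student's t distribution with 5 degrees of freedom, which resembles the Normal one but has fatter tails. It is meant to illustrate the sample size effect: the same mild departure from normality tends to go undetected in a small sample and to be rejected in a large one. All variable names are illustrative.

```r
set.seed(41)
# Simulated stand-ins for residuals: close to Normal, but fat-tailed
e_small <- rt(50, df=5)
e_large <- rt(5000, df=5)   # shapiro.test() accepts at most 5000 values

# Shapiro-Wilk: H0 is that the data follows the Normal distribution
p_sw_small <- shapiro.test(e_small)$p.value
p_sw_large <- shapiro.test(e_large)$p.value

# Kolmogorov-Smirnov against a Normal distribution with estimated parameters.
# Estimating the parameters from the same data makes the p-value approximate.
p_ks_large <- ks.test(e_large, "pnorm", mean(e_large), sd(e_large))$p.value

# Kolmogorov-Smirnov can also compare two empirical distributions directly
p_ks_two <- ks.test(e_small, e_large)$p.value

round(c(SW_small=p_sw_small, SW_large=p_sw_large,
        KS_large=p_ks_large, KS_two=p_ks_two), 4)
```

Comparing the p-values for the small and the large samples shows why a rejection in a large sample should be read together with a QQ plot or a histogram rather than on its own.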