
Chapter 14 Model diagnostics

In this chapter, we investigate how ADAM can be diagnosed and improved. Most topics will build upon the typical model assumptions discussed in Subsection 1.4.1 and in Chapter 15 of Svetunkov (2022). Some of the assumptions cannot be diagnosed properly, but there are well-established instruments for others. All the assumptions about statistical models can be summarised as follows:

  1. The model is correctly specified:
    1. No omitted variables;
    2. No redundant variables;
    3. The necessary transformations of the variables are applied;
    4. No outliers in the residuals of the model.
  2. Residuals are i.i.d.:
    1. They are not autocorrelated;
    2. They are homoscedastic;
    3. The expectation of the residuals is zero, regardless of the values of the explanatory variables;
    4. The residuals follow the specified distribution;
    5. The distribution of residuals does not change over time.
  3. The explanatory variables are not correlated with anything but the response variable:
    1. No multicollinearity;
    2. No endogeneity (not discussed in the context of ADAM).

Technically speaking, (3) is not an assumption about the model; it is a requirement for the estimation to work correctly. In the regression context, satisfying these requirements implies that the estimates of parameters are efficient and unbiased (for (3a) and (3b), respectively).

In general, all model diagnostics are aimed at spotting patterns in residuals. If there are patterns, then some assumption is violated and something is probably missing in the model. In this chapter, we will discuss which instruments can be used to diagnose different types of violations of assumptions.
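To illustrate what "spotting patterns in residuals" means in practice, here is a minimal sketch in R. It uses lm() and the Seatbelts data purely for illustration (the variable choice anticipates the model fitted later in this chapter); the same visual logic applies to models fitted with adam() or alm():

```r
# A minimal sketch: extract residuals from a fitted model and plot them
# against fitted values, looking for visible patterns (curvature, funnels).
fit <- lm(drivers ~ PetrolPrice + kms, data=as.data.frame(Seatbelts))
plot(fitted(fit), resid(fit),
     xlab="Fitted values", ylab="Residuals",
     main="Residuals vs Fitted")
abline(h=0, lty=2)  # reference line: residuals should scatter around zero
```

If the points scatter randomly around the dashed zero line, no obvious assumption is violated in this view; any visible structure suggests a missing variable, transformation, or dynamic element.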

Remark. The analysis carried out in this chapter is based mainly on visual inspection of various plots. While there are statistical tests for some of the assumptions, we do not discuss them here, because in many cases human judgment is at least as good as automated procedures (Petropoulos et al., 2018b), and people tend to misuse the latter (Wasserstein and Lazar, 2016). So, if you can spend time on improving the model for a specific dataset, visual inspection will typically suffice.

To make this more actionable, we will consider a conventional regression model applied to the Seatbelts data discussed in Section 10.6. We start with a pure regression model, which can be estimated equally well with the adam() function from the smooth package or with the alm() function from the greybox package in R. In general, I recommend using alm() when no dynamic elements are present in the model (or when only AR(p) and/or I(d) components are needed); otherwise, you should use adam() in the following way:

library(smooth)
adamSeat01 <- adam(Seatbelts, "NNN",
                   formula=drivers~PetrolPrice+kms)
plot(adamSeat01, 7, main="")

Figure 14.1: Basic regression model for the data on road casualties in Great Britain 1969–1984.
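Since this model contains no dynamic elements, it could equally be fitted with alm(), as noted above. A minimal sketch, assuming the greybox package is installed (Seatbelts, a ts matrix, is converted to a data frame because alm() expects one, which discards the time series structure):

```r
# A sketch of the equivalent static regression via greybox::alm().
# as.data.frame() drops the ts attributes, which is fine here because
# the model has no dynamic components.
library(greybox)
almSeat01 <- alm(drivers ~ PetrolPrice + kms,
                 data=as.data.frame(Seatbelts))
summary(almSeat01)
```

The estimated coefficients should match those from the adam() call above, since both fit the same static regression.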

This model has several issues, and in this chapter, we will discuss how to diagnose and fix them.

References

• Petropoulos, F., Kourentzes, N., Nikolopoulos, K., Siemsen, E., 2018b. Judgmental Selection of Forecasting Models. Journal of Operations Management. 60, 34–46. https://doi.org/10.1016/j.jom.2018.05.005
• Svetunkov, I., 2022. Statistics for Business Analytics. https://openforecast.org/sba/ (version: 31.10.2022)
• Wasserstein, R.L., Lazar, N.A., 2016. The ASA’s Statement on p-Values: Context, Process, and Purpose. American Statistician. 70, 129–133. https://doi.org/10.1080/00031305.2016.1154108