## 14.7 Residuals are i.i.d.: zero expectation

This assumption applies only to the additive error models (Section 5.1). For the multiplicative error models, it becomes “the expectation of the error term is equal to one” (Section 6.5). It does not make sense to check this assumption unconditionally, because it does not mean anything in-sample: it holds automatically for the residuals of a model estimated via OLS. The observed mean of the residuals might not be equal to zero in other cases, but this does not give any helpful information. In fact, when we work with exponential smoothing models, the mean of the in-sample residuals being equal to zero might imply for some of them that the final values of components are identical to the initial ones. For example, in the case of ETS(A,N,N) (from Section 4.3), we can use the transition equation from (4.3) to express the final value of the level via the previous values up until $$t=0$$:

\begin{equation}
\begin{aligned}
\hat{l}_t &= \hat{l}_{t-1} + \hat{\alpha} e_t = \hat{l}_{t-2} + \hat{\alpha} e_{t-1} + \hat{\alpha} e_t \\
&= \hat{l}_0 + \hat{\alpha} \sum_{j=0}^{t-1} e_{t-j} .
\end{aligned}
\tag{14.3}
\end{equation}

If the in-sample mean of the residuals is indeed equal to zero, then their sum is zero as well, and equation (14.3) reduces to $$\hat{l}_t=\hat{l}_0$$. So, this assumption cannot be checked in-sample, meaning that it is all about the true model and the asymptotic behaviour rather than the model applied to the data.
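The identity in (14.3) can be verified numerically. The following is a minimal sketch in base R, not the `adam()` implementation: the series, the smoothing parameter `alphaHat`, and the initial level `levelInitial` are all hypothetical values chosen for illustration.

```r
# Minimal sketch: ETS(A,N,N) recursion with arbitrary, assumed values
set.seed(41)
y <- 100 + cumsum(rnorm(50))   # hypothetical series
alphaHat <- 0.3                # hypothetical smoothing parameter
levelInitial <- y[1]           # hypothetical initial level, l_0

level <- levelInitial
e <- numeric(length(y))
for (i in seq_along(y)) {
  e[i] <- y[i] - level               # one-step-ahead error: e_t = y_t - l_{t-1}
  level <- level + alphaHat * e[i]   # transition equation: l_t = l_{t-1} + alpha e_t
}

# Final level from the recursion vs the closed form: l_0 + alpha * (sum of all residuals)
levelClosedForm <- levelInitial + alphaHat * sum(e)
all.equal(level, levelClosedForm)
```

If `mean(e)` were exactly zero, `sum(e)` would be zero too, and the final level would coincide with `levelInitial`, which is the point made in the text.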

On the other hand, if for some reason the mean of residuals is not equal to zero in the population, then the model will change. For example, if we have an ETS(A,N,N) model with a non-zero mean of residuals $$\mu_\epsilon$$, then the residuals can be represented in the form $$\epsilon_t = \mu_\epsilon + \xi_t$$, where $$\mathrm{E}(\xi_t)=0$$, which leads to a model different from ETS(A,N,N):

\begin{equation}
\begin{aligned}
& y_t = l_{t-1} + \mu_\epsilon + \xi_t \\
& l_t = l_{t-1} + \alpha \mu_\epsilon + \alpha \xi_t .
\end{aligned}
\tag{14.4}
\end{equation}

If we apply the ETS(A,N,N) model to the data instead of (14.4), we will omit an important element, and thus the estimated smoothing parameter will be higher than needed. The same logic applies to the multiplicative error models: the mean of residuals $$1+\epsilon_t$$ should be equal to one for them, otherwise the model would change. This phenomenon arises because of the “pull-to-centre” effect of dynamic models (ETS and ARIMA with non-zero MA terms), where, due to the presence of residuals in the transition equations, the model updates the states so that they become closer to the conditional mean of the data.
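A small simulation can illustrate this effect. The sketch below is a simplified illustration in base R under stated assumptions, not a proof and not the `adam()` estimator: the parameter values (`alphaTrue`, `muEps`, `levelTrue`) are hypothetical, the initial level is fixed at its true value, and the smoothing parameter is estimated by minimising the sum of squared one-step-ahead errors.

```r
# Simulate data from (14.4) with assumed, hypothetical parameter values
set.seed(42)
obs <- 1000
alphaTrue <- 0.3
muEps <- 5
levelTrue <- 100                 # l_0
eps <- muEps + rnorm(obs, 0, 1)  # epsilon_t = mu_epsilon + xi_t

y <- numeric(obs)
level <- levelTrue
for (i in 1:obs) {
  y[i] <- level + eps[i]               # measurement equation
  level <- level + alphaTrue * eps[i]  # transition equation
}

# One-step-ahead errors of ETS(A,N,N) for a given alpha and initial level
etsANNErrors <- function(alpha, y, levelInitial) {
  level <- levelInitial
  e <- numeric(length(y))
  for (i in seq_along(y)) {
    e[i] <- y[i] - level
    level <- level + alpha * e[i]
  }
  e
}

# With the true alpha and l_0, the residuals reproduce epsilon_t,
# so their mean is close to muEps rather than zero
eTrue <- etsANNErrors(alphaTrue, y, levelTrue)
mean(eTrue)

# Least squares estimate of alpha for ETS(A,N,N), with l_0 fixed (a simplification)
alphaHat <- optimize(function(a) sum(etsANNErrors(a, y, levelTrue)^2),
                     interval = c(0, 1))$minimum
c(alphaTrue = alphaTrue, alphaHat = alphaHat)
```

In this setting, the estimate `alphaHat` should come out above `alphaTrue`: a larger smoothing parameter lets the level catch up with the drift that the omitted $$\mu_\epsilon$$ creates.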

Summarising this discussion, the expectation of residuals of the applied ADAMs should be equal to zero asymptotically, which cannot be tested in-sample.

The more valuable part of this assumption, and the one that can be checked, is whether the expectation of the residuals conditional on some variables is equal to zero (or one). In a way, this comes down to ensuring that there are no patterns in the residuals, and thus no parts of the data where the residuals systematically have non-zero expectations.

There are different ways to diagnose this. The first is the already discussed plot of standardised (or studentised) residuals vs fitted values from Section 14.3. The second is the plot of residuals over time, which we have already discussed in Section 14.5. In addition, you can plot residuals vs some of the variables to see if they cause a change in the mean. Note, however, that a pattern detected by any of these methods might instead indicate that the residuals are autocorrelated and/or that some transformations of variables are needed.

There is also an effect related to this, called “endogeneity” (discussed briefly in Section 12.3 of I. Svetunkov, 2022a). According to the econometrics literature (see, for example, Hanck et al., 2020), it implies that the residuals are correlated with some variables. This is equivalent to the situation when the expectation of residuals changes with the change of a variable. The most prominent cause of this is omitted variables (discussed in Section 14.1), which can sometimes be diagnosed by looking at correlations between the residuals and the omitted variables, if the latter are available. While econometricians propose using other estimation methods (such as Instrumental Variables) to diminish the effect of endogeneity, forecasters cannot do that, because they need to fix the problem itself to get more reasonable forecasts rather than better estimates of parameters. Unfortunately, there is no universal recipe for solving this problem, but in some cases transforming variables, adding the omitted ones, or substituting them with proxies (if the omitted variables are unavailable) might resolve the issue to some extent.
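The diagnostic based on correlations between the residuals and an omitted variable can be sketched as follows. For the sake of a self-contained example, this uses a simulated linear regression estimated with `lm()` rather than an ADAM; the variables `x` and `z` and all coefficients are hypothetical.

```r
# Simulated example: z influences y, but is omitted from the model
set.seed(43)
obs <- 200
x <- rnorm(obs)
z <- rnorm(obs)                        # the variable that will be omitted
y <- 1 + 2 * x + 3 * z + rnorm(obs)

modelOmitted <- lm(y ~ x)
e <- resid(modelOmitted)

# Unconditional checks are uninformative: both are zero by construction of OLS
mean(e)
cor(e, x)

# The conditional check reveals the issue: residuals move together with z
cor(e, z)

# Residuals vs the omitted variable, with a LOWESS line to show the changing mean
plot(z, e, xlab = "omitted variable", ylab = "residuals")
lines(lowess(z, e), col = "red")
abline(h = 0, lty = 2)
```

The in-sample mean of the residuals and their correlation with the included variable are zero up to numerical precision, while the LOWESS line has a clear slope, showing that the expectation of the residuals changes with `z`.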

### 14.7.1 Multistep forecast errors have zero mean

This follows from the previous assumption if the model is correctly specified and its residuals are i.i.d. In that situation, we would expect the multiple steps ahead forecast errors to have zero mean. In practice, this might be violated if some structural changes or level shifts are not taken into account by the model. The only thing to note is that this approach requires defining the forecast horizon $$h$$. This should typically come from the task itself and the decisions made.

The diagnostics of this assumption can be done using the rmultistep() method for adam(). This method would apply the estimated model and produce multiple steps ahead forecasts from each in-sample observation to the horizon $$h$$, stacking the forecast errors by rows. Whether we use an additive or multiplicative error model, the method will report the residual $$e_t$$.

Here is an example of the code for the extraction and plotting of multistep forecast errors for the multiplicative model 5 from the previous sections:

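# Extract multistep errors (assumes model 5 is stored as adamSeat05; h=12 is an assumed horizon)
adamSeat05ResidMulti <- rmultistep(adamSeat05, h=12)
# Give adequate names to the columns
colnames(adamSeat05ResidMulti) <- 1:ncol(adamSeat05ResidMulti)
# Produce boxplots
boxplot(adamSeat05ResidMulti, xlab="horizon", ylab="forecast error")
# Add the mean of the forecast errors for each horizon
points(apply(adamSeat05ResidMulti,2,mean), col="red", pch=16)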
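To show the idea behind this diagnostic without relying on a fitted `adam()` object, here is a minimal base R sketch for ETS(A,N,N). It is an illustration only: the series, the level shift, and `alphaHat` are hypothetical, and the exact layout of the matrix returned by `rmultistep()` may differ from the one constructed here.

```r
# Hypothetical series with a level shift that the model does not account for
set.seed(44)
obs <- 120
h <- 6
y <- 100 + rnorm(obs)
y[61:obs] <- y[61:obs] + 20

# ETS(A,N,N) levels with an assumed smoothing parameter and initial level
alphaHat <- 0.2
level <- numeric(obs)
levelPrevious <- y[1]
for (i in 1:obs) {
  level[i] <- levelPrevious + alphaHat * (y[i] - levelPrevious)
  levelPrevious <- level[i]
}

# In ETS(A,N,N), the forecast from origin i is level[i] for every horizon,
# so the j-steps-ahead error from origin i is y[i+j] - level[i].
# Rows correspond to forecast origins, columns to horizons
origins <- 1:(obs - h)
errorsMulti <- sapply(1:h, function(j) y[origins + j] - level[origins])
colnames(errorsMulti) <- 1:h

# Mean forecast error for each horizon
colMeans(errorsMulti)

boxplot(errorsMulti, xlab = "horizon", ylab = "forecast error")
abline(h = 0, col = "red")
points(colMeans(errorsMulti), col = "red", pch = 16)
```

Because of the unmodelled level shift, the mean errors lie above the zero line for all horizons, which is the kind of violation this diagnostic is meant to reveal.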