## 18.2 Conditional moments and scale

We have already discussed how to obtain the conditional expectation and variance in Sections 5.3 and 6.3. However, the topic deserves a more detailed discussion, especially for non-Normal distributions.

### 18.2.1 Conditional expectation

The general rule for generating conditional expectations in ADAM is that if you deal with a pure additive model, then you can produce them analytically. This applies not only to ETS but also to ARIMA (Subsection 9.2.1) and regression (Section 10.2). If the model has multiplicative components (such as multiplicative error, trend or seasonality) or is formulated in logarithms (as, for example, ARIMA in logarithms), then simulations (Section 18.1) should be preferred: the point forecasts from these models do not necessarily correspond to the conditional expectations.
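To see why the point forecast of a model in logarithms can differ from the conditional expectation, consider a minimal sketch in Python (not the implementation used in adam()), assuming a random walk fitted to the logarithms of the data, i.e. ARIMA(0,1,0) in logs, with Normal errors. The point forecast is the exponent of the expected log value, while the simulated mean approaches the log-normal conditional mean $$y_t \exp(h\sigma^2/2)$$. The values of `y_last`, `sigma` and `h` are arbitrary.

```python
import math
import random
import statistics


def simulate_log_random_walk(y_last, sigma, h, n_paths=20000, seed=42):
    """Simulate y_{t+h} from a random walk fitted to log(y), i.e. ARIMA(0,1,0) in logs.

    Illustrative model: log y_{k+1} = log y_k + eps_{k+1}, eps ~ N(0, sigma^2).
    Returns one simulated y_{t+h} value per path.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_paths):
        log_y = math.log(y_last)
        for _ in range(h):
            log_y += rng.gauss(0.0, sigma)
        draws.append(math.exp(log_y))
    return draws


y_last, sigma, h = 100.0, 0.1, 12
draws = simulate_log_random_walk(y_last, sigma, h)

point_forecast = y_last                               # exp() of the expected log value
analytic_mean = y_last * math.exp(h * sigma**2 / 2)  # log-normal conditional mean
simulated_mean = statistics.fmean(draws)              # simulation-based conditional mean
print(point_forecast, round(analytic_mean, 2), round(simulated_mean, 2))
```

With these settings, the analytical conditional mean is about 106.18, roughly 6% above the point forecast of 100, and the simulated mean should land close to it.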

### 18.2.2 Explanatory variables

If the model contains explanatory variables, then the h-steps-ahead conditional expectations should use them in the calculation. The main challenge in this situation is that their future values might not be known. This has been discussed in Section 10.2. In practice, if the user provides the holdout sample values of the explanatory variables, the forecast.adam() method will use them in forecasting. If they are not provided, the function will produce forecasts for each of the explanatory variables via the adam() function and use their conditional h-steps-ahead expectations in forecasting.
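The logic described above can be sketched for a simple linear regression with one explanatory variable. This is a hypothetical helper, not forecast.adam(): when future values of `x` are not supplied, it substitutes the in-sample mean of `x` as a stand-in for the conditional expectation that adam() would produce from a model for `x`.

```python
def forecast_with_regressor(intercept, slope, x_history, h, x_future=None):
    """Point forecasts of y = intercept + slope * x for horizons 1..h.

    If x_future (holdout values of x) is given, it is used directly.
    Otherwise x is forecast first; as a hypothetical stand-in, the in-sample
    mean of x is used as its conditional expectation at every horizon.
    """
    if x_future is None:
        x_mean = sum(x_history) / len(x_history)
        x_future = [x_mean] * h
    if len(x_future) != h:
        raise ValueError("x_future must contain exactly h values")
    # For a linear model with known coefficients, plugging in E(x_{t+k}|t)
    # gives E(y_{t+k}|t), because E(a + b x) = a + b E(x).
    return [intercept + slope * x for x in x_future]


x_hist = [10.0, 12.0, 11.0, 13.0]
with_holdout = forecast_with_regressor(5.0, 2.0, x_hist, h=3, x_future=[14.0, 15.0, 16.0])
without_holdout = forecast_with_regressor(5.0, 2.0, x_hist, h=3)
print(with_holdout, without_holdout)
```

The first call uses the supplied future values of `x`; the second falls back to the forecast of `x`, which here is simply its mean of 11.5.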

### 18.2.3 Conditional variance and scale

Similarly to conditional expectations, as we have discussed in Sections 5.3 and 6.3, the conditional h-steps-ahead variance is, in general, available only for pure additive models. While the conditional expectation might be needed on its own as a point forecast, the conditional variance is typically needed to produce prediction intervals. However, it is useful only for distributions that support convolution (addition of random variables), which limits its use to pure additive models and to additive models applied to data in logarithms. For example, if we deal with the Inverse Gaussian distribution, then the h-steps-ahead values will not follow the Inverse Gaussian distribution, and we would need to revert to simulations to obtain the proper statistics. Another example is a multiplicative error model that relies on the Normal distribution: the product of Normally distributed variables does not follow the Normal distribution, so the statistics would again need to be obtained via simulations.
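The following sketch illustrates the difference with simulated Normal errors: the sum of h errors (as in a pure additive model) stays symmetric, while the product of h terms $$(1+\epsilon_j)$$ (as in a multiplicative error model) is positively skewed and hence not Normal. The values of `sigma`, `h` and the number of draws are arbitrary assumptions.

```python
import random


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
sigma, h, n = 0.3, 3, 50000

sums, products = [], []
for _ in range(n):
    eps = [rng.gauss(0.0, sigma) for _ in range(h)]
    sums.append(sum(eps))          # additive error: a sum of Normals is Normal
    prod = 1.0
    for e in eps:
        prod *= 1.0 + e            # multiplicative error: product of (1 + eps)
    products.append(prod)

skew_sum = sample_skewness(sums)
skew_prod = sample_skewness(products)
print(round(skew_sum, 3), round(skew_prod, 3))
```

For these settings, the theoretical skewness of the product is about 1, while the sum has zero skewness, so quantiles of the product cannot be taken from a Normal distribution.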

If we deal with a pure additive model with the Normal, Laplace, S or Generalised Normal distribution, then the formulae derived in Section 5.3 can be used to produce the h-steps-ahead conditional variance. Having obtained these values, we can then produce the conditional h-steps-ahead scales of the distributions (needed, for example, to generate quantiles from these distributions), using the relations between the variance and the scale of those distributions (discussed in Section 5.5):

1. Normal: $$s_h = \sigma_h$$;
2. Laplace: $$s_h = \sigma_h \sqrt{\frac{1}{2}}$$;
3. S: $$s_h = \sqrt{\sigma_h}\sqrt[4]{\frac{1}{120}}$$;
4. Generalised Normal: $$s_h = \sigma_h \sqrt{\frac{\Gamma(1/\beta)}{\Gamma(3/\beta)}}$$.
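As a sketch of how these relations might be used, the Python code below takes the ETS(A,N,N) case, for which the Section 5.3 formulae reduce to the well-known $$\sigma^2_h = \sigma^2 \left(1 + (h-1)\alpha^2\right)$$, converts the resulting variance into the scale of each distribution and produces the 2.5% and 97.5% quantiles for the Normal and Laplace cases. The function names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def ann_variance(sigma2, alpha, h):
    """Conditional h-steps-ahead variance of ETS(A,N,N)."""
    return sigma2 * (1 + (h - 1) * alpha**2)


def scale_from_variance(variance, distribution, beta=None):
    """Scale of the distribution that has the given variance."""
    sd = math.sqrt(variance)
    if distribution == "normal":
        return sd
    if distribution == "laplace":
        return sd * math.sqrt(1 / 2)
    if distribution == "s":
        return math.sqrt(sd) * (1 / 120) ** 0.25
    if distribution == "gnorm":
        if beta is None:
            raise ValueError("the Generalised Normal distribution needs beta")
        return sd * math.sqrt(math.gamma(1 / beta) / math.gamma(3 / beta))
    raise ValueError(f"unknown distribution: {distribution}")


def laplace_quantile(p, location, scale):
    """Quantile function of the Laplace distribution."""
    if p < 0.5:
        return location + scale * math.log(2 * p)
    return location - scale * math.log(2 * (1 - p))


point_forecast, sigma2, alpha, h = 100.0, 4.0, 0.3, 5
var_h = ann_variance(sigma2, alpha, h)
s_norm = scale_from_variance(var_h, "normal")
s_lap = scale_from_variance(var_h, "laplace")

normal_bounds = [NormalDist(point_forecast, s_norm).inv_cdf(p) for p in (0.025, 0.975)]
laplace_bounds = [laplace_quantile(p, point_forecast, s_lap) for p in (0.025, 0.975)]
print(round(var_h, 2), [round(b, 2) for b in normal_bounds], [round(b, 2) for b in laplace_bounds])
```

A quick check of the conversions: the Laplace variance is $$2s^2$$, the S variance is $$120s^4$$, and the Generalised Normal with $$\beta=2$$ has scale $$\sigma\sqrt{2}$$.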

If the variance is needed for other combinations of models and distributions, then simulations would need to be done to produce multiple trajectories, similar to how it was done in Section 18.1. An alternative is to calculate the in-sample multistep forecast errors (as discussed in Sections 11.3 and 14.7.1). Having the matrix of forecast errors, we can then calculate the variance for each horizon $$h$$.
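A minimal sketch of the second approach, assuming simple exponential smoothing with a fixed smoothing parameter and an initial level set to the first observation (both simplifications for illustration; in ADAM, the parameters would be estimated). From each in-sample origin, the forecast for every horizon is the current level, so the errors for each horizon form one column of the error matrix; the mean squared error of each column serves as the variance estimate, assuming unbiased forecasts. The data are simulated.

```python
import random


def ses_levels(y, alpha, initial_level):
    """Levels l_0, l_1, ..., l_T of simple exponential smoothing, ETS(A,N,N)."""
    levels = [initial_level]
    for value in y:
        levels.append(levels[-1] + alpha * (value - levels[-1]))
    return levels


def multistep_error_variances(y, alpha, horizon):
    """Mean squared in-sample h-steps-ahead error for h = 1, ..., horizon.

    levels[t] is the level after observing t values; its forecast for the
    (t + h)-th observation, y[t + h - 1] with 0-based indexing, is levels[t].
    """
    levels = ses_levels(y, alpha, initial_level=y[0])
    variances = []
    for h in range(1, horizon + 1):
        # Skip origin t = 0, where the level is just the first observation
        errors = [y[t + h - 1] - levels[t] for t in range(1, len(y) - h + 1)]
        variances.append(sum(e * e for e in errors) / len(errors))
    return variances


# Simulated local level data, for illustration only
rng = random.Random(3)
level, y = 50.0, []
for _ in range(200):
    level += rng.gauss(0.0, 1.0)
    y.append(level + rng.gauss(0.0, 2.0))

variances = multistep_error_variances(y, alpha=0.3, horizon=6)
print([round(v, 2) for v in variances])
```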

### 18.2.4 Scale model

In the case of the scale model (Chapter 17), the situation becomes more complicated because we no longer assume that the variance of the error term is constant (homoscedastic); instead, we assume that it follows a model of its own. In this case, we need to return to recursion (5.10) and, when taking the variance, introduce the time-varying variance $$\sigma_{t+h}^2$$.

Remark. Note the difference between $$\sigma_{t+h}^2$$ and $$\sigma_{h}^2$$ in our notation: the former is the variance of the error term for the specific step $$t+h$$, while the latter is the conditional $$h$$-steps-ahead variance, derived under the assumption of homoscedasticity.

Making that substitution leads to the following analytical formula for the h-steps-ahead conditional variance in the case of the scale model:

$$
\text{V}(y_{t+h}|t) = \sum_{i=1}^d \left(\mathbf{w}_{m_i}^\prime \sum_{j=1}^{\lceil\frac{h}{m_i}\rceil-1} \mathbf{F}_{m_i}^{j-1} \mathbf{g}_{m_i} \mathbf{g}^\prime_{m_i} (\mathbf{F}_{m_i}^\prime)^{j-1} \mathbf{w}_{m_i} \sigma_{t+h-j}^2 \right) + \sigma_{t+h}^2 . \tag{18.1}
$$

This variance can then be used, for example, to produce quantiles from the assumed distribution.
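Because $$\mathbf{w}_{m_i}^\prime \mathbf{F}_{m_i}^{j-1} \mathbf{g}_{m_i}$$ is a scalar, each term in (18.1) is its square multiplied by the corresponding variance. The sketch below evaluates (18.1) only for non-seasonal models, where all components have lag 1 and the outer sum has a single term. It is written in pure Python with hypothetical function names, and `sigma2_future[k-1]` is assumed to hold $$\sigma^2_{t+k}$$ produced by a scale model. With a constant variance, it reproduces the homoscedastic result, which for ETS(A,A,N) is $$\sigma^2\left(1 + \sum_{j=1}^{h-1}(\alpha + j\beta)^2\right)$$.

```python
def mat_vec(matrix, vector):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def scale_model_variance(w, F, g, sigma2_future, h):
    """Conditional h-steps-ahead variance (18.1) for lag-1 components only.

    sigma2_future[k - 1] holds sigma^2_{t+k}, k = 1, ..., h, assumed to be
    produced by a scale model.
    """
    if len(sigma2_future) < h:
        raise ValueError("sigma2_future must cover the whole horizon")
    total = sigma2_future[h - 1]                    # sigma^2_{t+h}
    F_pow_g = list(g)                               # F^{j-1} g, starting from j = 1
    for j in range(1, h):
        c = dot(w, F_pow_g)                         # scalar w' F^{j-1} g
        total += c * c * sigma2_future[h - j - 1]   # times sigma^2_{t+h-j}
        F_pow_g = mat_vec(F, F_pow_g)
    return total


# ETS(A,A,N): w = (1, 1), F = [[1, 1], [0, 1]], g = (alpha, beta) with constant variance
w, F, g = [1.0, 1.0], [[1.0, 1.0], [0.0, 1.0]], [0.3, 0.1]
aan_var = scale_model_variance(w, F, g, [2.0] * 5, h=5)

# ETS(A,N,N) with a time-varying error variance
ann_var = scale_model_variance([1.0], [[1.0]], [0.5], [1.0, 2.0, 3.0], h=3)
print(round(aan_var, 6), round(ann_var, 6))
```

The first call gives $$2\left(1 + 0.4^2 + 0.5^2 + 0.6^2 + 0.7^2\right) = 4.52$$; the second gives $$3 + 0.5^2(2 + 1) = 3.75$$.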

As mentioned above, for models that are not purely additive or that use distributions other than the Normal, Laplace, S or Generalised Normal, the conditional variance can be obtained using simulations. In the case of the scale model, the principles are the same, except that each error term $$\epsilon_{t+h}$$ has its own scale, obtained from the estimated scale model. The rest of the logic is exactly the same as discussed in Section 18.1.
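To illustrate, the following sketch simulates a local level model, ETS(A,N,N) or ETS(M,N,N), in which the error at each step $$t+k$$ is drawn from a Normal distribution with its own standard deviation `scales[k-1]`. These scales stand in for the output of an estimated scale model, and Normal errors in the multiplicative case are an assumption for illustration. In the additive case, the simulated variance can be checked against (18.1); in the multiplicative case, the quantiles of the simulated values give the prediction interval directly.

```python
import random
import statistics


def simulate_local_level(level, alpha, scales, n_paths=20000, error="additive", seed=1):
    """Simulate paths of a local level model with step-specific error scales.

    scales[k - 1] is the standard deviation of the error at step t + k,
    assumed to come from an estimated scale model.
    additive:       y = l + e,       l <- l + alpha * e
    multiplicative: y = l * (1 + e), l <- l * (1 + alpha * e)
    Returns a list of paths, each containing len(scales) simulated values.
    """
    if error not in ("additive", "multiplicative"):
        raise ValueError("error must be 'additive' or 'multiplicative'")
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        current_level = level
        path = []
        for s in scales:
            e = rng.gauss(0.0, s)
            if error == "additive":
                path.append(current_level + e)
                current_level += alpha * e
            else:
                path.append(current_level * (1 + e))
                current_level *= 1 + alpha * e
        paths.append(path)
    return paths


alpha, h = 0.3, 3

# Additive case: compare the simulated variance with (18.1) for ETS(A,N,N)
scales_add = [1.0, 2.0, 3.0]
paths_add = simulate_local_level(100.0, alpha, scales_add)
sim_var = statistics.variance([path[h - 1] for path in paths_add])
sigma2 = [s * s for s in scales_add]
analytic_var = sigma2[h - 1] + alpha**2 * sum(sigma2[: h - 1])
print(round(sim_var, 2), round(analytic_var, 2))

# Multiplicative case: use quantiles of the simulated values instead of a formula
scales_mult = [0.05, 0.10, 0.15]
paths_mult = simulate_local_level(100.0, alpha, scales_mult, error="multiplicative")
values_h = [path[h - 1] for path in paths_mult]
cuts = statistics.quantiles(values_h, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
lower, upper = cuts[0], cuts[-1]
print(round(lower, 2), round(upper, 2))
```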