17.1 Model formulation

We start our discussion with an example of the ETS model and then move to the more general one.

17.1.1 An example with the local level model with Normal distribution

Consider the ETS(A,N,N) model (discussed in Section 4.3), which has the following measurement equation: \[\begin{equation} y_t = l_{t-1} + \epsilon_t, \tag{17.1} \end{equation}\] where the most commonly used assumption for the error term is: \[\begin{equation*} \epsilon_t \sim \mathcal{N}(0, \sigma^2) . \end{equation*}\] The same error term can be represented as the standard Normal variable multiplied by the standard deviation: \[\begin{equation} \epsilon_t = \sigma \eta_t, \tag{17.2} \end{equation}\] where \(\eta_t \sim \mathcal{N}(0, 1)\). Now consider the situation when, instead of the constant variance \(\sigma^2\), we have one that changes over time, either because of its own dynamics or because of the influence of explanatory variables. In that case we add the subscript \(t\) to the variance in (17.2) to obtain: \[\begin{equation} \epsilon_t = \sigma_t \eta_t. \tag{17.3} \end{equation}\] The thing to keep in mind is that in the case of the Normal distribution the scale is equal to the variance rather than the standard deviation, so we need to model \(\sigma_t^2\) rather than \(\sigma_t\). The variance in this case can be modelled explicitly using any ADAM. However, the pure multiplicative models make more sense because they guarantee that the variance will not become zero or negative. For simplicity, we use ETS(M,N,N) for the variance: \[\begin{equation} \begin{aligned} &\epsilon_t^2 = \sigma_t^2 \eta_t^2 \\ &\sigma_t^2 = l_{\sigma,t-1} \\ &l_{\sigma,t} = l_{\sigma,t-1} \left(1 + \alpha_\sigma (\eta_t^2-1)\right) \end{aligned} . \tag{17.4} \end{equation}\] Note that although the part \(\left(1 + \alpha_\sigma (\eta_t^2-1)\right)\) looks slightly different from the respective part \(\left(1 + \alpha \epsilon_t \right)\) of the conventional ETS(M,N,N), they are equivalent: substituting \(\xi_t = \eta_t^2-1\) in (17.4), we arrive at the conventional ETS(M,N,N) model.
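To make this data generating process concrete, the following short Python sketch simulates a series from ETS(A,N,N) with the ETS(M,N,N) variance model; all parameter values here are hypothetical, picked purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

T = 500
alpha, level = 0.3, 100.0      # location part: ETS(A,N,N), hypothetical values
alpha_s, level_s = 0.05, 4.0   # scale part: ETS(M,N,N) on the variance

y = np.empty(T)
for t in range(T):
    eta = rng.standard_normal()                   # eta_t ~ N(0, 1)
    sigma2 = level_s                              # sigma_t^2 = l_{sigma,t-1}
    eps = np.sqrt(sigma2) * eta                   # eps_t = sigma_t eta_t, eq. (17.3)
    y[t] = level + eps                            # measurement, eq. (17.1)
    level = level + alpha * eps                   # ETS(A,N,N) transition
    level_s = level_s * (1 + alpha_s * (eta**2 - 1))  # variance transition, eq. (17.4)
```

Because \(1 + \alpha_\sigma (\eta_t^2 - 1) \geq 1 - \alpha_\sigma > 0\) whenever \(\alpha_\sigma < 1\), the simulated variance stays strictly positive, illustrating the advantage of the multiplicative formulation.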
Another thing to notice in this formulation is that because \(\eta_t\) follows the standard Normal distribution, its square follows the Chi-squared distribution with one degree of freedom: \(\eta_t^2 \sim \chi^2(1)\). This can be used for model diagnostics. Finally, we can use all the properties of pure multiplicative models discussed in Chapter 6 to get the fitted values and forecasts from the model (17.4): \[\begin{equation} \begin{aligned} &\sigma_{t|t-1}^2 = l_{\sigma,t-1} \\ &\sigma_{t+h|t}^2 = l_{\sigma,t} \end{aligned}. \tag{17.5} \end{equation}\] In order to construct this model, we need to collect the residuals \(e_t\) of the location model (17.1), square them, and use them in the following system of equations: \[\begin{equation} \begin{aligned} &\hat{\sigma}^2_{t} = \hat{l}_{\sigma,t-1} \\ &\hat{\eta}_t^2 = \frac{e_t^2}{\hat{\sigma}^2_{t}} \\ &\hat{l}_{\sigma,t} = \hat{l}_{\sigma,t-1} (1 + \hat{\alpha}_\sigma (\hat{\eta}_t^2-1)) \end{aligned}. \tag{17.6} \end{equation}\] In order for this to work, we need to estimate \(\hat{l}_{\sigma,0}\) and \(\hat{\alpha}_\sigma\), which can be done in the conventional way by maximising the log-likelihood function of the Normal distribution (see Section 11.1) – the only thing that changes in comparison with the conventional estimation is that the fitted values \(\hat{\sigma}^2_{t}\) of the variance come from (17.6): \[\begin{equation} \ell(\boldsymbol{\theta}, {\sigma}_t^2 | \mathbf{y}) = -\frac{1}{2} \sum_{t=1}^T \log(2 \pi \sigma_t^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma_t^2} . \tag{17.7} \end{equation}\] The thing to keep in mind is that the final ETS(A,N,N) model (17.1) with the scale model (17.4) has four parameters instead of three, as in the case of the simpler model: two for the location part (initial level \(l_{0}\) and smoothing parameter \(\alpha\)) and two for the scale part (initial level \(l_{\sigma,0}\) and smoothing parameter \(\alpha_\sigma\)).
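The recursion (17.6) together with the likelihood (17.7) can be sketched in Python as follows. The residuals here are simulated stand-ins and the grid-search bounds are hypothetical; in practice a proper optimiser (e.g. Nelder-Mead) would be used instead of a grid:

```python
import numpy as np

def scale_loglik(e, l_s0, alpha_s):
    """Log-likelihood (17.7) of Normal errors, with variance driven by (17.6)."""
    l_s = l_s0
    ll = 0.0
    for et in e:
        sigma2 = l_s                              # one-step-ahead variance forecast
        ll += -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * et**2 / sigma2
        eta2 = et**2 / sigma2                     # squared standardised residual
        l_s = l_s * (1 + alpha_s * (eta2 - 1))    # level update of ETS(M,N,N)
    return ll

# Simulated stand-ins for the residuals of a fitted location model
rng = np.random.default_rng(0)
e = 2.0 * rng.standard_normal(300)

# Crude grid search over the two scale parameters
grid_alpha = np.linspace(0.01, 0.3, 30)
grid_level = np.linspace(1.0, 9.0, 33)
ll_hat, l_s0_hat, alpha_s_hat = max(
    (scale_loglik(e, l0, a), l0, a) for l0 in grid_level for a in grid_alpha
)
```

The two estimated quantities correspond to \(\hat{l}_{\sigma,0}\) and \(\hat{\alpha}_\sigma\) in (17.6).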

As can be seen from this example, constructing and estimating the scale model for ADAM is not a difficult task. Following similar principles, we can apply any other pure multiplicative model to the scale, including ETS(Y,Y,Y) (Chapter 6), log-ARIMA (Section 9.1.4), or a multiplicative regression. Furthermore, the location model ETS(A,N,N) can be substituted by any other ETS/ARIMA/Regression model – the same principles as discussed in this subsection apply to them. The only restriction in all of this is that both parts of the model should be estimated via likelihood maximisation – other methods typically do not explicitly estimate the scale of distribution.

17.1.2 General case with Normal distribution

More generally speaking, a scale model can be created for any ETS/ARIMA/Regression location model. Following from the discussion in Section 7.1, the general location model can be written as: \[\begin{equation*} \begin{aligned} {y}_{t} = & w(\mathbf{v}_{t-\mathbf{l}}) + r(\mathbf{v}_{t-\mathbf{l}}) \epsilon_t \\ \mathbf{v}_{t} = & f(\mathbf{v}_{t-\mathbf{l}}) + g(\mathbf{v}_{t-\mathbf{l}}) \epsilon_t \end{aligned} . \end{equation*}\] If we assume that \(\epsilon_t \sim \mathcal{N}(0,\sigma^2_t)\), then the general scale model with ETS+ARIMA+Reg elements can be formulated as (Section 6.1, Subsection 9.1.4, and Section 10.3): \[\begin{equation} \begin{aligned} & \epsilon_t^2 = \sigma_t^2 \eta_{t}^2 \\ & \sigma_t^2 = \exp \left(\mathbf{w}_E^\prime \log(\mathbf{v}_{E,\sigma,t-\mathbf{l}_E}) + \mathbf{w}_A^\prime \log(\mathbf{v}_{A,\sigma,t-\mathbf{l}_A}) + \mathbf{x}_t^\prime \mathbf{a}_{t-1} \right)\\ & \log \mathbf{v}_{E,\sigma,t} = \mathbf{F}_{E,\sigma} \log \mathbf{v}_{E,\sigma,t-\mathbf{l}_E} + \log(\mathbf{1}_k + \mathbf{g}_{E,\sigma} (\eta_t^2-1))\\ & \log \mathbf{v}_{A,\sigma,t} = \mathbf{F}_{A,\sigma} \log \mathbf{v}_{A,\sigma,t-\mathbf{l}_A} + \mathbf{g}_{A,\sigma} \log(\eta_t^2) \\ & \mathbf{a}_{t} = \mathbf{a}_{t-1} + \mathbf{z}_t \mathbf{g}_{R,\sigma} \log(\eta_t^2) \end{aligned} . \tag{17.8} \end{equation}\] The main thing that unites all the parts of the model is that they are pure multiplicative, which avoids potential issues with negative values. The construction and estimation of the scale model in this case is similar to the one discussed for the ETS(A,N,N) example above. When it comes to forecasting the conditional h-steps-ahead scale, given the limitations of pure multiplicative models discussed in Section 6.3, it needs to be obtained via simulations – this way the forecast from the ADAM will coincide with the conditional expectation, which in the case of (17.8) gives the conditional h-steps-ahead scale.
All the principles discussed in Sections 6.1, 9.1.4 and 10.3 can be used directly for the scale model without any limitations.
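A minimal illustration of the simulation-based approach to forecasting the scale, using the simple ETS(M,N,N) scale model from the previous subsection (the parameter values are hypothetical). For the local level case the simulated conditional mean should stay close to the last level, which serves as a sanity check of the procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def scale_forecast(l_s, alpha_s, h, n_sim=10_000):
    """Conditional h-steps-ahead variance of the ETS(M,N,N) scale model,
    approximated by averaging over simulated future paths."""
    levels = np.full(n_sim, l_s)
    sigma2 = np.empty((h, n_sim))
    for j in range(h):
        sigma2[j] = levels                        # variance on each path at step j+1
        eta2 = rng.standard_normal(n_sim) ** 2    # eta_t^2 ~ chi^2(1)
        levels = levels * (1 + alpha_s * (eta2 - 1))
    return sigma2.mean(axis=1)                    # expectation via simulation

forecast = scale_forecast(l_s=4.0, alpha_s=0.1, h=5)
```

For more elaborate multiplicative scale models (e.g. with trend or seasonality), the same simulation scheme applies, but the averaged paths no longer reduce to the last level.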

Finally, ADAM can be expanded even further by introducing the occurrence part of the model (i.e. dealing with the time-varying scale of distribution in the case of intermittent demand). This part needs to be introduced in the location model; the scale model (17.8) can then be used as-is, applying it to the non-zero observations.

17.1.3 Other distributions

The examples above focused on the Normal distribution, but ADAM supports other distributions as well. Depending on the error term, these are:

  1. Additive error term (Section 5.5):
      1. Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma_t^2)\);
      2. Laplace: \(\epsilon_t \sim \mathcal{Laplace}(0, s_t)\);
      3. S: \(\epsilon_t \sim \mathcal{S}(0, s_t)\);
      4. Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s_t, \beta)\);
      5. Log-Normal: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathrm{log}\mathcal{N}\left(-\frac{\sigma_t^2}{2}, \sigma_t^2\right)\);
      6. Inverse Gaussian: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{IG}(1, \sigma_t^2)\);
      7. Gamma: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \Gamma(\sigma_t^{-2}, \sigma_t^2)\);
  2. Multiplicative error term (Section 6.5):
      1. Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma_t^2)\);
      2. Laplace: \(\epsilon_t \sim \mathcal{Laplace}(0, s_t)\);
      3. S: \(\epsilon_t \sim \mathcal{S}(0, s_t)\);
      4. Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s_t, \beta)\);
      5. Log-Normal: \(\left(1+\epsilon_t \right) \sim \mathrm{log}\mathcal{N}\left(-\frac{\sigma_t^2}{2}, \sigma_t^2\right)\);
      6. Inverse Gaussian: \(\left(1+\epsilon_t \right) \sim \mathcal{IG}(1, \sigma_t^2)\);
      7. Gamma: \(\left(1+\epsilon_t \right) \sim \Gamma(\sigma_t^{-2}, \sigma_t^2)\).

The error terms in these cases can also be presented in a form similar to (17.3) to get the first equation in (17.8) for the respective distributions:

  1. Additive error term:
      1. Normal: \(\epsilon_t^2 = \sigma_t^2 \eta_t^2\), where \(\eta_t \sim \mathcal{N}(0, 1)\) or, equivalently, \(\eta_t^2 \sim \chi^2(1)\);
      2. Laplace: \(|\epsilon_t| = s_t |\eta_t|\), where \(\eta_t \sim \mathcal{Laplace}(0, 1)\);
      3. S: \(0.5 |\epsilon_t|^{0.5} = s_t |\eta_t|^{0.5}\), where \(\eta_t \sim \mathcal{S}(0, 1)\);
      4. Generalised Normal: \(\beta |\epsilon_t|^{\beta} = s_t |\eta_t|^{\beta}\), where \(\eta_t \sim \mathcal{GN}(0, 1, \beta)\);
      5. Log-Normal: \(\log\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) = \sigma_t \eta_t-\frac{\sigma_t^2}{2}\), where \(\eta_t \sim \mathcal{N}(0, 1)\);
      6. Inverse Gaussian: \(\frac{\left(\frac{\epsilon_t}{\mu_{y,t}} \right)^2}{\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right)}=\sigma^2_t \eta_t^2\), where \(\eta_t^2 \sim \chi^2(1)\);
      7. Gamma: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) = \sigma_t^2 \eta_t\), so that \(\eta_t \sim \Gamma(\sigma_t^{-2}, 1)\);
  2. Multiplicative error term (Section 6.5):
      1. Normal: \(\epsilon_t^2 = \sigma_t^2 \eta_t^2\), where \(\eta_t \sim \mathcal{N}(0, 1)\);
      2. Laplace: \(|\epsilon_t| = s_t |\eta_t|\), where \(\eta_t \sim \mathcal{Laplace}(0, 1)\);
      3. S: \(0.5 |\epsilon_t|^{0.5} = s_t |\eta_t|^{0.5}\), where \(\eta_t \sim \mathcal{S}(0, 1)\);
      4. Generalised Normal: \(\beta |\epsilon_t|^{\beta} = s_t |\eta_t|^{\beta}\), where \(\eta_t \sim \mathcal{GN}(0, 1, \beta)\);
      5. Log-Normal: \(\log\left(1+\epsilon_t \right) = \sigma_t \eta_t -\frac{\sigma_t^2}{2}\), where \(\eta_t \sim \mathcal{N}(0, 1)\);
      6. Inverse Gaussian: \(\frac{\epsilon_t ^2}{\left(1+\epsilon_t \right)}=\sigma^2_t \eta_t^2\), where \(\eta_t^2 \sim \chi^2(1)\);
      7. Gamma: \(\left(1+\epsilon_t \right) = \sigma_t^2 \eta_t\), so that \(\eta_t \sim \Gamma(\sigma_t^{-2}, 1)\).

Remark. The relations between \(\epsilon_t\) and \(\eta_t\) for \(\mathcal{S}\) and \(\mathcal{GN}\) include the constants \(0.5\) and \(\beta\), which arise because of how the scales of those distributions are estimated (see Section 11.1). The relation between the error term \(\epsilon_t\) and \(\eta_t\) in the Log-Normal distribution is more complicated, because for the latter to be standard Normal, the former needs to be transformed according to the formula above. In the case of the Inverse Gaussian, the transformations are required to make \(\eta_t\) independent of the scale parameter. Finally, in the case of the Gamma distribution, \(\eta_t\) cannot be made independent of the scale parameter, which makes it restrictive and less useful than the other distributions.

The equations above can be used instead of the first equation in (17.8) to create and estimate the scale model for the chosen distribution: the other four equations in (17.8) stay exactly the same, substituting \(\eta_t^2\) with the respective \(\eta_t\), \(|\eta_t|\), \(|\eta_t|^{0.5}\), or \(|\eta_t|^{\beta}\), depending on the distribution. The estimation of the respective models can then be done via the maximisation of the respective likelihood functions (as discussed in Section 11.1).
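As an illustration, backing out the transformed \(\eta_t\) from the residuals for a couple of the distributions above can be sketched as follows; the function name, arguments, and set of supported cases are hypothetical and do not correspond to any particular package API:

```python
import numpy as np

def standardise(e, scale, distribution="normal"):
    """Recover the transformed eta_t that enters the scale recursion.

    Hypothetical helper for illustration: 'e' are residuals of the location
    model, 'scale' is the fitted scale of the chosen distribution.
    """
    e = np.asarray(e, dtype=float)
    if distribution == "normal":       # eps_t^2 = sigma_t^2 eta_t^2  =>  eta_t^2
        return e**2 / scale
    if distribution == "laplace":      # |eps_t| = s_t |eta_t|        =>  |eta_t|
        return np.abs(e) / scale
    raise ValueError(f"unknown distribution: {distribution}")

eta2 = standardise([2.0, -1.0, 0.5], scale=4.0)                        # Normal case
abs_eta = standardise([2.0, -1.0], scale=2.0, distribution="laplace")  # Laplace case
```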

The diagnostics of the scale model can be done in the same way as discussed in Chapter 14, keeping in mind the distributional assumptions about the \(\eta_t\) variable rather than \(\epsilon_t\).
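For instance, under the Normal assumption the squared standardised residuals should follow \(\chi^2(1)\), which can be checked numerically; the sketch below uses simulated stand-ins for the residuals (in practice the QQ-plots of Chapter 14 would be applied to \(\hat{\eta}_t\)):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
# Stand-in for squared standardised residuals of a correctly specified scale model
eta2 = rng.standard_normal(5000) ** 2

# If eta_t^2 ~ chi^2(1), then P(eta_t^2 <= x) = erf(sqrt(x / 2)); comparing
# empirical and theoretical CDF values gives a quick diagnostic check
for x in (0.5, 1.0, 2.0):
    empirical = (eta2 <= x).mean()
    theoretical = erf(sqrt(x / 2))
    print(f"x = {x}: empirical {empirical:.3f} vs chi2(1) {theoretical:.3f}")
```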

Finally, the model selection for the scale part can be done using the same principles as discussed in Chapter 15. For example, one can select the most suitable ETS model similar to how it was discussed in Section 15.1, or the most suitable ARIMA, as in Section 15.2, or a set of explanatory variables, based on the approach in Section 15.3. All of this is available out of the box if the scale model is estimated via likelihood maximisation.