17.1 Model formulation

We start our discussion with an example of the ETS model and then move to the more general one.

17.1.1 An example with the local level model with Normal distribution

Consider the ETS(A,N,N) model (discussed in Section 4.3), which has the following measurement equation: \[\begin{equation} y_t = l_{t-1} + \epsilon_t, \tag{17.1} \end{equation}\] where the most commonly used assumption for the error term is: \[\begin{equation*} \epsilon_t \sim \mathcal{N}(0, \sigma^2) . \end{equation*}\] The same error term can be represented as the standard Normal variable multiplied by the standard deviation: \[\begin{equation} \epsilon_t = \sigma \eta_t, \tag{17.2} \end{equation}\] where \(\eta_t \sim \mathcal{N}(0, 1)\). Now consider the situation when, instead of the constant variance \(\sigma^2\), we have one that changes over time, either because of its own dynamics or because of the influence of explanatory variables. In that case we add the subscript \(t\) to the variance in (17.2) to obtain: \[\begin{equation} \epsilon_t = \sigma_t \eta_t. \tag{17.3} \end{equation}\] The thing to keep in mind is that in the case of the Normal distribution the scale is equal to the variance rather than the standard deviation, so we need to model \(\sigma_t^2\) rather than \(\sigma_t\). The variance in this case can be modelled explicitly using any ADAM. However, the pure multiplicative models make more sense because they guarantee that the variance will not become zero or negative. For simplicity, we use ETS(M,N,N) for the variance: \[\begin{equation} \begin{aligned} &\epsilon_t^2 = \sigma_t^2 \eta_t^2 \\ &\sigma_t^2 = l_{\sigma,t-1} \\ &l_{\sigma,t} = l_{\sigma,t-1} \left(1 + \alpha_\sigma (\eta_t^2-1)\right) \end{aligned} . \tag{17.4} \end{equation}\] Note that although the part \(\left(1 + \alpha_\sigma (\eta_t^2-1)\right)\) looks slightly different from the respective part \(\left(1 + \alpha \epsilon_t \right)\) of the conventional ETS(M,N,N), they are equivalent: substituting \(\xi_t = \eta_t^2-1\) in (17.4), we arrive at the conventional ETS(M,N,N) model.
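To make this data generating process concrete, the following short Python sketch simulates a series from ETS(A,N,N) with the ETS(M,N,N) variance model; all parameter values here are hypothetical, picked purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

T = 500
alpha, level = 0.3, 100.0      # location part: ETS(A,N,N), hypothetical values
alpha_s, level_s = 0.05, 4.0   # scale part: ETS(M,N,N) on the variance

y = np.empty(T)
for t in range(T):
    eta = rng.standard_normal()                   # eta_t ~ N(0, 1)
    sigma2 = level_s                              # sigma_t^2 = l_{sigma,t-1}
    eps = np.sqrt(sigma2) * eta                   # eps_t = sigma_t eta_t, eq. (17.3)
    y[t] = level + eps                            # measurement, eq. (17.1)
    level = level + alpha * eps                   # ETS(A,N,N) transition
    level_s = level_s * (1 + alpha_s * (eta**2 - 1))  # variance transition, eq. (17.4)
```

Because \(1 + \alpha_\sigma (\eta_t^2 - 1) \geq 1 - \alpha_\sigma > 0\) whenever \(\alpha_\sigma < 1\), the simulated variance stays strictly positive, illustrating the advantage of the multiplicative formulation.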
Another thing to notice in this formulation is that because \(\eta_t\) follows the standard Normal distribution, its square follows the Chi-squared distribution with one degree of freedom: \(\eta_t^2 \sim \chi^2(1)\). This can be used for model diagnostics. Finally, we can use all the properties of pure multiplicative models discussed in Chapter 6 to get the fitted values and forecasts from the model (17.4): \[\begin{equation} \begin{aligned} &\sigma_{t|t-1}^2 = l_{\sigma,t-1} \\ &\sigma_{t+h|t}^2 = l_{\sigma,t} \end{aligned}. \tag{17.5} \end{equation}\] In order to construct this model, we need to collect the residuals \(e_t\) of the location model (17.1), square them, and use them in the following system of equations: \[\begin{equation} \begin{aligned} &\hat{\sigma}^2_{t} = \hat{l}_{\sigma,t-1} \\ &\hat{\eta}_t^2 = \frac{e_t^2}{\hat{\sigma}^2_{t}} \\ &\hat{l}_{\sigma,t} = \hat{l}_{\sigma,t-1} (1 + \hat{\alpha}_\sigma (\hat{\eta}_t^2-1)) \end{aligned}. \tag{17.6} \end{equation}\] In order for this to work, we need to estimate \(\hat{l}_{\sigma,0}\) and \(\hat{\alpha}_\sigma\), which can be done in the conventional way by maximising the log-likelihood function of the Normal distribution (see Section 11.1) – the only thing that changes in comparison with the conventional estimation is that the fitted values \(\hat{\sigma}^2_{t}\) of the variance come from (17.6): \[\begin{equation} \ell(\boldsymbol{\theta}, {\sigma}_t^2 | \mathbf{y}) = -\frac{1}{2} \sum_{t=1}^T \log(2 \pi \sigma_t^2) -\frac{1}{2} \sum_{t=1}^T \frac{\epsilon_t^2}{\sigma_t^2} . \tag{17.7} \end{equation}\] The thing to keep in mind is that the final ETS(A,N,N) model (17.1) with the scale model (17.4) has four parameters instead of three, as in the case of the simpler model: two for the location part (initial level \(l_{0}\) and smoothing parameter \(\alpha\)) and two for the scale part (initial level \(l_{\sigma,0}\) and smoothing parameter \(\alpha_\sigma\)).
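The recursion (17.6) together with the likelihood (17.7) can be sketched in Python as follows. The residuals here are simulated stand-ins and the grid-search bounds are hypothetical; in practice a proper optimiser (e.g. Nelder-Mead) would be used instead of a grid:

```python
import numpy as np

def scale_loglik(e, l_s0, alpha_s):
    """Log-likelihood (17.7) of Normal errors, with variance driven by (17.6)."""
    l_s = l_s0
    ll = 0.0
    for et in e:
        sigma2 = l_s                              # one-step-ahead variance forecast
        ll += -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * et**2 / sigma2
        eta2 = et**2 / sigma2                     # squared standardised residual
        l_s = l_s * (1 + alpha_s * (eta2 - 1))    # level update of ETS(M,N,N)
    return ll

# Simulated stand-ins for the residuals of a fitted location model
rng = np.random.default_rng(0)
e = 2.0 * rng.standard_normal(300)

# Crude grid search over the two scale parameters
grid_alpha = np.linspace(0.01, 0.3, 30)
grid_level = np.linspace(1.0, 9.0, 33)
ll_hat, l_s0_hat, alpha_s_hat = max(
    (scale_loglik(e, l0, a), l0, a) for l0 in grid_level for a in grid_alpha
)
```

The two estimated quantities correspond to \(\hat{l}_{\sigma,0}\) and \(\hat{\alpha}_\sigma\) in (17.6).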

As can be seen from this example, constructing and estimating the scale model for ADAM is not a difficult task. Following similar principles, we can apply any other pure multiplicative model to the scale, including ETS(Y,Y,Y) (Chapter 6), log-ARIMA (Section 9.1.4), or a multiplicative regression. Furthermore, the location model ETS(A,N,N) can be substituted by any other ETS/ARIMA/Regression model – the same principles as discussed in this subsection apply to them. The only restriction in all of this is that both parts of the model should be estimated via likelihood maximisation – other methods typically do not explicitly estimate the scale of distribution.

17.1.2 General case with Normal distribution

More generally speaking, a scale model can be created for any ETS/ARIMA/Regression location model. Following from the discussion in Section 7.1, the general location model can be written as: \[\begin{equation*} \begin{aligned} {y}_{t} = & w(\mathbf{v}_{t-\mathbf{l}}) + r(\mathbf{v}_{t-\mathbf{l}}) \epsilon_t \\ \mathbf{v}_{t} = & f(\mathbf{v}_{t-\mathbf{l}}) + g(\mathbf{v}_{t-\mathbf{l}}) \epsilon_t \end{aligned} . \end{equation*}\] If we assume that \(\epsilon_t \sim \mathcal{N}(0,\sigma^2_t)\), then the general scale model with ETS+ARIMA+Reg elements can be formulated as (Section 6.1, Subsection 9.1.4, and Section 10.3): \[\begin{equation} \begin{aligned} & \epsilon_t^2 = \sigma_t^2 \eta_{t}^2 \\ & \sigma_t^2 = \exp \left(\mathbf{w}_E^\prime \log(\mathbf{v}_{E,\sigma,t-\mathbf{l}_E}) + \mathbf{w}_A^\prime \log(\mathbf{v}_{A,\sigma,t-\mathbf{l}_A}) + \mathbf{x}_t^\prime \mathbf{a}_{t-1} \right)\\ & \log \mathbf{v}_{E,\sigma,t} = \mathbf{F}_{E,\sigma} \log \mathbf{v}_{E,\sigma,t-\mathbf{l}_E} + \log(\mathbf{1}_k + \mathbf{g}_{E,\sigma} (\eta_t^2-1))\\ & \log \mathbf{v}_{A,\sigma,t} = \mathbf{F}_{A,\sigma} \log \mathbf{v}_{A,\sigma,t-\mathbf{l}_A} + \mathbf{g}_{A,\sigma} \log(\eta_t^2) \\ & \mathbf{a}_{t} = \mathbf{a}_{t-1} + \mathbf{z}_t \mathbf{g}_{R,\sigma} \log(\eta_t^2) \end{aligned} . \tag{17.8} \end{equation}\] The main thing that unites all the parts of the model is that they are pure multiplicative, which avoids potential issues with negative values. The construction and estimation of the scale model in this case is similar to the one discussed for the ETS(A,N,N) example above. When it comes to forecasting the conditional h-steps-ahead scale, given the limitations of pure multiplicative models discussed in Section 6.3, it needs to be obtained via simulations – this way the forecast from the ADAM will coincide with the conditional expectation, which in the case of (17.8) gives the conditional h-steps-ahead scale.
All the principles discussed in Sections 6.1, 9.1.4 and 10.3 can be used directly for the scale model without any limitations.
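A minimal illustration of the simulation-based approach to forecasting the scale, using the simple ETS(M,N,N) scale model from the previous subsection (the parameter values are hypothetical). For the local level case the simulated conditional mean should stay close to the last level, which serves as a sanity check of the procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def scale_forecast(l_s, alpha_s, h, n_sim=10_000):
    """Conditional h-steps-ahead variance of the ETS(M,N,N) scale model,
    approximated by averaging over simulated future paths."""
    levels = np.full(n_sim, l_s)
    sigma2 = np.empty((h, n_sim))
    for j in range(h):
        sigma2[j] = levels                        # variance on each path at step j+1
        eta2 = rng.standard_normal(n_sim) ** 2    # eta_t^2 ~ chi^2(1)
        levels = levels * (1 + alpha_s * (eta2 - 1))
    return sigma2.mean(axis=1)                    # expectation via simulation

forecast = scale_forecast(l_s=4.0, alpha_s=0.1, h=5)
```

For more elaborate multiplicative scale models (e.g. with trend or seasonality), the same simulation scheme applies, but the averaged paths no longer reduce to the last level.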

Finally, ADAM can be expanded even further by introducing the occurrence part of the model (i.e. dealing with the time-varying scale of distribution in the case of intermittent demand). This part needs to be introduced in the location model; the scale model (17.8) can then be used as-is, applying it to the non-zero observations.

17.1.3 Other distributions

The examples above focused on the Normal distribution, but ADAM supports other distributions as well. Depending on the error term, these are:

  1. Additive error term (Section 5.5):
      1. Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma_t^2)\);
      2. Laplace: \(\epsilon_t \sim \mathcal{Laplace}(0, s_t)\);
      3. S: \(\epsilon_t \sim \mathcal{S}(0, s_t)\);
      4. Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s_t, \beta)\);
      5. Log-Normal: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathrm{log}\mathcal{N}\left(-\frac{\sigma_t^2}{2}, \sigma_t^2\right)\);
      6. Inverse Gaussian: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{IG}(1, \sigma_t^2)\);
      7. Gamma: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \Gamma(\sigma_t^{-2}, \sigma_t^2)\);
  2. Multiplicative error term (Section 6.5):
      1. Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma_t^2)\);
      2. Laplace: \(\epsilon_t \sim \mathcal{Laplace}(0, s_t)\);
      3. S: \(\epsilon_t \sim \mathcal{S}(0, s_t)\);
      4. Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s_t, \beta)\);
      5. Log-Normal: \(\left(1+\epsilon_t \right) \sim \mathrm{log}\mathcal{N}\left(-\frac{\sigma_t^2}{2}, \sigma_t^2\right)\);
      6. Inverse Gaussian: \(\left(1+\epsilon_t \right) \sim \mathcal{IG}(1, \sigma_t^2)\);
      7. Gamma: \(\left(1+\epsilon_t \right) \sim \Gamma(\sigma_t^{-2}, \sigma_t^2)\).

The error terms in these cases can also be presented in a form similar to (17.3) to get the first equation in (17.8) for the respective distributions:

  1. Additive error term:
      1. Normal: \(\epsilon_t^2 = \sigma_t^2 \eta_t^2\), where \(\eta_t \sim \mathcal{N}(0, 1)\) or, equivalently, \(\eta_t^2 \sim \chi^2(1)\);
      2. Laplace: \(|\epsilon_t| = s_t |\eta_t|\), where \(\eta_t \sim \mathcal{Laplace}(0, 1)\);
      3. S: \(0.5 |\epsilon_t|^{0.5} = s_t |\eta_t|^{0.5}\), where \(\eta_t \sim \mathcal{S}(0, 1)\);
      4. Generalised Normal: \(\beta |\epsilon_t|^{\beta} = s_t |\eta_t|^{\beta}\), where \(\eta_t \sim \mathcal{GN}(0, 1, \beta)\);
      5. Log-Normal: \(\log\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) = \sigma_t \eta_t-\frac{\sigma_t^2}{2}\), where \(\eta_t \sim \mathcal{N}(0, 1)\);
      6. Inverse Gaussian: \(\frac{\left(\frac{\epsilon_t}{\mu_{y,t}} \right)^2}{\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right)}=\sigma^2_t \eta_t^2\), where \(\eta_t^2 \sim \chi^2(1)\);
      7. Gamma: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) = \sigma_t^2 \eta_t\), so that \(\eta_t \sim \Gamma(\sigma_t^{-2}, 1)\);
  2. Multiplicative error term (Section 6.5):
      1. Normal: \(\epsilon_t^2 = \sigma_t^2 \eta_t^2\), where \(\eta_t \sim \mathcal{N}(0, 1)\);
      2. Laplace: \(|\epsilon_t| = s_t |\eta_t|\), where \(\eta_t \sim \mathcal{Laplace}(0, 1)\);
      3. S: \(0.5 |\epsilon_t|^{0.5} = s_t |\eta_t|^{0.5}\), where \(\eta_t \sim \mathcal{S}(0, 1)\);
      4. Generalised Normal: \(\beta |\epsilon_t|^{\beta} = s_t |\eta_t|^{\beta}\), where \(\eta_t \sim \mathcal{GN}(0, 1, \beta)\);
      5. Log-Normal: \(\log\left(1+\epsilon_t \right) = \sigma_t \eta_t -\frac{\sigma_t^2}{2}\), where \(\eta_t \sim \mathcal{N}(0, 1)\);
      6. Inverse Gaussian: \(\frac{\epsilon_t ^2}{\left(1+\epsilon_t \right)}=\sigma^2_t \eta_t^2\), where \(\eta_t^2 \sim \chi^2(1)\);
      7. Gamma: \(\left(1+\epsilon_t \right) = \sigma_t^2 \eta_t\), so that \(\eta_t \sim \Gamma(\sigma_t^{-2}, 1)\).

Remark. The relations between \(\epsilon_t\) and \(\eta_t\) for \(\mathcal{S}\) and \(\mathcal{GN}\) include the constants \(0.5\) and \(\beta\), which arise because of how the scales of those distributions are estimated (see Section 11.1). The relation between the error term \(\epsilon_t\) and \(\eta_t\) in the Log-Normal distribution is more complicated, because for the latter to be standard Normal, the former needs to be transformed according to the formula above. In the case of the Inverse Gaussian, the transformations are required to make \(\eta_t\) independent of the scale parameter. Finally, in the case of the Gamma distribution, \(\eta_t\) cannot be made independent of the scale parameter, which makes it restrictive and less useful than the other distributions.

The equations above can be used instead of the first equation in (17.8) to create and estimate the scale model for the chosen distribution: the other four equations in (17.8) stay exactly the same, substituting \(\eta_t^2\) with the respective \(\eta_t\), \(|\eta_t|\), \(|\eta_t|^{0.5}\), or \(|\eta_t|^{\beta}\), depending on the distribution. The estimation of the respective models can then be done via the maximisation of the respective likelihood functions (as discussed in Section 11.1).
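As an illustration, backing out the transformed \(\eta_t\) from the residuals for a couple of the distributions above can be sketched as follows; the function name, arguments, and set of supported cases are hypothetical and do not correspond to any particular package API:

```python
import numpy as np

def standardise(e, scale, distribution="normal"):
    """Recover the transformed eta_t that enters the scale recursion.

    Hypothetical helper for illustration: 'e' are residuals of the location
    model, 'scale' is the fitted scale of the chosen distribution.
    """
    e = np.asarray(e, dtype=float)
    if distribution == "normal":       # eps_t^2 = sigma_t^2 eta_t^2  =>  eta_t^2
        return e**2 / scale
    if distribution == "laplace":      # |eps_t| = s_t |eta_t|        =>  |eta_t|
        return np.abs(e) / scale
    raise ValueError(f"unknown distribution: {distribution}")

eta2 = standardise([2.0, -1.0, 0.5], scale=4.0)                        # Normal case
abs_eta = standardise([2.0, -1.0], scale=2.0, distribution="laplace")  # Laplace case
```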

The diagnostics of the scale model can be done in the same way as discussed in Chapter 14, keeping in mind the distributional assumptions about the \(\eta_t\) variable rather than \(\epsilon_t\).
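For instance, under the Normal assumption the squared standardised residuals should follow \(\chi^2(1)\), which can be checked numerically; the sketch below uses simulated stand-ins for the residuals (in practice the QQ-plots of Chapter 14 would be applied to \(\hat{\eta}_t\)):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
# Stand-in for squared standardised residuals of a correctly specified scale model
eta2 = rng.standard_normal(5000) ** 2

# If eta_t^2 ~ chi^2(1), then P(eta_t^2 <= x) = erf(sqrt(x / 2)); comparing
# empirical and theoretical CDF values gives a quick diagnostic check
for x in (0.5, 1.0, 2.0):
    empirical = (eta2 <= x).mean()
    theoretical = erf(sqrt(x / 2))
    print(f"x = {x}: empirical {empirical:.3f} vs chi2(1) {theoretical:.3f}")
```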

Finally, the model selection for the scale part can be done using the same principles as discussed in Chapter 15. For example, one can select the most suitable ETS model similar to how it was discussed in Section 15.1, or the most suitable ARIMA, as in Section 15.2, or a set of explanatory variables, based on the approach in Section 15.3. All of this is available out of the box if the scale model is estimated via likelihood maximisation.