13.2 Demand sizes part of the model
So far, we have discussed the occurrence part of the model \(o_t\) and how to capture the probability of demand occurrence \(p_t\). But this is only half of the intermittent state space model. The other half is the model for the demand sizes \(z_t\), which focuses on how many units of product will be sold if our customers decide to buy in a specific period of time. This can be modelled with any ADAM (ETS/ARIMA/regression), but it needs to be amended slightly to take intermittent demand features into account.
We start the discussion with analysis of an iETS(M,N,N)\(_F\) model, which can be formulated as: \[\begin{equation} \begin{aligned} & y_t = o_t z_t \\ & z_t = l_{z,t-1}(1 + \epsilon_{z,t}) \\ & l_{z,t} = l_{z,t-1}(1 + \alpha_{z} \epsilon_{z,t}) \\ & o_t \sim \text{Bernoulli}(p) \\ \end{aligned}, \tag{13.23} \end{equation}\] where the subscript \(z\) refers to the components and parameters of demand sizes. This model assumes that there is always a potential demand for the product, which evolves over time (even when \(o_t=0\)), but we do not always observe it. This model’s main properties have already been discussed in Section 6.1. The main challenge appears when this model needs to be constructed and estimated, because \(z_t\) is not observable when \(o_t=0\). In these instances, the error term cannot be estimated, but according to the model, it still exists, thus impacting the level of demand \(l_{z,t}\). To construct the model in the cases of no demand, we propose taking the conditional expectation for these periods, given the last available non-zero observation. This means that iETS(M,N,N)\(_F\) can be constructed using the following set of equations: \[\begin{equation} \begin{aligned} & e_{z,t} = \frac{z_t -\hat{\mu}_{z,t}}{\hat{\mu}_{z,t}}, \text{ when } o_t=1 \\ & \hat{\mu}_{z,t} = \hat{l}_{z,t-1} \\ & \hat{l}_{z,t} = \left \lbrace \begin{aligned} & \hat{l}_{z,t-1} (1 + \hat{\alpha}_z e_{z,t} ), & \text{ when } o_t=1 \\ & \hat{l}_{z,t-1} , & \text{ when } o_t=0 \end{aligned} \right. \end{aligned}. \tag{13.24} \end{equation}\] This is only possible if \(\mathrm{E}(1+\epsilon_{z,t})=1\), which is an important assumption for multiplicative error models, discussed in Section 6.5. If this is violated, then the formula for the calculation of the level in (13.24) will become more complicated, involving the expectation of products of random variables.
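The recursion in (13.24) can be sketched in a few lines of code. This is a minimal illustration and not the implementation used in any package: the function name `demand_size_levels`, the initial level and the smoothing parameter are hypothetical values chosen for the example, and a zero in the data is treated as \(o_t=0\).

```python
def demand_size_levels(y, alpha, level0):
    """Filter the demand-size level following the logic of (13.24).

    y      -- observed demand, zeros mean no demand (o_t = 0)
    alpha  -- smoothing parameter for the level (assumed known here)
    level0 -- initial level (assumed known here)
    """
    level = level0
    levels, errors = [], []
    for value in y:
        if value > 0:
            # o_t = 1: the demand size is observed, so the relative
            # error can be calculated and the level is updated
            mu = level
            error = (value - mu) / mu
            level = level * (1 + alpha * error)
        else:
            # o_t = 0: the error is not observable, the level is
            # carried forward (its conditional expectation)
            error = None
        levels.append(level)
        errors.append(error)
    return levels, errors


# Hypothetical intermittent series
y = [0, 2, 0, 0, 3, 1, 0]
levels, errors = demand_size_levels(y, alpha=0.1, level0=2.0)
print(levels)
```

With these inputs the level stays at 2 until the demand of 3 units arrives, moves to 2.1, then drops to 1.99 after the demand of 1 unit, and is carried forward unchanged through the final zero.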
In a similar way, we can construct more complicated models for the demand sizes. In a more general case (Section 5) this can be written as: \[\begin{equation} \begin{aligned} & e_{z,t} = \frac{z_t -\hat{\mu}_{z,t}}{\hat{\mu}_{z,t}}, \text{ when } o_t=1 \\ & \hat{\mathbf{v}}_{t} = \left \lbrace \begin{aligned} & f(\hat{\mathbf{v}}_{t-\boldsymbol{l}}) + g(\hat{\mathbf{v}}_{t-\boldsymbol{l}}) e_{z,t}, & \text{ when } o_t=1 \\ & f(\hat{\mathbf{v}}_{t-\boldsymbol{l}}) , & \text{ when } o_t=0 \end{aligned} \right. \end{aligned}, \tag{13.25} \end{equation}\] where all the functions and vectors have been defined for the original ADAM (7.1) in Section 7.1.
13.2.1 Additive vs multiplicative ETS for demand sizes
The approach above supports any type of ADAM, including pure additive ETS (Section 5.1), pure multiplicative ETS (Section 6.1) or mixed ETS (Section 7.2). While selection of the appropriate model can be automated, I argue that the better approach is to do it based on the understanding of the problem. In demand forecasting, we typically expect the values to be non-negative: people want to buy our product, and usually, the business does not want to buy products back from customers (unless we are dealing with a circular supply chain, but this is a different topic). This means that the pure multiplicative models should be preferred to the additive ones, as they will always produce meaningful results, as long as the assumption of positivity of \((1+\epsilon_{z,t})\) holds. This assumption is important because intermittent demand typically has low volume, and the model might generate unreasonable (negative) point and interval forecasts if the distribution used for the error term allows \((1+\epsilon_{z,t})\) to become negative (e.g. Normal). Thus, it is important to use the Inverse Gaussian, Gamma, or Log-Normal distribution (see discussion in Section 6.5) for the error term of the demand sizes part of the model when the data has low volume and you expect the non-zero values to be strictly positive.
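The risk of using the Normal distribution can be quantified with a short calculation. The sketch below assumes, purely for illustration, that \(1+\epsilon_{z,t}\) follows a Normal distribution with mean one and standard deviation `sigma`, and computes the probability that this multiplier falls below zero. The values of `sigma` are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of the Normal distribution."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


# Probability that the multiplier (1 + epsilon) is negative
# when it is assumed to be Normal with mean 1 and sd sigma
prob_negative = {
    sigma: normal_cdf(0.0, mean=1.0, sd=sigma)
    for sigma in (0.25, 0.5, 1.0)
}

for sigma, prob in prob_negative.items():
    print(f"sigma = {sigma}: P(1 + epsilon < 0) = {prob:.5f}")
```

With a standard deviation of 0.5, roughly 2.3% of the probability mass corresponds to negative demand sizes, and the share grows with the variability of the error. Distributions defined on positive values only (Inverse Gaussian, Gamma, Log-Normal) assign zero probability to such outcomes by construction.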
The main difficulty with pure multiplicative models arises from the construction point of view. As discussed in Section 6.3, the point forecasts of such models, in general, do not correspond to the conditional \(h\) steps ahead expectations (the only exception is the ETS(M,N,N) model). At the same time, the construction of the model for demand sizes assumes that the conditional expectations are equal to point forecasts when demand is not observed. If this is violated, then (13.25) is no longer the correct way to construct the model. This problem becomes especially important for the models with the multiplicative trend, where the conditional expectation might differ from point forecasts substantially (Svetunkov and Boylan, 2023b). Still, point forecasts can be considered proxies for the conditional expectations, especially when smoothing parameters are close to zero. For example, the conditional expectation coincides with the point forecast in the boundary case with \(\alpha=0\) and \(\beta=0\) in ETS(M,M,N). The higher the smoothing parameters are, the more significant the discrepancy will be, implying that the model for the demand sizes is constructed incorrectly.
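The two steps ahead case shows where the discrepancy comes from. The following is a sketch under stated assumptions: the ETS(M,M,N) model is taken in the form \(y_t = l_{t-1} b_{t-1} (1+\epsilon_t)\), \(l_t = l_{t-1} b_{t-1} (1+\alpha\epsilon_t)\), \(b_t = b_{t-1} (1+\beta\epsilon_t)\), the error term is i.i.d. with \(\mathrm{E}(\epsilon_t)=0\), and \(\sigma^2\) denotes its variance (a symbol introduced here for illustration). Then: \[\begin{equation*} \begin{aligned} & y_{t+2} = l_{t+1} b_{t+1} (1+\epsilon_{t+2}) = l_t b_t^2 (1+\alpha\epsilon_{t+1}) (1+\beta\epsilon_{t+1}) (1+\epsilon_{t+2}) \\ & \mathrm{E}(y_{t+2}|t) = l_t b_t^2 \left(1 + \alpha \beta \sigma^2 \right) \end{aligned} \end{equation*}\] while the point forecast is \(l_t b_t^2\). At this horizon the two differ by the factor \(1 + \alpha \beta \sigma^2\), which equals one when the smoothing parameters are zero and grows with them. For longer horizons, more products of the error terms accumulate, so the expression becomes more complicated.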
The pure additive models do not have the issue with the conditional expectation and thus can be constructed easily in the case of intermittent demand. But as discussed earlier, they might violate the non-negativity assumption of the model. So, in practice, they should be used with care.
13.2.2 Using ARIMA for demand sizes
ADAM ARIMA can also be used for demand sizes, resulting in the iARIMA model. All the discussions in the previous subsection would apply to ARIMA as well, keeping in mind that ADAM ARIMA can be either pure additive (Section 9.1.2) or pure multiplicative (Section 9.1.4). Given that the multiplicative ARIMA is formulated via logarithms and still has the error term with the expectation of one, any ARIMA model can be used for the variable \(z_t\) and can be constructed via (13.25). This can also be used for the cases when a pure multiplicative model with the trend is needed, and there are difficulties with the construction of ETS(M,M,N) (i.e. smoothing parameters are not close to zero). The relation between ARIMA and ETS (discussed in Section 8.4) might be useful in this case: instead of constructing ETS(M,M,N) we can construct logARIMA(0,2,2) (see Section 9.1.4), sidestepping the aforementioned problem.
13.2.3 Rounding up forecasts
Finally, when it comes to using an intermittent state space model on count data, there is a temptation to round up the resulting forecasts. This should be avoided for the point forecasts (conditional expectations), because they show what happens on average and thus are allowed to take any values, not only integers. However, when it comes to predictive quantiles, Svetunkov and Boylan (2023a) show that rounding them up is equivalent to generating quantiles from a model with discretised distribution (see discussion on discretised distributions in Chakraborty, 2015) and improves both the forecasting and inventory performance of the model. So, the simple approach of generating a prediction interval (see Section 18.3) and then rounding it up has a theoretical rationale behind it and works well in practice.
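In code, this amounts to applying the ceiling function to the quantiles while leaving the point forecast as is. The snippet below is a minimal sketch: the point forecast and the quantile values are made up for illustration and do not come from any fitted model.

```python
from math import ceil

# Hypothetical output of a demand model for one period ahead
point_forecast = 0.84
quantiles = {0.90: 2.31, 0.95: 3.07, 0.99: 4.0}

# Quantiles are rounded up to the nearest integer,
# values that are already integers stay unchanged
rounded_quantiles = {level: ceil(q) for level, q in quantiles.items()}

# The point forecast is a conditional expectation and is not rounded
print(point_forecast)
print(rounded_quantiles)
```

Here the 90% and 95% quantiles become 3 and 4 units respectively, the 99% quantile stays at 4, and the point forecast remains 0.84.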