This book is in Open Review. I want your feedback to make the book better for you and other readers. To add your annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the button in the upper right hand corner of the page

## 13.1 Occurrence part of the model

The general model (13.1) assumes that demand occurs randomly and that the variable $$o_t$$ can be either zero (no demand) or one (there is some demand). While this process can be modelled using different distributions, we propose using Bernoulli with a time varying probability (in the most general case): $$$o_t \sim \text{Bernoulli} \left(p_t \right) , \tag{13.2}$$$ where $$p_t$$ is the probability of occurrence. In this section we will discuss different types of models for this probability. Each one has some idea behind it, and there are different mechanisms of the model construction, estimation, error calculation, update of the states and the generation of forecasts for each of them. The Probability Mass Function (PMF) of Bernoulli distribution based on this formulation is: $$$f_o(o_t, p_t) = p_t^{o_t}(1-p_t)^{1-o_t}. \tag{13.3}$$$ The occurrence part of the model can be estimated via the maximisation of the log-likelihood function, which comes directly from (13.3), and in the most general case is: $$$\ell \left(\boldsymbol{\theta}_o | o_t \right) = \sum_{o_t=1} \log(\hat{p}_t) + \sum_{o_t=0} \log(1-\hat{p}_t) , \tag{13.4}$$$ where $$\hat{p}_t$$ is the in-sample conditional one step ahead expectation of the probability on observation $$t$$, given the information on observation $$t-1$$, which depends on the vector of estimated parameters for the occurrence part of the model $$\boldsymbol{\theta}_o$$.

In order to demonstrate the difference between specific types of oETS models, we will use the following artificial data:

y <- ts(c(rpois(20,0.25),rpois(20,0.5),rpois(20,1),rpois(20,2),rpois(20,3),rpois(20,5)))

Here how the data looks like:

plot(y)

The probability of occurrence in this model increases together with the demand sizes. This example corresponds to the situation of intermittent demand of a product becoming regular.

### 13.1.1 Fixed probability model, oETS$$_F$$

We start with the simplest case of the fixed probability of occurrence, oETS$$_F$$ model: $$$o_t \sim \text{Bernoulli}(p) , \tag{13.5}$$$ This model assumes that demand happens with the same probability no matter what. This might sound exotic, because in practice, there might be many factors influencing customers’ desire to purchase and the impact of these factors might change over the year. But this is a very basic model, which at least can be used as a benchmark on intermittent demand data. Furthermore, it might be suitable for modelling demand on expensive high-tech products, such as jet engines, which is very slow’’ in its nature and typically does not evolve much over time.

When estimated via maximisation of likelihood function (13.4), the probability of occurrence is equal to: $$$\hat{p} = \frac{T_1}{T}, \tag{13.6}$$$ where $$T_1$$ is the number of non-zero observations and $$T$$ is the number of all the available observations in sample.

The occurrence part of the model, oETS$$_F$$ can be constructed using oes() function from smooth package:

oETSFModel1 <- oes(y, occurrence="fixed", h=10, holdout=TRUE)
oETSFModel1
## Occurrence state space model estimated: Fixed probability
## Underlying ETS model: oETS[F](MNN)
## Vector of initials:
##  level
## 0.6182
##
## Error standard deviation: 1.0937
## Sample size: 110
## Number of estimated parameters: 1
## Number of degrees of freedom: 109
## Information criteria:
##      AIC     AICc      BIC     BICc
## 148.2884 148.3254 150.9889 151.0759
plot(oETSFModel1)

The plot above demonstrates the dynamics of the occurrence variable $$o_t$$ and the fitted and predicted probabilities. The oETS$$_F$$ model produces the straight line for the probability of 0.62, ignoring the fact that in our example the probability of occurrence has increased over time.

### 13.1.2 Odds ratio model, oETS$$_O$$

In this model, it is assumed that the update of the probability is driven by the occurrence of variable. It is more complicated than the previous as the probability now changes over time and can be modelled, for example, with ETS(M,N,N) model: \begin{aligned} & o_t \sim \text{Bernoulli} \left(p_t \right) \\ & p_t = \frac{\mu_{a,t}}{\mu_{a,t}+1} \\ & a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ & l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ & \mu_{a,t} = l_{a,t-1} \end{aligned}, \tag{13.7} where $$l_{a,t}$$ is the unobserved level component, $$\alpha_{a}$$ is the smoothing parameter, the error term $$1+\epsilon_{a,t}$$ is positive, has means of one and follows an unknown distribution and $$\mu_{a,t}$$is the conditional expectation for the unobservable shape parameters $$a_t$$. The measurement and transition equations in (13.7) can be substituted by any other ADAM ETS or ARIMA model if it is reasonable to assume that the dynamics of probability has some additional components. This model is called odds ratio’’, because the probability of occurrence in (13.7) is calculated using the classical logistic transform. This also means that $$\mu_{a,t}$$ is equal to: $$$\label{eq:oETS_O_oddsRatio} \mu_{a,t} = \frac{p_t}{1 - p_t} .$$$ When $$\mu_{a,t}$$ increases in the oETS model, the odds ratio increases as well, meaning that the probability of occurrence goes up. explain that this model is in theory more appropriate for the demand on products becoming obsolescent, but given the updating mechanism it should also work fine on other types of intermittent demand data.

When it comes to the application of the model to the data, its construction is done via the following set of equations (example with oETS$$_O$$(M,N,N)): \begin{aligned} & \hat{p}_t = \frac{\hat{\mu}_{a,t}}{\hat{\mu}_{a,t}+1} \\ & \hat{\mu}_{a,t} = \hat{l}_{a,t-1} \\ & \hat{l}_{a,t} = \hat{l}_{a,t-1}( 1 + \hat{\alpha}_{a} e_{a,t}) \\ & 1+e_{a,t} = \frac{u_t}{1-u_t} \\ & u_{t} = \frac{1 + o_t - \hat{p}_t}{2} \end{aligned}, \tag{13.8} where $$e_{a,t}$$ is the proxy for the unobservable error term $$\epsilon_{a,t}$$ and $$\hat{\mu}_t$$ is the estimate of $$\mu_{a,t}$$, conditional expectation discussed in Subsection 3.4. If a multiple steps ahead forecast for the probability is needed from this model, then the formulae discussed in Subsection 3.4 can be used to get $$\hat{\mu}_{a,t}$$, which then can be inserted in the first equation of (13.8) to get the final conditional multiple steps ahead probability of occurrence.

Finally, in order to estimate the model (13.8), the likelihood (13.4) can be used.

The occurrence part of the model oETS$$_O$$ is constructed using the very same oes() function, but also allows specifying the ETS model to use. For example, here is the oETS$$_O$$(M,M,N) model:

oETSOModel <- oes(y, model="MMN", occurrence="odds-ratio", h=10, holdout=TRUE)
oETSOModel
## Occurrence state space model estimated: Odds ratio
## Underlying ETS model: oETS[O](MMN)
## Smoothing parameters:
##  level  trend
## 0.0275 0.0020
## Vector of initials:
##  level  trend
## 0.3717 0.9387
##
## Error standard deviation: 1.9162
## Sample size: 110
## Number of estimated parameters: 4
## Number of degrees of freedom: 106
## Information criteria:
##      AIC     AICc      BIC     BICc
## 115.5034 115.8843 126.3053 127.2006
plot(oETSOModel)

The constructed model introduces the trend component, capturing the changing probability of occurrence and reflecting well the fact that it increases over time.

### 13.1.3 Inverse odds ratio model, oETS$$_I$$

Using similar approach to the oETS$$_O$$ model, we can formulate the “inverse odds ration” model oETS$$_I$$(M,N,N): \begin{aligned} & o_t \sim \text{Bernoulli} \left(p_t \right) \\ & p_t = \frac{1}{1+\mu_{b,t}} \\ & b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\ & l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\ & \mu_{b,t} = l_{b,t-1} \end{aligned}, \tag{13.9} where similarly to (13.8), $$l_{b,t}$$ is the unobserved level component, $$\alpha_{b}$$ is the smoothing parameter, the error term $$1+\epsilon_{b,t}$$ is positive, has means of one and follows an unknown distribution and $$\mu_{b,t}$$is the conditional expectation for the unobservable shape parameters $$b_t$$. The main difference of this model with the previous is in the different mechanism of probability calculation, which focuses on the probability of “inoccurrence,” i.e. on zeroes of data rather than on ones. This type of model should be more appropriate for cases of demand building up . The probability calculation mechanism in (13.9) implies that $$\mu_{b,t}$$ is equal to: $$$\mu_{b,t} = \frac{1-p_t}{p_t} .$$$

The construction of the model (13.9) is similar to (13.8): \begin{aligned} & \hat{p}_t = \frac{1}{1+\hat{\mu}_{b,t}} \\ & \hat{\mu}_{b,t} = \hat{l}_{b,t-1} \\ & \hat{l}_{b,t} = l_{b,t-1}( 1 + \hat{\alpha}_{b} e_{b,t}) \\ & 1+e_{b,t} = \frac{1-u_t}{u_t} \\ & u_{t} = \frac{1 + o_t - \hat{p}_t}{2} \end{aligned}, \tag{13.10} where $$e_{b,t}$$ is the proxy for the unobservable error term $$\epsilon_{b,t}$$ and $$\hat{\mu}_{b,t}$$ is the estimate of $$\mu_{b,t}$$. Once again, we refer an interested reader to Subsection 3.4 for the discussion of the multiple steps ahead conditional expectations from the model.

show that the oETS$$_I$$(M,N,N) model can also be estimated using Croston’s method, as long as we can assume that the probability does not change over time substantially. In this case the demand intervals can be used instead of $$\hat{\mu}_{b,t}$$ in (13.10). So the iETS(M,N,N)$$_I$$(M,N,N) can be considered as the model underlying Croston’s method.

The function oes() implements the oETS$$_I$$ model as well. For example, here is the oETS$$_I$$(M,M,N) model:

oETSIModel <- oes(y, model="MMN", occurrence="inverse-odds-ratio", h=10, holdout=TRUE)
oETSIModel
## Occurrence state space model estimated: Inverse odds ratio
## Underlying ETS model: oETS[I](MMN)
## Smoothing parameters:
##  level  trend
## 0.0164 0.0000
## Vector of initials:
##  level  trend
## 8.1656 0.9430
##
## Error standard deviation: 3.8557
## Sample size: 110
## Number of estimated parameters: 4
## Number of degrees of freedom: 106
## Information criteria:
##      AIC     AICc      BIC     BICc
## 116.1990 116.5800 127.0009 127.8963
plot(oETSIModel)

Similarly to the oETS$$_O$$, the model captures the trend in the probability of occurrence, but will have different smoothing parameters.

### 13.1.4 General oETS model, oETS$$_G$$

Uniting the models oETS$$_O$$ with oETS$$_I$$, we can obtain the “general” model, which in the most general case can be summarised in the following way: \begin{aligned} & p_t = f_p(\mu_{a,t}, \mu_{b,t}) \\ & a_t = w_a(\mathbf{v}_{a,t-\boldsymbol{l}}) + r_a(\mathbf{v}_{a,t-\boldsymbol{l}}) \epsilon_{a,t} \\ & \mathbf{v}_{a,t} = f_a(\mathbf{v}_{a,t-\boldsymbol{l}}) + g_a(\mathbf{v}_{a,t-\boldsymbol{l}}) \epsilon_{a,t} \\ & b_t = w_b(\mathbf{v}_{b,t-\boldsymbol{l}}) + r_b(\mathbf{v}_{b,t-\boldsymbol{l}}) \epsilon_{b,t} \\ & \mathbf{v}_{b,t} = f_b(\mathbf{v}_{b,t-\boldsymbol{l}}) + g_b(\mathbf{v}_{b,t-\boldsymbol{l}}) \epsilon_{b,t} \end{aligned} , \tag{13.11} where $$\epsilon_{a,t}$$, $$\epsilon_{b,t}$$, $$\mu_{a,t}$$ and $$\mu_{b,t}$$ have been defined in previous subsectionsa and the other elements correspond to the ADAM model discussed in section 5. Note that two models similar to the one discussed in Section 5 are used for modelling of $$a_t$$ and $$b_t$$. The general formula for the probability in case of the multiplicative error model is: $$$p_t = \frac{\mu_{a,t}}{\mu_{a,t}+\mu_{b,t}} , \tag{13.12}$$$ while for the additive one, it is: $$$p_t = \frac{\exp(\mu_{a,t})}{\exp(\mu_{a,t})+\exp(\mu_{b,t})} . \tag{13.13}$$$ This is because both $$\mu_{a,t}$$ and $$\mu_{b,t}$$ need to be strictly positive, while the additive error models support the real plane. The canonical oETS model assumes that the pure multiplicative model is used for the both $$a_t$$ and $$b_t$$. This type of model is positively defined for any values of error, trend and seasonality, which is essential for the values of $$a_t$$ and $$b_t$$ and their expectations. If a combination of additive and multiplicative error models is used, then the additive part should be exponentiated prior to the usage of the formulae for the calculation of the probability. So, $$f_p(\cdot)$$ function maps the expectations from models A and B to the probability of occurrence, depending on the error type of the respective models: p_t = f_p(\mu_{a,t}, \mu_{b,t}) = \left \lbrace \begin{aligned} & \frac{\mu_{a,t}}{\mu_{a,t} + \mu_{b,t}} & \text{ when both have multiplicative errors} \\ & \frac{\mu_{a,t}}{\mu_{a,t} + \exp(\mu_{b,t})} & \text{ when model B has additive error} \\ & \frac{\exp(\mu_{a,t})}{\exp(\mu_{a,t}) + \mu_{b,t}} & \text{ when model A has additive error} \\ & \frac{\exp(\mu_{a,t})}{\exp(\mu_{a,t}) + \exp(\mu_{b,t})} & \text{ when both have additive errors} \end{aligned} . \right. \tag{13.14}

An example of the oETS model is oETS$$_G$$(M,N,N)(M,N,N): \begin{aligned} & o_t \sim \text{Bernoulli} \left(p_t \right) \\ & p_t = \frac{\mu_{a,t}}{\mu_{a,t}+\mu_{b,t}} \\ \\ & a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ & l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ & \mu_{a,t} = l_{a,t-1} \\ \\ & b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\ & l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\ & \mu_{b,t} = l_{b,t-1} \\ \end{aligned}, \tag{13.15} where all the parameters have already been defined in previous subsections. More advanced models can be constructing for $$a_t$$ and $$b_t$$ by specifying the ETS models for each part and / or adding explanatory variables. The construction of this model is done via the following set of equations: \begin{aligned} & e_{a,t} = \frac{u_t}{1-u_t} -1 \\ & \hat{a}_t = \hat{l}_{a,t-1} \\ & \hat{l}_{a,t} = \hat{l}_{a,t-1}( 1 + \alpha_{a} e_{a,t}) \\ & e_{b,t} = \frac{1-u_t}{u_t} -1 \\ & \hat{b}_t = \hat{l}_{b,t-1} \\ & \hat{l}_{b,t} = \hat{l}_{b,t-1}( 1 + \alpha_{b} e_{b,t}) \end{aligned} . \tag{13.16}

There is a separate function for the oETS$$_G$$ model, called oesg(). It has twice more parameters than oes(), because it allows fine tuning of the models for the both variables $$a_t$$ and $$b_t$$. This gives an additional flexibility. For example, here is how we can use ETS(M,N,N) for the $$a_t$$ and ETS(A,A,N) for the $$b_t$$, resulting in oETS$$_G$$(M,N,N)(A,A,N):

oETSGModel1 <- oesg(y, modelA="MNN", modelB="AAN", h=10, holdout=TRUE)
oETSGModel1
## Occurrence state space model estimated: General
## Underlying ETS model: oETS[G](MNN)(AAN)
##
## Sample size: 110
## Number of estimated parameters: 6
## Number of degrees of freedom: 104
## Information criteria:
##      AIC     AICc      BIC     BICc
## 120.3221 121.1377 136.5250 138.4417
plot(oETSGModel1)

We can also analyse separately models for $$a_t$$ and $$b_t$$. Here is, for example, the model A:

oETSGModel1\$modelA
## Occurrence state space model estimated: Odds ratio
## Underlying ETS model: oETS[G](MNN)_A
## Smoothing parameters:
##  level
## 0.0167
## Vector of initials:
##  level
## 1.0845
##
## Error standard deviation: 2.177
## Sample size: 110
## Number of estimated parameters: 2
## Number of degrees of freedom: 108
## Information criteria:
##      AIC     AICc      BIC     BICc
## 112.3221 112.4343 117.7231 117.9867

The experiments that I have done so far show that oETS$$_G$$ very seldomly brings improvements in comparison with oETS$$_O$$ or oETS$$_I$$ in terms of forecasting accuracy. Besides, selecting models for each of the parts is a challenging task. So, this model is theoretically nice, being more general than the other oETS models, but is not very practical. Still it is useful because we can introduce different oETS model by restricting $$a_t$$ and $$b_t$$. For example, we can get:

1. oETS$$_F$$, when $$\mu_{a,t} = \text{const}$$, $$\mu_{b,t} = \text{const}$$ for all $$t$$;
2. oETS$$_O$$, when $$\mu_{b,t} = 1$$ for all $$t$$;
3. oETS$$_I$$, when $$\mu_{a,t} = 1$$ for all $$t$$;
4. oETS$$_D$$, when $$\mu_{a,t} + \mu_{b,t} = 1$$, $$\mu_{a,t} \leq 1$$ for all $$t$$ (discussed in the next subsection);
5. oETS$$_G$$, when there are no restrictions.

### 13.1.5 Direct probability model, oETS$$_D$$

The last model in the family of oETS is the “Direct probability.” It appears, when the following restriction is imposed on the oETS$$_G$$: $$$\mu_{a,t} + \mu_{b,t} = 1, \mu_{a,t} \in [0, 1] . \tag{13.17}$$$ This restriction is inspired by the mechanism for the probability update proposed by (TSB method). In their paper they use SES to model the time varying probability of occurrence. Based on this idea and the restriction (13.17) we can formulate oETS$$_D$$(M,N,N) model, which will underly the occurrence part of the TSB method: \begin{aligned} & o_t \sim \text{Bernoulli} \left(\mu_{a,t} \right) \\ & a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ & l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ & \mu_{a,t} = \min(l_{a,t-1}, 1) \end{aligned}. \tag{13.18} There is also an option with the additive error for the occurrence part (also underlying TSB), which has a different, more complicated form: \begin{aligned} & o_t \sim \text{Bernoulli} \left(\mu_{a,t} \right) \\ & a_t = l_{a,t-1} + \epsilon_{a,t} \\ & l_{a,t} = l_{a,t-1} + \alpha_{a} \epsilon_{a,t} \\ & \mu_{a,t} = \max \left( \min(l_{a,t-1}, 1), 0 \right) \end{aligned}. \tag{13.19}

The estimation of the oETS$$_D$$(M,N,M) model can be done using the following set of equations: \begin{aligned} & \hat{\mu}_{a,t} = \hat{l}_{a,t-1} \\ & \hat{l}_{a,t} = \hat{l}_{a,t-1}( 1 + \hat{\alpha}_{a} e_{a,t}) \end{aligned}, \tag{13.20} where $$$e_{a,t} = \frac{o_t (1 - 2 \kappa) + \kappa - \hat{\mu}_{a,t}}{\hat{\mu}_{a,t}}, \tag{13.21}$$$ and $$\kappa$$ is a very small number (for example, $$\kappa = 10^{-10}$$), needed only in order to make the model estimable. The estimate of the error term in case of the additive model is much simpler and does not need any specific tricks to work: $$$e_{a,t} = o_t - \hat{\mu}_{a,t} , \tag{13.22}$$$ which is directly related to TSB method. Note that equation (13.20) does not contain $$\min$$ function, because the estimated error (13.21) will always guarantee that the level will lie between 0 and 1 as long as the smoothing parameter lies in the [0, 1] region (which is the conventional assumption for both ETS(A,N,N) and ETS(M,N,N) models). This also applies for the oETS$$_D$$(A,N,N) model, where the $$\max$$ and $$\min$$ functions can be dropped as long as the smoothing parameter lies in [0,1].

An important feature of this model is that it allows probability to become either 0 or 1, thus implying either that there is no demand on the product at all or that the demand on product has become regular. No other oETS model permits that - they assume that probability might become very close to bounds, but can never reach them.

When it comes to initialising the oETS$$_D$$ model, the values are calculated directly from the data without any additional transformations.

Here’s an example of the application of the oETS$$_D$$(M,M,N) to the same artificial data:

oETSDModel <- oes(y, model="MMN", occurrence="d", h=10, holdout=TRUE)
oETSDModel
## Occurrence state space model estimated: Direct probability
## Underlying ETS model: oETS[D](MMN)
## Smoothing parameters:
##  level  trend
## 0.0642 0.0000
## Vector of initials:
##  level  trend
## 0.2919 1.0050
##
## Error standard deviation: 1.2258
## Sample size: 110
## Number of estimated parameters: 4
## Number of degrees of freedom: 106
## Information criteria:
##      AIC     AICc      BIC     BICc
## 123.5053 123.8862 134.3072 135.2025
plot(oETSDModel)

The empirical analysis I have done so far on different datasets shows that oETS$$_D$$ model works efficiently in many cases and produces accurate forecasts. So, if you are unsure, which of the oETS models to choose for your intermittent data, I would recommend starting with oETS$$_D$$.

### 13.1.6 Model selection in oETS

There are two dimensions for the model selection in the oETS model:

1. Selection of the type of occurrence;
2. Selection of the model type for the occurrence part.

It is not a rocket science, what can be done in order to select the most appropriate model for the occurrence part is the IC based selection. Given the likelihood (13.4), we can try each of the models, calculate the number of all estimated parameters and then select the one that has the lowest information criterion. The demand occurrence models discussed in this section will have:

1. oETS$$_F$$: 1 parameter for the probability of occurrence;
2. oETS$$_O$$, oETS$$_I$$ and oETS$$_D$$: initial values, smoothing and dampening parameters;
3. oETS$$_G$$: initial values, smoothing and dampening parameters for models A and B;

For example, if the oETS(M,N,N) model is constructed, the overall number of parameters for the models will be:

1. oETS(M,N,N)$$_F$$ - 1 parameter: the probability of occurrence $$\hat{p}$$ and the scale parameter for the demand sizes;
2. oETS(M,N,N)$$_O$$, oETS(M,N,N)$$_I$$ and oETS(M,N,N)$$_D$$ - 2: the initial value of level and the smoothing parameter;
3. oETS(M,N,N)$$_G$$ - 4: the initial values of $$\hat{l}_{a,0}$$ and $$\hat{l}_{b,0}$$ and the smoothing parameters $$\hat{\alpha}_a$$ and $$\hat{\alpha}_b$$.

This implies that the selection between models in (2) will come to the best fit to the demand occurrence data, while oETS(M,N,N)$$_G$$ will only be selected if it provides much better fit to the data. Given that intermittent demand typically does not have many observations, selection of oETS(M,N,N)$$_G$$ becomes highly improbable.

When it comes to the selection of the most appropriate demand occurrence model (e.g. between local level and local trend models), then the approach would be similar: estimate the pool of models via likelihood, calculate their number of parameters, select the model with the lowest AIC. Given that the likelihood part of the demand occurrence part comes to probabilities, the selection in the occurrence part can be done based on the likelihood (13.4) independently of the demand sizes part of the model.

### References

• Svetunkov, I., Boylan, J.E., 2019. Multiplicative state-space models for intermittent time series. https://doi.org/10.13140/RG.2.2.35897.06242
• Teunter, R.H., Syntetos, A.A., Babai, M.Z., 2011. Intermittent demand: Linking forecasting to inventory obsolescence. European Journal of Operational Research. 214, 606–615. https://doi.org/10.1016/j.ejor.2011.05.018