
## 14.1 Occurrence part of the model

The general model (14.1) assumes that demand occurs randomly and that the variable \(o_t\) can be either zero (no demand) or one (there is some demand). While this process can be modelled using different distributions, we propose using Bernoulli with a time varying probability (in the most general case): \[\begin{equation} o_t \sim \text{Bernoulli} \left(p_t \right) , \tag{14.2} \end{equation}\] where \(p_t\) is the probability of occurrence. In this section, we will discuss different types of models for this probability. Each of them is based on a specific idea, and each implies its own mechanism of model construction, estimation, error calculation, update of the states, and generation of forecasts.

The occurrence part of the model can be estimated via the likelihood, which in the most general case is: \[\begin{equation} \ell \left(\boldsymbol{\theta} | o_t \right) = {\sum_{o_t=1}} \log(\hat{p}_t) + {\sum_{o_t=0}} \log(1-\hat{p}_t) , \tag{14.3} \end{equation}\] where \(\hat{p}_t\) is the in-sample conditional one step ahead expectation of the probability on observation \(t\), given the information on observation \(t-1\), which depends on the vector of estimated parameters \(\boldsymbol{\theta}\). The derivation of the formula (14.3) is discussed in more detail later in this chapter.
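To make the likelihood (14.3) concrete, here is a minimal sketch (in Python rather than the R used by the `smooth` package) of how it could be computed for a 0/1 occurrence series and a vector of one step ahead probability estimates:

```python
import numpy as np

def occurrence_loglik(o, p_hat):
    """Log-likelihood (14.3): sum of log(p_hat) over the non-zero
    occurrences plus sum of log(1 - p_hat) over the zeroes."""
    o = np.asarray(o, dtype=bool)
    p_hat = np.asarray(p_hat, dtype=float)
    return np.log(p_hat[o]).sum() + np.log(1 - p_hat[~o]).sum()

o = [1, 0, 0, 1, 1, 0, 1, 0]
# With a fixed probability model, p_hat is the same for every t
print(occurrence_loglik(o, np.full(len(o), 0.5)))
```

The function names here are hypothetical; the point is only that the two sums in (14.3) split the sample by the value of \(o_t\).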

In order to demonstrate the difference between specific types of oETS models, we will use the following artificial data:

Here is how the data looks:

The probability of occurrence in this model increases together with the demand sizes. This example corresponds to the situation of intermittent demand of a product becoming regular.

### 14.1.1 Fixed probability model, oETS\(_F\)

We start with the simplest case of the fixed probability of occurrence, the oETS\(_F\) model: \[\begin{equation} o_t \sim \text{Bernoulli}(p) . \tag{14.4} \end{equation}\] This model assumes that demand happens with the same probability, no matter what. This might sound exotic because, in practice, there might be many factors influencing customers’ desire to purchase, and the impact of these factors might change over the year. But this is a very basic model, which at least can be used as a benchmark on intermittent demand data. Furthermore, it might be suitable for modelling demand for expensive high-tech products, such as jet engines, which is “very slow” in its nature and typically does not evolve much over time.

When estimated via maximisation of likelihood function (14.3), the probability of occurrence is equal to: \[\begin{equation} \hat{p} = \frac{T_1}{T}, \tag{14.5} \end{equation}\] where \(T_1\) is the number of non-zero observations and \(T\) is the number of all the available observations in sample.
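This estimate can be verified numerically. The following sketch (hypothetical Python, not from the book) checks that the share of non-zero observations (14.5) indeed maximises the fixed-probability version of the likelihood (14.3):

```python
import numpy as np

o = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 1])  # artificial occurrence series

# The MLE (14.5): the share of non-zero observations in the sample
p_hat = o.sum() / o.size

def loglik(p):
    # Fixed-probability likelihood (14.3): T1*log(p) + (T - T1)*log(1 - p)
    return o.sum() * np.log(p) + (o.size - o.sum()) * np.log(1 - p)

grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmax(loglik(grid))]
print(p_hat, best)  # best is the grid point closest to p_hat
```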

The occurrence part of the model, oETS\(_F\), can be constructed using the `oes()` function from the `smooth` package:

```
## Occurrence state space model estimated: Fixed probability
## Underlying ETS model: oETS[F](MNN)
## Vector of initials:
## level
## 0.6818
##
## Error standard deviation: 1.0855
## Sample size: 110
## Number of estimated parameters: 1
## Number of degrees of freedom: 109
## Information criteria:
## AIC AICc BIC BICc
## 139.6081 139.6451 142.3086 142.3956
```

The plot above demonstrates the dynamics of the occurrence variable \(o_t\) and the fitted and predicted probabilities. The oETS\(_F\) model produces a straight line for the probability at 0.68, ignoring the fact that in our example the probability of occurrence has increased over time.

### 14.1.2 Odds ratio model, oETS\(_O\)

In this model, it is assumed that the update of the probability is driven by the occurrence variable. It is more complicated than the previous one, as the probability now changes over time and can be modelled, for example, with an ETS(M,N,N) model: \[\begin{equation} \begin{aligned} & o_t \sim \text{Bernoulli} \left(p_t \right) \\ & p_t = \frac{\mu_{a,t}}{\mu_{a,t}+1} \\ & a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ & l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ & \mu_{a,t} = l_{a,t-1} \end{aligned}, \tag{14.6} \end{equation}\] where \(l_{a,t}\) is the unobserved level component, \(\alpha_{a}\) is the smoothing parameter, the error term \(1+\epsilon_{a,t}\) is positive, has a mean of one and follows an unknown distribution, and \(\mu_{a,t}\) is the conditional expectation for the unobservable shape parameter \(a_t\). The measurement and transition equations in (14.6) can be substituted by any other ADAM ETS or ARIMA model if it is reasonable to assume that the dynamics of the probability have some additional components. This model is called “odds ratio”, because the probability of occurrence in (14.6) is calculated using the classical logistic transform. This also means that \(\mu_{a,t}\) is equal to: \[\begin{equation} \label{eq:oETS_O_oddsRatio} \mu_{a,t} = \frac{p_t}{1 - p_t} . \end{equation}\] When \(\mu_{a,t}\) increases in the oETS model, the odds ratio increases as well, meaning that the probability of occurrence goes up. I. Svetunkov and Boylan (2019a) explain that this model is in theory more appropriate for the demand on products becoming obsolescent, but given the updating mechanism, it should also work fine on other types of intermittent demand data.

When it comes to the application of the model to the data, its construction is done via the following set of equations (example with oETS\(_O\)(M,N,N)): \[\begin{equation} \begin{aligned} & \hat{p}_t = \frac{\hat{\mu}_{a,t}}{\hat{\mu}_{a,t}+1} \\ & \hat{\mu}_{a,t} = \hat{l}_{a,t-1} \\ & \hat{l}_{a,t} = \hat{l}_{a,t-1}( 1 + \hat{\alpha}_{a} e_{a,t}) \\ & 1+e_{a,t} = \frac{u_t}{1-u_t} \\ & u_{t} = \frac{1 + o_t - \hat{p}_t}{2} \end{aligned}, \tag{14.7} \end{equation}\] where \(e_{a,t}\) is the proxy for the unobservable error term \(\epsilon_{a,t}\) and \(\hat{\mu}_{a,t}\) is the estimate of \(\mu_{a,t}\), the conditional expectation discussed in Subsection 4.4. If a multiple steps ahead forecast for the probability is needed from this model, then the formulae discussed in Subsection 4.4 can be used to get \(\hat{\mu}_{a,t}\), which can then be inserted in the first equation of (14.7) to get the final conditional multiple steps ahead probability of occurrence.
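As an illustration (a hypothetical Python translation, not the book's R code), the recursion (14.7) for oETS\(_O\)(M,N,N) can be sketched as:

```python
import numpy as np

def oets_o_mnn_fit(o, alpha, level0):
    """Apply the oETS_O(M,N,N) recursion (14.7) to a 0/1 series,
    returning the fitted one step ahead probabilities p_hat."""
    l = level0
    p_hat = np.empty(len(o))
    for t, ot in enumerate(o):
        mu = l                        # mu_{a,t} = l_{a,t-1}
        p_hat[t] = mu / (mu + 1)      # odds ratio -> probability
        u = (1 + ot - p_hat[t]) / 2   # proxy variable u_t
        e = u / (1 - u) - 1           # 1 + e_{a,t} = u_t / (1 - u_t)
        l = l * (1 + alpha * e)       # level update, stays positive
    return p_hat

print(oets_o_mnn_fit([1, 1, 0, 1, 1, 1, 0, 1], alpha=0.1, level0=1.0))
```

Note that \(u_t \in (0, 1)\) by construction, so \(1 + e_{a,t}\) is always positive and the level cannot turn negative.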

Finally, in order to estimate the model (14.7), the likelihood (14.3) can be used.

The occurrence part of the model oETS\(_O\) is constructed using the very same `oes()` function, which also allows specifying the ETS model to use. For example, here is the oETS\(_O\)(M,M,N) model:

```
## Occurrence state space model estimated: Odds ratio
## Underlying ETS model: oETS[O](MMN)
## Smoothing parameters:
## level trend
## 0.0021 0.0021
## Vector of initials:
## level trend
## 0.6428 0.9727
##
## Error standard deviation: 1.392
## Sample size: 110
## Number of estimated parameters: 4
## Number of degrees of freedom: 106
## Information criteria:
## AIC AICc BIC BICc
## 105.5072 105.8882 116.3092 117.2045
```

The constructed model introduces the trend component, capturing the changing probability of occurrence and reflecting well the fact that it increases over time.

### 14.1.3 Inverse odds ratio model, oETS\(_I\)

Using a similar approach to the oETS\(_O\) model, we can formulate the “inverse odds ratio” model oETS\(_I\)(M,N,N): \[\begin{equation} \begin{aligned} & o_t \sim \text{Bernoulli} \left(p_t \right) \\ & p_t = \frac{1}{1+\mu_{b,t}} \\ & b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\ & l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\ & \mu_{b,t} = l_{b,t-1} \end{aligned}, \tag{14.8} \end{equation}\] where, similarly to (14.7), \(l_{b,t}\) is the unobserved level component, \(\alpha_{b}\) is the smoothing parameter, the error term \(1+\epsilon_{b,t}\) is positive, has a mean of one and follows an unknown distribution, and \(\mu_{b,t}\) is the conditional expectation for the unobservable shape parameter \(b_t\). The main difference from the previous model is the mechanism of probability calculation, which focuses on the probability of “inoccurrence”, i.e. on the zeroes in the data rather than on the ones. This type of model should be more appropriate for cases of demand building up (I. Svetunkov and Boylan 2019a). The probability calculation mechanism in (14.8) implies that \(\mu_{b,t}\) is equal to: \[\begin{equation} \mu_{b,t} = \frac{1-p_t}{p_t} . \end{equation}\]

The construction of the model (14.8) is similar to (14.7): \[\begin{equation} \begin{aligned} & \hat{p}_t = \frac{1}{1+\hat{\mu}_{b,t}} \\ & \hat{\mu}_{b,t} = \hat{l}_{b,t-1} \\ & \hat{l}_{b,t} = \hat{l}_{b,t-1}( 1 + \hat{\alpha}_{b} e_{b,t}) \\ & 1+e_{b,t} = \frac{1-u_t}{u_t} \\ & u_{t} = \frac{1 + o_t - \hat{p}_t}{2} \end{aligned}, \tag{14.9} \end{equation}\] where \(e_{b,t}\) is the proxy for the unobservable error term \(\epsilon_{b,t}\) and \(\hat{\mu}_{b,t}\) is the estimate of \(\mu_{b,t}\). Once again, we refer the interested reader to Subsection 4.4 for the discussion of the multiple steps ahead conditional expectations from the model.
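Mirroring the previous subsection, the recursion (14.9) for oETS\(_I\)(M,N,N) could be sketched as follows (hypothetical Python; only the transforms of \(u_t\) and of the probability differ from the odds ratio case):

```python
def oets_i_mnn_fit(o, alpha, level0):
    """Apply the oETS_I(M,N,N) recursion (14.9) to a 0/1 series,
    returning the fitted one step ahead probabilities p_hat."""
    l = level0
    p_hat = []
    for ot in o:
        mu = l                   # mu_{b,t} = l_{b,t-1}
        p = 1 / (1 + mu)         # inverse odds -> probability
        p_hat.append(p)
        u = (1 + ot - p) / 2     # same proxy variable u_t as in (14.7)
        e = (1 - u) / u - 1      # 1 + e_{b,t} = (1 - u_t) / u_t
        l = l * (1 + alpha * e)  # level update, stays positive
    return p_hat

print(oets_i_mnn_fit([0, 0, 1, 0, 0, 1, 0], alpha=0.1, level0=1.0))
```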

I. Svetunkov and Boylan (2019a) show that the oETS\(_I\)(M,N,N) model can also be estimated using Croston’s method, as long as we can assume that the probability does not change substantially over time. In this case, the demand intervals can be used instead of \(\hat{\mu}_{b,t}\) in (14.9). So the iETS(M,N,N)\(_I\)(M,N,N) can be considered as the model underlying Croston’s method.

The `oes()` function implements the oETS\(_I\) model as well. For example, here is the oETS\(_I\)(M,M,N) model:

```
## Occurrence state space model estimated: Inverse odds ratio
## Underlying ETS model: oETS[I](MMN)
## Smoothing parameters:
## level trend
## 0.175 0.175
## Vector of initials:
## level trend
## 0.4242 0.3011
##
## Error standard deviation: 4.1834
## Sample size: 110
## Number of estimated parameters: 4
## Number of degrees of freedom: 106
## Information criteria:
## AIC AICc BIC BICc
## 259.1840 259.5650 269.9860 270.8813
```

Similarly to the oETS\(_O\), the model captures the trend in the probability of occurrence, but will have different smoothing parameters.

### 14.1.4 General oETS model, oETS\(_G\)

Uniting the models oETS\(_O\) with oETS\(_I\), we can obtain the “general” model, which in the most general case can be summarised in the following way: \[\begin{equation} \begin{aligned} & p_t = f_p(\mu_{a,t}, \mu_{b,t}) \\ & a_t = w_a(\mathbf{v}_{a,t-\mathbf{l}}) + r_a(\mathbf{v}_{a,t-\mathbf{l}}) \epsilon_{a,t} \\ & \mathbf{v}_{a,t} = f_a(\mathbf{v}_{a,t-\mathbf{l}}) + g_a(\mathbf{v}_{a,t-\mathbf{l}}) \epsilon_{a,t} \\ & b_t = w_b(\mathbf{v}_{b,t-\mathbf{l}}) + r_b(\mathbf{v}_{b,t-\mathbf{l}}) \epsilon_{b,t} \\ & \mathbf{v}_{b,t} = f_b(\mathbf{v}_{b,t-\mathbf{l}}) + g_b(\mathbf{v}_{b,t-\mathbf{l}}) \epsilon_{b,t} \end{aligned} , \tag{14.10} \end{equation}\] where \(\epsilon_{a,t}\), \(\epsilon_{b,t}\), \(\mu_{a,t}\) and \(\mu_{b,t}\) have been defined in previous subsections, and the other elements correspond to the ADAM model discussed in Section 6. Note that two models similar to the one discussed in Section 6 are used for modelling \(a_t\) and \(b_t\). The general formula for the probability in case of the multiplicative error model is: \[\begin{equation} p_t = \frac{\mu_{a,t}}{\mu_{a,t}+\mu_{b,t}} , \tag{14.11} \end{equation}\] while for the additive one, it is: \[\begin{equation} p_t = \frac{\exp(\mu_{a,t})}{\exp(\mu_{a,t})+\exp(\mu_{b,t})} . \tag{14.12} \end{equation}\] This is because both \(\mu_{a,t}\) and \(\mu_{b,t}\) need to be strictly positive, while the additive error models are defined on the whole real plane. The canonical oETS model assumes that the pure multiplicative model is used for both \(a_t\) and \(b_t\). This type of model produces positive values for any error, trend and seasonality components, which is essential for the values of \(a_t\) and \(b_t\) and their expectations. If a combination of additive and multiplicative error models is used, then the additive part should be exponentiated prior to the usage of the formulae for the calculation of the probability.
So, the \(f_p(\cdot)\) function maps the expectations from models A and B to the probability of occurrence, depending on the error type of the respective models: \[\begin{equation} p_t = f_p(\mu_{a,t}, \mu_{b,t}) = \left \lbrace \begin{aligned} & \frac{\mu_{a,t}}{\mu_{a,t} + \mu_{b,t}} & \text{ when both have multiplicative errors} \\ & \frac{\mu_{a,t}}{\mu_{a,t} + \exp(\mu_{b,t})} & \text{ when model B has additive error} \\ & \frac{\exp(\mu_{a,t})}{\exp(\mu_{a,t}) + \mu_{b,t}} & \text{ when model A has additive error} \\ & \frac{\exp(\mu_{a,t})}{\exp(\mu_{a,t}) + \exp(\mu_{b,t})} & \text{ when both have additive errors} \end{aligned} . \right. \tag{14.13} \end{equation}\]
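The mapping (14.13) is straightforward to express in code. Here is a hypothetical Python sketch, where `"M"`/`"A"` flag the error type of each part:

```python
import math

def f_p(mu_a, mu_b, error_a="M", error_b="M"):
    """Probability mapping (14.13): expectations coming from
    additive-error parts are exponentiated before forming the ratio."""
    a = mu_a if error_a == "M" else math.exp(mu_a)
    b = mu_b if error_b == "M" else math.exp(mu_b)
    return a / (a + b)

print(f_p(3.0, 1.0))            # both multiplicative: 3 / (3 + 1)
print(f_p(0.0, 0.0, "A", "A"))  # both additive: exp(0) / (exp(0) + exp(0))
```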

An example of the oETS model is oETS\(_G\)(M,N,N)(M,N,N): \[\begin{equation} \begin{aligned} & o_t \sim \text{Bernoulli} \left(p_t \right) \\ & p_t = \frac{\mu_{a,t}}{\mu_{a,t}+\mu_{b,t}} \\ \\ & a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ & l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ & \mu_{a,t} = l_{a,t-1} \\ \\ & b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\ & l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\ & \mu_{b,t} = l_{b,t-1} \\ \end{aligned}, \tag{14.14} \end{equation}\] where all the parameters have already been defined in previous subsections. More advanced models can be constructed for \(a_t\) and \(b_t\) by specifying the ETS models for each part and/or adding explanatory variables. The construction of this model is done via the following set of equations: \[\begin{equation} \begin{aligned} & e_{a,t} = \frac{u_t}{1-u_t} -1 \\ & \hat{a}_t = \hat{l}_{a,t-1} \\ & \hat{l}_{a,t} = \hat{l}_{a,t-1}( 1 + \alpha_{a} e_{a,t}) \\ & e_{b,t} = \frac{1-u_t}{u_t} -1 \\ & \hat{b}_t = \hat{l}_{b,t-1} \\ & \hat{l}_{b,t} = \hat{l}_{b,t-1}( 1 + \alpha_{b} e_{b,t}) \end{aligned} . \tag{14.15} \end{equation}\]
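Combining the two recursions, (14.14) and (14.15) could be sketched as follows (hypothetical Python; \(u_t\) is taken as in (14.7) and (14.9)):

```python
def oets_g_fit(o, alpha_a, alpha_b, level_a0, level_b0):
    """Apply the oETS_G(M,N,N)(M,N,N) recursions (14.14)-(14.15),
    updating the two levels in parallel and returning p_hat."""
    la, lb = level_a0, level_b0
    p_hat = []
    for ot in o:
        p = la / (la + lb)       # (14.14): probability from the two levels
        p_hat.append(p)
        u = (1 + ot - p) / 2     # the same proxy variable as before
        ea = u / (1 - u) - 1     # error proxy for model A
        eb = (1 - u) / u - 1     # error proxy for model B
        la *= 1 + alpha_a * ea   # level of a_t
        lb *= 1 + alpha_b * eb   # level of b_t
    return p_hat

print(oets_g_fit([1, 0, 1, 1, 0, 1], 0.1, 0.1, 1.0, 1.0))
```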

There is a separate function for the oETS\(_G\) model, called `oesg()`. It has twice as many parameters as `oes()`, because it allows fine tuning of the models for both variables, \(a_t\) and \(b_t\). This gives additional flexibility. For example, here is how we can use ETS(M,N,N) for \(a_t\) and ETS(A,A,N) for \(b_t\), resulting in oETS\(_G\)(M,N,N)(A,A,N):

```
## Occurrence state space model estimated: General
## Underlying ETS model: oETS[G](MNN)(AAN)
##
## Sample size: 110
## Number of estimated parameters: 6
## Number of degrees of freedom: 104
## Information criteria:
## AIC AICc BIC BICc
## 110.9301 111.7456 127.1330 129.0497
```

We can also analyse separately models for \(a_t\) and \(b_t\). Here is, for example, the model A:

```
## Occurrence state space model estimated: Odds ratio
## Underlying ETS model: oETS[G](MNN)_A
## Smoothing parameters:
## level
## 0
## Vector of initials:
## level
## 2.0676
##
## Error standard deviation: 1.6996
## Sample size: 110
## Number of estimated parameters: 2
## Number of degrees of freedom: 108
## Information criteria:
## AIC AICc BIC BICc
## 102.9301 103.0422 108.3310 108.5946
```

The experiments that I have done so far show that oETS\(_G\) very seldom brings improvements over oETS\(_O\) or oETS\(_I\) in terms of forecasting accuracy. Besides, selecting models for each of the parts is a challenging task. So, this model is theoretically nice, being more general than the other oETS models, but is not very practical. Still, it is useful because we can obtain the other oETS models by restricting \(a_t\) and \(b_t\). For example, we can get:

- oETS\(_F\), when \(\mu_{a,t} = \text{const}\), \(\mu_{b,t} = \text{const}\) for all \(t\);
- oETS\(_O\), when \(\mu_{b,t} = 1\) for all \(t\);
- oETS\(_I\), when \(\mu_{a,t} = 1\) for all \(t\);
- oETS\(_D\), when \(\mu_{a,t} + \mu_{b,t} = 1\), \(\mu_{a,t} \leq 1\) for all \(t\) (discussed in the next subsection);
- oETS\(_G\), when there are no restrictions.

### 14.1.5 Direct probability model, oETS\(_D\)

The last model in the family of oETS is the “Direct probability” one. It appears when the following restriction is imposed on oETS\(_G\): \[\begin{equation} \mu_{a,t} + \mu_{b,t} = 1, \mu_{a,t} \in [0, 1] . \tag{14.16} \end{equation}\] This restriction is inspired by the mechanism for the probability update proposed by Teunter, Syntetos, and Babai (2011) (the TSB method). In their paper, they use SES to model the time varying probability of occurrence. Based on this idea and the restriction (14.16), we can formulate the oETS\(_D\)(M,N,N) model, which underlies the occurrence part of the TSB method: \[\begin{equation} \begin{aligned} & o_t \sim \text{Bernoulli} \left(\mu_{a,t} \right) \\ & a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ & l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ & \mu_{a,t} = \min(l_{a,t-1}, 1) \end{aligned}. \tag{14.17} \end{equation}\] There is also an option with the additive error for the occurrence part (also underlying TSB), which has a different, more complicated form: \[\begin{equation} \begin{aligned} & o_t \sim \text{Bernoulli} \left(\mu_{a,t} \right) \\ & a_t = l_{a,t-1} + \epsilon_{a,t} \\ & l_{a,t} = l_{a,t-1} + \alpha_{a} \epsilon_{a,t} \\ & \mu_{a,t} = \max \left( \min(l_{a,t-1}, 1), 0 \right) \end{aligned}. \tag{14.18} \end{equation}\]

The estimation of the oETS\(_D\)(M,N,N) model can be done using the following set of equations: \[\begin{equation} \begin{aligned} & \hat{\mu}_{a,t} = \hat{l}_{a,t-1} \\ & \hat{l}_{a,t} = \hat{l}_{a,t-1}( 1 + \hat{\alpha}_{a} e_{a,t}) \end{aligned}, \tag{14.19} \end{equation}\] where \[\begin{equation} e_{a,t} = \frac{o_t (1 - 2 \kappa) + \kappa - \hat{\mu}_{a,t}}{\hat{\mu}_{a,t}}, \tag{14.20} \end{equation}\] and \(\kappa\) is a very small number (for example, \(\kappa = 10^{-10}\)), needed only to make the model estimable. The estimate of the error term in case of the additive model is much simpler and does not need any specific tricks to work: \[\begin{equation} e_{a,t} = o_t - \hat{\mu}_{a,t} , \tag{14.21} \end{equation}\] which is directly related to the TSB method. Note that equation (14.19) does not contain the \(\min\) function, because the estimated error (14.20) always guarantees that the level lies between 0 and 1, as long as the smoothing parameter lies in the [0, 1] region (which is the conventional assumption for both ETS(A,N,N) and ETS(M,N,N) models). This also applies to the oETS\(_D\)(A,N,N) model, where the \(\max\) and \(\min\) functions can be dropped as long as the smoothing parameter lies in [0, 1].
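A hypothetical Python sketch of the recursion (14.19) with the error term (14.20) shows how \(\kappa\) keeps the update well defined and the level inside (0, 1):

```python
def oets_d_mnn_fit(o, alpha, level0, kappa=1e-10):
    """Apply the oETS_D(M,N,N) recursion (14.19) with the error (14.20).
    The updated level is (1 - alpha)*l + alpha*(o*(1 - 2*kappa) + kappa),
    a convex combination, so it stays in (0, 1) for alpha in [0, 1]."""
    l = level0
    p_hat = []
    for ot in o:
        mu = l                                        # mu_{a,t} = l_{a,t-1}
        p_hat.append(mu)                              # probability directly
        e = (ot * (1 - 2 * kappa) + kappa - mu) / mu  # error term (14.20)
        l = l * (1 + alpha * e)                       # level update (14.19)
    return p_hat

print(oets_d_mnn_fit([1, 1, 0, 1, 1], alpha=0.2, level0=0.5))
```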

An important feature of this model is that it allows the probability to become either 0 or 1, implying either that there is no demand for the product at all or that the demand for it has become regular. No other oETS model permits that: they assume that the probability might come very close to the bounds, but can never reach them.

When it comes to initialising the oETS\(_D\) model, the values are calculated directly from the data without any additional transformations.

Here’s an example of the application of the oETS\(_D\)(M,M,N) to the same artificial data:

```
## Occurrence state space model estimated: Direct probability
## Underlying ETS model: oETS[D](MMN)
## Smoothing parameters:
## level trend
## 0 0
## Vector of initials:
## level trend
## 0.2730 1.0163
##
## Error standard deviation: 0.8678
## Sample size: 110
## Number of estimated parameters: 4
## Number of degrees of freedom: 106
## Information criteria:
## AIC AICc BIC BICc
## 102.5241 102.9050 113.3260 114.2213
```

The empirical analysis I have done so far on different datasets shows that the oETS\(_D\) model works efficiently in many cases and produces accurate forecasts. So, if you are unsure which of the oETS models to choose for your intermittent demand data, I would recommend starting with oETS\(_D\).

### References

Svetunkov, Ivan, and John Boylan. 2019a. “Multiplicative state-space models for intermittent time series.” Department of Management Science, Lancaster University. https://doi.org/10.13140/RG.2.2.35897.06242.

Teunter, Ruud H, Aris A Syntetos, and M. Zied Babai. 2011. “Intermittent demand: Linking forecasting to inventory obsolescence.” *European Journal of Operational Research* 214 (3): 606–15. https://doi.org/10.1016/j.ejor.2011.05.018.