6.6 Alternative model formulation
There is a fundamental design flaw in the ETS with multiplicative components, which is not apparent at first. Consider the ETS(M,M,M) model, which we formulated in (6.4) as: \[\begin{equation*} \begin{aligned} y_{t} = & l_{t-1} b_{t-1} s_{t-m} (1 + \epsilon_t) \\ l_t = & l_{t-1} b_{t-1} (1 + \alpha \epsilon_t) \\ b_t = & b_{t-1} (1 + \beta \epsilon_t) \\ s_t = & s_{t-m} (1 + \gamma \epsilon_t) \end{aligned}. \end{equation*}\]
This model captures the effect of percentage changes in components on the final sales, as discussed in previous sections. Yet its error term has a linear effect on the states, and if the error term contains a substantial outlier, it will have a damaging effect on them. To see the problem better, consider the error term in the level equation, \(1+\alpha \epsilon_t\), which, for convenience, can be rewritten as: \[\begin{equation} 1 + \alpha \epsilon_t = 1 - \alpha + \alpha (1 + \epsilon_t). \tag{6.22} \end{equation}\] Take an arbitrary value of the smoothing parameter, for example, \(\alpha=0.3\), which means that (6.22) becomes: \[\begin{equation*} 1 + \alpha \epsilon_t = 0.7 + 0.3 (1 + \epsilon_t). \end{equation*}\] Given that \(1 + \epsilon_t\) should always be positive, the range of values of \(1 + \alpha \epsilon_t\) becomes \((0.7, \infty)\), meaning that whatever error the model makes, the state will not decrease by more than 30% (i.e. \(1 - 0.7 = 0.3\)). However, its increase is not bounded by anything, and if we observe an outlier such that, for example, \(1+\epsilon_t=4\) (the actual value is four times larger than the expected one), the upward update of the state will be \(0.7 + 0.3 \times 4 = 1.9\), which is 90% up in comparison with the previous value. Furthermore, this increase is linear: the larger the outlier is, the stronger the state reacts to it, and it cannot come back to the previous level as quickly. This behaviour becomes especially harmful for the trend component, which can jump from 1 to 1.9 in just one observation. A multiplicative trend with a slope of 1.9 is especially dangerous in forecasting, because it exhibits extremely explosive behaviour.
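The arithmetic above can be checked with a minimal Python sketch. The function name `linear_multiplier` and the grid of error values are illustrative assumptions and are not part of the smooth package; the code only evaluates the right-hand side of (6.22).

```python
def linear_multiplier(one_plus_eps, alpha):
    """State multiplier 1 + alpha * eps, written in the form of (6.22).

    one_plus_eps is the multiplicative error 1 + eps (must be positive).
    """
    return 1 - alpha + alpha * one_plus_eps


alpha = 0.3  # the arbitrary smoothing parameter used in the text

# Hypothetical grid of multiplicative errors, from a strong
# over-forecast (0.01) to a large positive outlier (10).
for one_plus_eps in (0.01, 0.5, 1.0, 2.0, 4.0, 10.0):
    multiplier = linear_multiplier(one_plus_eps, alpha)
    print(f"1 + eps = {one_plus_eps:5.2f} -> state multiplier = {multiplier:.3f}")
```

However small \(1 + \epsilon_t\) becomes, the multiplier stays just above 0.7, while on the upper side it grows without limit, in line with the asymmetry described above.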
To make the model more balanced and less reactive to positive outliers, we need to reformulate it, for example, as: \[\begin{equation*} \begin{aligned} y_{t} = & l_{t-1} b_{t-1} s_{t-m} (1 + \epsilon_t) \\ l_t = & l_{t-1} b_{t-1} (1 + \epsilon_t)^\alpha \\ b_t = & b_{t-1} (1 + \epsilon_t)^\beta \\ s_t = & s_{t-m} (1 + \epsilon_t)^\gamma \end{aligned} . \tag{6.23} \end{equation*}\] This model is equivalent to (6.6) and is essentially the additive model applied to the data in logarithms. The effect of the error term on the states in this model is more balanced. If we take the same value of \(\alpha=0.3\), the range of values of the error term \((1 + \epsilon_t)^\alpha\) is \((0, \infty)\), so the state is no longer restricted in how far it can decrease. Furthermore, because the error term is raised to the power of \(\alpha\), the relative effect of an outlier on the states diminishes as the size of the outlier increases. Figure 6.1 shows the effect of errors on the states for different values of the error.
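To put numbers on the comparison, the following Python sketch evaluates the linear multiplier \(1 + \alpha \epsilon_t\) and the power multiplier \((1 + \epsilon_t)^\alpha\) side by side. The function names and the grid of error values are illustrative assumptions, not code from the smooth package.

```python
import math


def linear_multiplier(one_plus_eps, alpha):
    """Multiplier of the original model: 1 + alpha * eps."""
    return 1 - alpha + alpha * one_plus_eps


def power_multiplier(one_plus_eps, alpha):
    """Multiplier of the reformulated model (6.23): (1 + eps) ** alpha."""
    return one_plus_eps ** alpha


alpha = 0.3  # same arbitrary smoothing parameter as in the text

# Hypothetical grid of multiplicative errors 1 + eps.
for one_plus_eps in (0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 10.0):
    lin = linear_multiplier(one_plus_eps, alpha)
    pw = power_multiplier(one_plus_eps, alpha)
    print(f"1 + eps = {one_plus_eps:5.2f}: linear = {lin:.3f}, power = {pw:.3f}")

# On the log scale the power update is symmetric: an error of 4 and an
# error of 1/4 move the log-state by the same amount in opposite directions.
up = math.log(power_multiplier(4.0, alpha))
down = math.log(power_multiplier(0.25, alpha))
print(f"log-update for 1 + eps = 4: {up:.4f}, for 1 + eps = 0.25: {down:.4f}")
```

For the outlier \(1+\epsilon_t=4\), the power form gives a multiplier of roughly 1.52 instead of 1.9, and the two log-updates printed at the end cancel each other out, which reflects the additive structure of the model in logarithms.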
More generally, this model can be expressed as:
\[\begin{equation} \begin{aligned} \log y_t = & \mathbf{w}^\prime \log(\mathbf{v}_{t-\boldsymbol{l}}) + \log(1 + \epsilon_{t}) \\ \log \mathbf{v}_{t} = & \mathbf{F} \log \mathbf{v}_{t-\boldsymbol{l}} + \mathbf{g} \log(1 + \epsilon_t) \end{aligned}, \tag{6.24} \end{equation}\] using the same notation as before. Furthermore, in addition to the advantages discussed above, this model has simpler recursive relations than the ones in Subsection 6.2 (they are similar to the ones discussed in Subsection 5.2), and it has closed forms for the \(h\) steps ahead expectation and variance, which can be used for the calculation of multistep point forecasts and the construction of parametric prediction intervals.
Having said that, as of 1st December 2024, this model is not yet implemented in the adam() function from the smooth package.
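Although the model is not available in adam(), a single step of (6.24) can be sketched by hand. The Python code below is an illustration only: the function names, the numerical values, and the specific \(\mathbf{w}\), \(\mathbf{F}\) and \(\mathbf{g}\) are assumptions, obtained by taking logarithms of (6.23) for the ETS(M,M,M) case with the lagged states ordered as level, trend, seasonal. It checks that the log-scale recursion reproduces the multiplicative one.

```python
import math


def step_multiplicative(level, trend, seasonal, eps, alpha, beta, gamma):
    """One step of (6.23) on the original scale.

    level, trend and seasonal are l_{t-1}, b_{t-1} and s_{t-m}.
    """
    shock = 1 + eps
    y = level * trend * seasonal * shock
    new_states = (
        level * trend * shock ** alpha,
        trend * shock ** beta,
        seasonal * shock ** gamma,
    )
    return y, new_states


def step_log(log_v, eps, w, F, g):
    """One step of (6.24); log_v holds the appropriately lagged log-states."""
    log_shock = math.log(1 + eps)
    log_y = sum(w_i * v_i for w_i, v_i in zip(w, log_v)) + log_shock
    new_log_v = [
        sum(f_ij * v_j for f_ij, v_j in zip(row, log_v)) + g_i * log_shock
        for row, g_i in zip(F, g)
    ]
    return log_y, new_log_v


# Hypothetical parameter and state values, chosen for illustration only.
alpha, beta, gamma = 0.3, 0.1, 0.05
level, trend, seasonal, eps = 100.0, 1.02, 1.1, 0.5

# Assumed matrices for ETS(M,M,M) in logarithms, derived from (6.23).
w = [1, 1, 1]
F = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
g = [alpha, beta, gamma]

y, states = step_multiplicative(level, trend, seasonal, eps, alpha, beta, gamma)
log_y, log_states = step_log(
    [math.log(level), math.log(trend), math.log(seasonal)], eps, w, F, g
)

print("y:", y, "vs", math.exp(log_y))
print("states:", states, "vs", [math.exp(v) for v in log_states])
```

The two sets of printed values should coincide up to floating-point error, since the second recursion is an additive, linear update in logarithms of the first one.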