14.4 Model specification: Outliers

One of the important assumptions in forecasting and analytics is that the model is correctly specified, which, among other things, implies that there are no outliers in it. Outliers might appear for several reasons:

  1. We missed some important information (e.g. a promotion) and did not include a respective variable in the model;
  2. There was an error in the recording of the data, e.g. a value of 2000 was recorded as 200;
  3. We did not miss anything predictable; we just deal with a distribution that has fat tails.

In any of these cases, outliers might impact the estimates of parameters of our models. With ETS, this will lead to higher-than-needed smoothing parameters, which result in wider prediction intervals and potentially biased forecasts. In the case of ARIMA, the mechanism is more complicated, but it similarly leads to widened intervals and biased forecasts. Finally, in regression, outliers might lead to biased estimates of parameters. So, it is important to identify outliers and deal with them.

14.4.1 Outlier detection

One of the simplest ways of identifying outliers is based on distributional assumptions. For example, if we assume that our data follows the Normal distribution, we would expect 95% of observations to lie within the approximately \(\pm 1.96 \sigma\) bounds and 99.8% of them to lie within the \(\pm 3.09 \sigma\) ones. Sometimes these values are substituted by the heuristic of “values lying within two or three sigmas”, which is imprecise and works only for the Normal distribution. Still, based on this, we could flag the values outside these bounds and investigate them to see whether any of them are indeed outliers.

Given that the ADAM framework supports different distributions, the heuristic mentioned above is in general inappropriate. We need to get proper quantiles for each of the assumed distributions. Luckily, this is not difficult, because the quantile functions for all the distributions supported by ADAM either have analytical forms or can be obtained numerically.
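To illustrate the point, the bounds can be obtained directly from the respective quantile functions. Here is a minimal sketch in R for the Normal distribution and, via the greybox package, the Laplace one:

library(greybox)

# Quantiles of the standard Normal distribution for 95% and 99.8% coverage,
# giving the 1.96 and 3.09 values mentioned above
qnorm(c(0.975, 0.999), mean=0, sd=1)

# The same cumulative probabilities for the standard Laplace distribution
# result in different bounds, which is why the “two or three sigmas”
# heuristic does not extend beyond the Normal distribution
qlaplace(c(0.975, 0.999), mu=0, scale=1)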

Here is an example in R with the same multiplicative ETSX model, plotting the standardised residuals vs the fitted values with the 95% bounds (Figure 14.12):
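This is a sketch, which assumes that the multiplicative ETSX model estimated earlier in this chapter is stored in the adamSeat03 variable (the object name is illustrative). In the plot() method for adam objects, which=2 produces the standardised residuals vs fitted values, while level regulates the width of the bounds:

# adamSeat03 is assumed to contain the multiplicative ETSX model
plot(adamSeat03, which=2, level=0.95)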

Figure 14.12: Standardised residuals vs Fitted for the pure multiplicative ETSX model.

Remark. In the cases of the Inverse Gaussian (\(\mathcal{IG}\)), Gamma (\(\Gamma\)), and Log-Normal (\(\log\mathcal{N}\)) distributions, the function will plot \(\log u_t\) to make the plot more readable.

The plot in Figure 14.12 demonstrates that there are outliers, some of which lie considerably further away from the bounds than others. Although the number of outliers is not large, it would make sense to investigate why they happened.

Given that we deal with time series, plotting residuals vs time is also sometimes helpful (Figure 14.13):
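The same plot() method produces this via which=8 (again using the assumed adamSeat03 object):

plot(adamSeat03, which=8, level=0.95)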

Figure 14.13: Standardised residuals vs Time for the pure multiplicative ETSX model.

We see in Figure 14.13 that there is no specific pattern in the outliers: they happen at random, so they do not appear because of omitted variables or wrong transformations. We have nine observations lying outside the bounds which, given the sample size of 192 observations, means that the 95% interval contains \(\frac{192-9}{192} \times 100 \% \approx 95.3 \%\) of observations. This is close to the nominal value.

In some cases, the outliers might impact the estimate of the scale of distribution, which will lead to wrong standardised residuals, distorting the picture. This is where studentised residuals come into play. They are calculated similarly to the standardised ones, but the scale of distribution is recalculated for each observation, based on the errors on all but the current observation. So, in the general case, this is an iterative procedure that involves going through \(t=\{1,\dots,T\}\) and that should, in theory, guarantee that the real outliers do not impact the scale of distribution. For the Normal distribution, this procedure simplifies and has an analytical solution, which we do not discuss in the context of ADAM. Here is how the studentised residuals can be analysed in R (Figure 14.14):
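In the plot() method for adam objects, the studentised residuals vs fitted and vs time correspond to which=3 and which=9 respectively. A sketch with the assumed adamSeat03 object:

plot(adamSeat03, which=c(3,9), level=0.95)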

Figure 14.14: Studentised residuals analysis for the pure multiplicative ETSX model.

In many cases (ours included), the standardised and studentised residuals will look very much alike, having a similar number of outliers. But in some cases of extreme outliers, they might differ, and the latter might show outliers better than the former.

Given the situation with outliers in our case, we could investigate when they happen in the original data to understand better whether they need to be taken care of. Instead of manually recording which of the observations lie beyond the bounds, we can get their ids via the outlierdummy() method from the greybox package. It extracts either standardised or studentised residuals, flags the observations lying outside the constructed interval, and automatically creates dummy variables for those observations. Here is how it works:
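A sketch, continuing with the assumed adamSeat03 object; level defines the width of the interval, while type regulates which residuals to use:

outliersSeat <- outlierdummy(adamSeat03, level=0.95, type="rstandard")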

The method returns several objects (see the documentation for details), including the ids of outliers:
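Continuing the sketch above:

outliersSeat$id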

## [1]  14  33  61  66  86  92 109 156 177

These ids can be used to produce additional plots. For example:
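For instance, we can plot the actual values over time, marking the flagged observations (Figure 14.15):

plot(actuals(adamSeat03))
points(time(actuals(adamSeat03))[outliersSeat$id],
       actuals(adamSeat03)[outliersSeat$id],
       col="red", pch=16)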

Figure 14.15: Actuals over time with points corresponding to outliers of the pure multiplicative ETSX model.

We cannot see any peculiarities in the appearance of outliers in Figure 14.15; they seem to happen at random. There might be some external factors leading to those unexpected events (for example, the number of injuries being much lower than expected on observation 156, in December 1981), but the investigation of these events is outside the scope of this demonstration.

Remark. In R, there are several methods for extracting residuals from an estimated model:

  • resid() or residuals() will extract either \(e_t\) or \(1+e_t\), depending on the distributional assumptions of the model;
  • rstandard() will extract the standardised residuals \(u_t\);
  • rstudent() will do the same for the studentised ones.
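For example, with the assumed adamSeat03 object from the earlier sketches:

# Raw residuals of the model
resid(adamSeat03)
# Standardised residuals
rstandard(adamSeat03)
# Studentised residuals
rstudent(adamSeat03)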

The smooth package also introduces the rmultistep() function, which extracts the multiple-steps-ahead in-sample forecast errors. We do not discuss this function here, but we will return to it in Section 14.7.1.

14.4.2 Dealing with outliers

Based on the output of the outlierdummy() method from the previous example, we can construct a model with explanatory variables that interpolate the outliers and thus neutralise their impact on the model:
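A sketch of how this can be done with the adam() function. The data preparation and the formula below are illustrative: they assume the Seatbelts data with drivers as the response variable and rely on outlierdummy() naming the generated dummies outlier1, outlier2, etc.:

# Merge the explanatory variables with the outlier dummies
SeatbeltsWithOutliers <- cbind(as.data.frame(Seatbelts[,c("drivers","PetrolPrice","kms","law")]),
                               outliersSeat$outliers)
adamSeat06 <- adam(SeatbeltsWithOutliers, "MNM", lags=12,
                   formula=drivers~PetrolPrice+kms+law+
                     outlier1+outlier2+outlier3+outlier4+outlier5+
                     outlier6+outlier7+outlier8+outlier9)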

In order to decide whether the dummy variables help or not, we can compare the two models using information criteria:
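Continuing the sketch, with the initial ETSX model assumed to be stored in adamSeat03:

setNames(c(AICc(adamSeat03), AICc(adamSeat06)),
         c("ETSX","ETSXOutliers"))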

##         ETSX ETSXOutliers 
##     2406.366     2377.058

Comparing the two values above, we would conclude that adding the dummies improves the model. However, this could be a mistake, given that we do not know the reasons behind most of these outliers. In general, we should not include dummy variables for the outliers unless we know why they happened; otherwise, we risk overfitting the data. Still, if we have good reasons for including some of them, we could add the respective explanatory variables to remove their impact on the response variable:
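For example, if only a couple of the outliers corresponded to events with known causes, we could include only their dummies. The selection below is arbitrary and serves illustrative purposes only:

adamSeat07 <- adam(SeatbeltsWithOutliers, "MNM", lags=12,
                   formula=drivers~PetrolPrice+kms+law+
                     outlier5+outlier8)
AICc(adamSeat07)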

## [1] 2403.929

While this model is worse in terms of AICc than the one with all the outliers included, it would have a better theoretical rationale than the model adamSeat06.

14.4.3 An automatic mechanism

A similar automated mechanism is implemented in the adam() function, which has the outliers parameter, defining what to do with outliers if there are any. It has the following three options:

  1. “ignore” – do nothing;
  2. “use” – create the model with explanatory variables, including all the detected outliers, as shown in the previous subsection, and see if it is better than the simpler model in terms of an information criterion;
  3. “select” – create lags and leads of the dummies from outlierdummy() and then select among them based on the explanatory variables selection mechanism (discussed in Section 15.3). The lags and leads are needed for the cases when the effect of an outlier is carried over to the neighbouring observations.

Here is how this works in our case:
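A sketch of the call, assuming the same data and model specification as in the earlier examples (the adamSeat08 name is illustrative); the level parameter defines the confidence level used for the detection of outliers:

adamSeat08 <- adam(Seatbelts[,c("drivers","PetrolPrice","kms","law")],
                   "MNM", lags=12,
                   formula=drivers~PetrolPrice+kms+law,
                   outliers="select", level=0.95)
AICc(adamSeat08)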

## [1] 2401.044

This automated procedure forms a matrix that includes the original variables, the outliers, and their lags and leads, and then selects those that minimise AICc in a stepwise procedure (discussed in Section 15.3). Given that this is an automated approach, it is prone to potential mistakes and needs to be treated with care: it might select unnecessary dummy variables and lead to overfitting. I would recommend exploring the outliers manually when possible and not relying too much on the automated procedures.

14.4.4 Final remarks

Koehler et al. (2012) explored the impact of outliers on the forecasting accuracy of ETS. They found that if outliers happen at the end of a time series, it is important to take them into account in a model; if they happen much earlier, their impact on the final forecast will be negligible. Unfortunately, the authors did not explore the impact of outliers on prediction intervals. Based on my experience, outliers typically impact the width of the prediction interval rather than the point forecasts.

References

• Koehler, A.B., Snyder, R.D., Ord, J.K., Beaumont, A., 2012. A study of outliers in the exponential smoothing approach to forecasting. International Journal of Forecasting. 28, 477–484. https://doi.org/10.1016/j.ijforecast.2011.05.001