1.2 Forecasting principles
If you have decided that you need to forecast something, it makes sense to keep several important forecasting principles in mind.
First, as discussed earlier, you need to understand why the forecast is required, how it will be used, and by whom. The answers to these questions will guide you in deciding what technique to use, how to produce the forecasts, and what to report. For example, if a client is not familiar with machine learning, it might be unwise to use Neural Networks for forecasting – the client might not trust the technique, and therefore the forecasts, and switch to simpler methods instead. If the final decision is to order a number of units, it would be more reasonable to produce cumulative forecasts over the lead time (the time between placing the order and the product delivery) and form safety stock based on the model and the assumed distribution.
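To illustrate the last point, here is a minimal sketch in Python of how point forecasts can be cumulated over the lead time and turned into an order-up-to level. The numbers, the Normal distribution, and the independence of per-period errors are all assumptions made purely for demonstration, not a prescription of a specific method:

```python
import numpy as np
from scipy import stats

# Illustrative inputs (assumed, not taken from any specific model)
point_forecasts = np.array([102.0, 98.0, 105.0])  # forecasts for each period of the lead time
sigma = 15.0                                      # assumed std. deviation of one-period forecast error
service_level = 0.95                              # target probability of not stocking out

# Cumulative demand forecast over the lead time
cumulative_forecast = point_forecasts.sum()

# Assuming independent Normal errors, the lead-time variance is the sum of per-period variances
lead_time_sigma = np.sqrt(len(point_forecasts)) * sigma

# Safety stock and the resulting order-up-to level
safety_stock = stats.norm.ppf(service_level) * lead_time_sigma
order_up_to = cumulative_forecast + safety_stock
print(f"Cumulative forecast: {cumulative_forecast:.1f}, order-up-to level: {order_up_to:.1f}")
```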
When you understand what to forecast and how, the second principle comes into play: select the relevant error measure. You need to decide how to measure the accuracy of forecasting methods, keeping in mind that the accuracy measurement should relate as closely as possible to the final decision. For example, if you need to decide on the number of nurses for a specific day in the A&E department based on the patients’ attendance, then it would be more reasonable to compare models in terms of their quantile performance (see Section 2.2) rather than the expectation or the median. Thus, it would be more appropriate to calculate the pinball loss instead of MAE or RMSE (see details in Chapter 2).
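As a small illustration of this point, the pinball (quantile) loss can be computed in a few lines of Python; the attendance figures and the two competing quantile forecasts below are made up purely for demonstration:

```python
import numpy as np

def pinball_loss(actuals, forecasts, quantile):
    """Average pinball (quantile) loss; lower is better."""
    diff = actuals - forecasts
    return np.mean(np.maximum(quantile * diff, (quantile - 1) * diff))

# Toy example: daily A&E attendances and two competing 97.5% quantile forecasts
attendance = np.array([210, 195, 230, 250, 205])
forecast_a = np.array([240, 220, 245, 260, 230])
forecast_b = np.array([215, 200, 228, 245, 210])

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(f"Forecast {name}: pinball loss = {pinball_loss(attendance, forecast, quantile=0.975):.2f}")
```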
Third, you should always test your models on a sample of data they have not seen. Train your model on one part of the sample (called the train set or in-sample) and test it on another (called the test set or holdout sample). This way, you get some assurance that the model does not overfit the data and that it will behave reasonably when you need to produce the final forecast. Yes, there are cases when you do not have enough data to do that. All you can do in these situations is use simpler, robust models (for example, damped trend exponential smoothing, Roberts, 1982; Gardner and McKenzie, 1985; or Theta, Assimakopoulos and Nikolopoulos, 2000) and rely on judgment in deciding whether the final forecasts are reasonable or not. But in all other cases, you should test the model on data it is unaware of. The recommended approach in this case is the rolling origin, discussed in more detail in Section 2.4.
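The following is a minimal sketch of such an evaluation with a rolling origin (anticipating Section 2.4). The forecaster here is the Naïve method and the series is purely illustrative; in practice you would plug in your own model and data:

```python
import numpy as np

def rolling_origin_mae(y, horizon, n_origins, forecaster):
    """Evaluate a forecaster over several rolling origins and return the average MAE."""
    errors = []
    for i in range(n_origins):
        split = len(y) - horizon - (n_origins - 1 - i)
        train, test = y[:split], y[split:split + horizon]
        forecast = forecaster(train, horizon)
        errors.append(np.mean(np.abs(test - forecast)))
    return np.mean(errors)

# Placeholder forecaster: Naive (the last observed value repeated over the horizon)
naive = lambda train, h: np.repeat(train[-1], h)

# Illustrative series
y = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119., 104., 118.,
              115., 126., 141., 135., 125., 149., 170., 170., 158., 133., 114., 140.])
print(rolling_origin_mae(y, horizon=3, n_origins=5, forecaster=naive))
```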
Fourth, the forecast horizon should be aligned with specific decisions in practice. If you need predictions for a week ahead, there is no need to produce forecasts for the next 52 weeks. Doing so would, on the one hand, be costly and excessive and, on the other, mean that the accuracy measurement does not align with the company’s needs. A related issue is the selection of the test set (or holdout) size. There is no unique guideline for this, but it should not be shorter than the forecasting horizon, and preferably it should align with the specific horizon coming from managerial decisions.
Fifth, the time series aggregation level should be as close to the specific decisions as possible. There is no need to produce forecasts on an hourly level for the next week (168 hours ahead) if the decision is based on the order of a product for the whole week. We would not need such granularity of data for the decision; aggregating the actual values to the weekly level and then applying models will do the trick. Otherwise, we would be wasting a lot of time making complicated models work on an hourly level.
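A sketch of this aggregation step in Python, assuming the data sits in a pandas series with an hourly index (the simulated demand below is purely illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative hourly demand for four weeks (simulated data)
hours = pd.date_range("2023-01-02", periods=4 * 168, freq="h")
hourly_demand = pd.Series(np.random.default_rng(42).poisson(lam=5, size=len(hours)), index=hours)

# Aggregate to the level at which the ordering decision is actually made
weekly_demand = hourly_demand.resample("W-MON", label="left", closed="left").sum()
print(weekly_demand)
```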
Sixth, you need to have benchmark models. Always compare forecasts from your favourite approach with those from Naïve, the global average, and/or regression (they are discussed in Section 3.3), depending on what specifically you are dealing with. If your fancy Neural Network performs worse than Naïve, it does not bring value and should not be used in practice. Comparing one Neural Network with another is also not a good idea, because Simple Exponential Smoothing (see Section 3.4), being a much simpler model, might beat both networks, and you would never find out about that. If possible, also compare forecasts from the proposed approach with those of other well-established benchmarks, such as ETS (Hyndman et al., 2008), ARIMA (Box and Jenkins, 1976), and Theta (Assimakopoulos and Nikolopoulos, 2000).
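As a toy illustration of such a benchmark comparison, the sketch below evaluates a “fancy” forecast against Naïve and the global average on a small holdout; the series and the “fancy” forecasts are invented purely for demonstration:

```python
import numpy as np

# Illustrative series, split into train and a 3-step holdout
y = np.array([23., 25., 24., 28., 30., 29., 33., 35., 34., 38., 40., 39.])
horizon = 3
train, test = y[:-horizon], y[-horizon:]

# Benchmark forecasts
naive_fc = np.repeat(train[-1], horizon)           # Naive: last observation repeated
global_mean_fc = np.repeat(train.mean(), horizon)  # global average

# Imagine these came from your favourite, more complicated approach
fancy_fc = np.array([36.0, 37.5, 36.5])            # assumed output, purely illustrative

for name, fc in [("Naive", naive_fc), ("Global mean", global_mean_fc), ("Fancy", fancy_fc)]:
    print(f"{name:12s} MAE: {np.mean(np.abs(test - fc)):.2f}")
```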
Finally, when comparing forecasts from different models, you might end up with several similarly performing approaches. If the difference between them is not significant, then the general recommendation is to select the faster and simpler one. This is because simpler models are more difficult to break, and those that work faster are more attractive in practice due to reduced energy consumption (save the planet and stop global warming! Dhar, 1999).
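One simple way to check whether the difference between two approaches matters is to compare their absolute errors on the same holdout, for example with a paired t-test. This is only an illustrative sketch with made-up errors, not the only (or necessarily the best) way to test significance:

```python
import numpy as np
from scipy import stats

# Absolute errors of two competing models on the same holdout (illustrative numbers)
abs_errors_simple = np.array([2.1, 1.8, 2.5, 1.9, 2.2, 2.0, 2.4, 1.7])
abs_errors_complex = np.array([2.0, 1.9, 2.3, 2.0, 2.1, 1.9, 2.5, 1.8])

# Paired t-test on the absolute errors
t_stat, p_value = stats.ttest_rel(abs_errors_simple, abs_errors_complex)
print(f"p-value: {p_value:.3f}")
if p_value > 0.05:
    print("No significant difference detected - prefer the simpler and faster model.")
else:
    print("The difference is significant - pick the more accurate model.")
```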
These principles do not guarantee that you will end up with the most accurate forecasts, but at least you will not end up with unreasonable ones.