1.2 Forecasting principles
If you have decided that you need to forecast something, then it makes sense to keep several important forecasting principles in mind.
First, as discussed earlier, you need to understand why the forecast is needed, how it will be used and by whom. Answers to these questions will guide you in deciding, what technique to use, how specifically to do forecasting and what should be reported. For example, if a client does not know machine learning, it might be unwise to use Neural Networks for forecasting - the client will not trust the technique and thus will not trust the forecasts, switching to simpler methods. If the final decision is to order some number of units, then it would be more reasonable to produce cumulative forecasts over the lead time (time between the order and product delivery) and form safety stock based on the model and assumed distribution.
When you have an understanding of what to forecast and how, the second principle comes into play. Select the relevant error measure. You need to decide how to measure the accuracy of forecasting methods, keeping in mind that accuracy needs to be as close to the final decision, as possible. For example, if you need to decide the number of nurses for a specific day in the A&E department based on the patients attendance, then it would be more reasonable to compare models in terms of their quantile performance (see Section 2.2) rather than expectation or median. Thus, it would be more appropriate to calculate pinball loss instead of MAE or RMSE (see details in Section 2).
Third, you should always test your models on a sample of data not seen by them. Train your model on one part of a sample (train set or in-sample) and test it on another one (test set or holdout sample). This way you can have some guarantees that the model will not overfit the data and that when you need to produce a final forecast, it will be reasonable. Yes, there are cases, when you do not have enough data to do that. All you can do in these situations, is use simpler, robust models (for example, such as damped trend exponential smoothing by Roberts (1982) and Gardner and McKenzie (1985) or Theta by Assimakopoulos and Nikolopoulos (2000)) and to use judgment in deciding, whether the final forecasts are reasonable or not. But in all the other cases, you should test model on the data they are not aware of. The recommended approach in this case is rolling origin, discussed in more detail in Section 2.4.
Fourth, the forecast horizon should be aligned with specific decisions in practice. If you need predictions for a week ahead there is no need to produce forecasts for the next 52 weeks. On one hand this is costly and excessive, on the other hand the measured accuracy will not align with the needs of the company. The related issues is the test set (or holdout) size selection. There is no unique guideline for this, but it should not be shorter than the forecasting horizon.
Fifth, the time series aggregation level should be as close to the specific decisions as possible: there is no need to produce forecasts on hourly level for the next week (168 hours ahead) if the decision is based on the order of a product for that period of time - we would not need such a granularity of data for the decision, data aggregated to weekly level will do the trick, but we would waste a lot of time making complicated models work on hourly level.
Sixth, you need to have benchmark models. Always compare forecasts from your favourite approach with those from Naïve, global average and / or regression - depending on what you deal with specifically. If your fancy Neural Network performs worse than Naïve, then it does not bring value and should not be used in practice. Comparing one Neural Network with another is also not a good idea, because Simple Exponential Smoothing (see Section 4.1), being much simpler model, might beat both networks, and you would never find out about that. If possible, also compare forecasts from the proposed approach with forecasts of other well established benchmarks, such as ETS (Hyndman et al., 2008), ARIMA (Box and Jenkins, 1976) and Theta (Assimakopoulos and Nikolopoulos, 2000).
Finally, when comparing forecasts from different models, you might end up with several very similar performing approaches. If the difference between them is not significant, then the general recommendation is to select the one that is faster and simpler. This is because simpler models are more difficult to break and those that work faster are more attractive in practice due to reduced energy consumption (save the planet and stop global warming! Dhar, 1999).
These principles do not guarantee that you will end up with the most accurate forecasts, but at least you will not end up with the unreasonable ones.