While there is a lot to say about multistep losses, I’ve decided to write the final post on one of them and leave the topic alone for a while. Here it goes.
Last time, we discussed MSEh and TMSE, and I mentioned that both of them impose shrinkage and come with advantages and disadvantages. One of the main advantages of TMSE over MSEh is reduced computational time: you fit just one model instead of fitting h of them. However, the downside of TMSE is that it averages things out, and we end up with model parameters that minimize the long-horizon forecast errors to a much larger extent than the short-horizon ones. For example, if the one-step-ahead MSE was 500, while the six-steps-ahead MSE was 3000, the impact of the latter on TMSE would be six times higher than that of the former, and the estimator would prioritize the minimization of the longer-horizon error.
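To make the weighting issue concrete, here is a minimal sketch, assuming TMSE is the plain sum of the per-horizon MSEs. The one- and six-steps-ahead values come from the example above; the intermediate ones are made up purely for illustration:

```python
# Hypothetical per-horizon MSEs: h=1 and h=6 are from the example in the text,
# the intermediate values are invented for illustration only.
mse = {1: 500, 2: 900, 3: 1400, 4: 2000, 5: 2500, 6: 3000}

# TMSE sums the per-horizon MSEs, so each horizon contributes
# in proportion to its raw magnitude.
tmse = sum(mse.values())

share_h1 = mse[1] / tmse
share_h6 = mse[6] / tmse
print(f"TMSE = {tmse}; share of h=1: {share_h1:.2%}; share of h=6: {share_h6:.2%}")

# The six-steps-ahead error carries six times the weight of the one-step-ahead one:
print(mse[6] / mse[1])  # -> 6.0
```

Whatever reduces the h=6 term pays off six times more than an equal reduction at h=1, which is exactly why the optimizer drifts towards the longer horizons.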
A more balanced version of this was introduced in our paper and was called “Geometric Trace MSE” (GTMSE). The main idea of GTMSE is to take the geometric mean of the per-horizon MSEs instead of the arithmetic one or, equivalently for optimization purposes, to sum their logarithms. Because of that, the impact of MSEh on the loss becomes comparable with that of MSE1, and the model performs well throughout the whole horizon from 1 to h. For the same example as above, the base-10 logarithm of 500 is approximately 2.7, while that of 3000 is approximately 3.5. The difference between the two is much smaller, reducing the impact of the long-term forecast uncertainty. As a result, GTMSE has the following features:
- It imposes shrinkage on model parameters.
- The strength of shrinkage is proportional to the forecast horizon.
- But it is much milder than in the case of MSEh or TMSE.
- It leads to more balanced forecasts, performing well on average across the whole horizon.
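The balancing effect can be sketched in a few lines by contrasting how much the long horizon dominates under TMSE (sum of MSEs) versus GTMSE (sum of log-MSEs). The numbers are the ones from the text; base-10 logarithms are used to match it, although the choice of base only rescales the loss and does not change the optimum:

```python
import math

mse_1, mse_6 = 500.0, 3000.0

# TMSE: the six-steps-ahead term dominates by the raw ratio of the MSEs.
tmse_ratio = mse_6 / mse_1

# GTMSE: logarithms compress the scale, so the two horizons
# contribute comparably to the loss.
gtmse_ratio = math.log10(mse_6) / math.log10(mse_1)

print(f"TMSE contribution ratio:  {tmse_ratio:.1f}")   # 6.0
print(f"GTMSE contribution ratio: {gtmse_ratio:.2f}")  # roughly 1.3
```

A 6:1 imbalance in TMSE turns into roughly 1.3:1 in GTMSE, which is why the shrinkage it imposes is much milder and the forecasts are more balanced across horizons.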
In that paper, we did extensive simulations to see how different estimators behave, and we found that:
- If an analyst is interested in the parameters of a model, they should stick with the conventional loss functions (based on the one-step-ahead forecast error), because the multistep ones tend to produce biased estimates of parameters.
- On the other hand, multistep losses shrink the redundant parameters out faster than the conventional one, so there might be a benefit in the case of overparameterized models.
- At the same time, if forecasting is the main interest, then multistep losses might bring benefits, especially on larger samples.
The image above shows an example from our paper, where we applied an additive model to data that exhibits apparent multiplicative seasonality. Despite that, we can see that the multistep losses did a much better job than the conventional MSE, compensating for the model misspecification.