Archives estimators - Open Forecasting

Multistep loss functions: Geometric Trace MSE

Ivan Svetunkov — Tue, 04 Jun 2024 09:05:56 +0000

While there is a lot to say about multistep losses, I’ve decided to write the final post on one of them and leave the topic alone for a while. Here it goes.

Last time, we discussed MSEh and TMSE, and I mentioned that both of them impose shrinkage and have some advantages and disadvantages. One of the main advantages of TMSE is the reduction in computational time in comparison with MSEh: you fit just one model with it instead of fitting h of them. However, the downside of TMSE is that it averages things out, and we end up with model parameters that minimize the h-steps-ahead forecast error to a much larger extent than the errors at horizons close to one step ahead. For example, if the one-step-ahead MSE was 500, while the six-steps-ahead MSE was 3000, the impact of the latter in TMSE would be six times higher than that of the former, and the estimator would prioritize the minimization of the longer-horizon one.

A more balanced version of this was introduced in our paper and was called “Geometric Trace MSE” (GTMSE). The main idea of GTMSE is to take the geometric mean or, equivalently, the sum of logarithms of MSEh instead of taking the arithmetic mean. Because of that, the impact of MSEh on the loss becomes comparable with the effect of MSE1, and the model performs well throughout the whole horizon from 1 to h. For the same example of MSEs as above, the base-10 logarithm of 500 is approximately 2.7, while that of 3000 is approximately 3.5. The difference between the two is much smaller, reducing the impact of the long-term forecast uncertainty. As a result, GTMSE has the following features:

It imposes shrinkage on model parameters.
The strength of shrinkage is proportional to the forecast horizon.
But it is much milder than in the case of MSEh or TMSE.
It leads to more balanced forecasts, performing well on average across the whole horizon.
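The arithmetic behind the 500 versus 3000 example can be checked with a few lines of Python. This is only a sketch: the dictionary `mse` holds the two values from the text (intermediate horizons are left out to keep things visible), and the base-10 logarithm is an assumption made to match the 2.7 and 3.5 quoted above. A different base would only rescale the loss.

```python
import math

# MSE values for horizons 1 and 6, taken from the example in the text.
# Horizons 2 to 5 are omitted here purely for illustration.
mse = {1: 500.0, 6: 3000.0}

# TMSE-style aggregation: a plain sum, so each horizon's weight is its share of the sum.
tmse = sum(mse.values())
tmse_shares = {h: value / tmse for h, value in mse.items()}

# GTMSE-style aggregation: a sum of logarithms of the same MSEs.
log_terms = {h: math.log10(value) for h, value in mse.items()}
gtmse = sum(log_terms.values())

# Relative weight of horizon 6 versus horizon 1 under each aggregation.
ratio_tmse = tmse_shares[6] / tmse_shares[1]
ratio_gtmse = log_terms[6] / log_terms[1]

print(f"TMSE:  horizon 6 weighs {ratio_tmse:.2f} times horizon 1")
print(f"GTMSE: horizon 6 weighs {ratio_gtmse:.2f} times horizon 1")
```

Under the sum, the long horizon dominates by a factor of six, while under the sum of logarithms the two terms are of a similar size.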

In that paper, we did extensive simulations to see how different estimators behave, and we found that:

If an analyst is interested in parameters of models, they should stick with the conventional loss functions (based on one-step-ahead forecast error) because the multistep ones tend to produce biased estimates of parameters.
On the other hand, multistep losses eliminate redundant parameters faster than the conventional one, so there might be a benefit in the case of overparameterized models.
At the same time, if forecasting is the main interest, then multistep losses might bring benefits, especially on larger samples.

ETS(A,A,A) estimated using different loss functions applied to the data with multiplicative seasonality

The image above shows an example from our paper, where we applied the additive model to the data, which exhibits apparent multiplicative seasonality. Despite that, we can see that multistep losses did a much better job than the conventional MSE, compensating for the misspecification.

The post Multistep loss functions: Geometric Trace MSE first appeared on Open Forecasting.

Multistep loss functions: Trace MSE

Ivan Svetunkov — Sat, 01 Jun 2024 11:29:12 +0000

As we discussed last time, there are two possible strategies in forecasting: recursive and direct. The latter aligns with the estimation of a model using a so-called multistep loss function, such as Mean Squared Error for h-steps-ahead forecast (MSEh). But this is not the only loss function that can be efficiently used for model estimation. Let’s discuss another popular option.

But before that, let’s take a step back to recap what we are talking about. All the multistep losses imply that we fit the model to the data in the conventional way and then recursively produce point forecasts for 1 to h steps ahead from each in-sample observation, from the very first to the very last one. We can then calculate the forecast errors and collect them in a matrix with observations in rows and horizons in columns, as shown in the image below, generated using the rmultistep() function from the smooth package in R:

An example of a matrix of multistep forecast errors

After that, we can calculate any of the multistep loss functions. MSEh, for example, would simply be the mean of squared errors in the last column of that matrix.
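To make the matrix more tangible, here is a minimal Python sketch. It is not how rmultistep() is implemented. It assumes simple exponential smoothing, where the forecast for every horizon equals the current level, and it keeps only the rows for which all h actual values are available. The series `y`, the smoothing parameter `alpha`, and the initial level are hypothetical values chosen for illustration.

```python
def multistep_errors(y, alpha, initial, h):
    """Matrix of 1..h-steps-ahead errors: one row per forecast origin, one column per horizon."""
    level = initial
    rows = []
    for t, value in enumerate(y):
        # Conventional one-step update of the level after observing y[t].
        level = level + alpha * (value - level)
        # Keep the row only if all h future actual values exist in the sample.
        if t + h < len(y):
            rows.append([y[t + j] - level for j in range(1, h + 1)])
    return rows


def mse_by_horizon(errors):
    """Mean of squared errors in each column of the matrix."""
    n = len(errors)
    return [sum(row[j] ** 2 for row in errors) / n for j in range(len(errors[0]))]


# Hypothetical data and parameters.
y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
errors = multistep_errors(y, alpha=0.5, initial=10.0, h=3)
mse = mse_by_horizon(errors)
mse_h = mse[-1]  # MSEh: the mean of squared errors in the last column

print(errors[0])  # errors from the first origin for horizons 1, 2 and 3
print(mse_h)
```

Any other multistep loss can then be computed from the same `errors` matrix without refitting the model.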

One of the most straightforward modifications of MSEh is a loss function that can be called “Trace MSE”, which is the sum of MSEs of each of the columns in that matrix. It has some advantages and disadvantages in comparison with MSEh. Here are some:

Because we sum up MSEs for different horizons, those closer to h will tend to be higher than those close to 1, simply because typically, with an increase of the horizon, uncertainty increases as well.
The previous point means that the model estimated via TMSE will care less about short-term forecasts and will focus more on longer ones.
But at least it will not be as myopic as a model estimated with a specific MSEh.
You do not need to estimate h models; you can estimate just one, and it will be optimized for the entire horizon from 1 to h.
This means that you save on computations, making the estimation and forecasting roughly h times faster than in the case of MSEh.
Kourentzes et al. (2019) showed that TMSE slightly outperformed MSE1 and MSEh. In fact, in one of the early versions of that paper, Kourentzes & Trapero showed how well TMSE performs in the example of solar irradiation forecasting with ETS.
TMSE imposes shrinkage on parameters of dynamic models, which makes them less reactive and avoids overfitting.
But the shrinkage is not as strong as in the case of MSEh.
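In symbols, the two losses can be written as follows. The notation is an assumption made for this sketch rather than something taken from the post: e_{t+j|t} is the j-steps-ahead forecast error produced from observation t, and T is the number of rows of the error matrix used in the calculation.

```latex
\mathrm{MSE}_j = \frac{1}{T} \sum_{t=1}^{T} e_{t+j|t}^{2},
\qquad
\mathrm{TMSE} = \sum_{j=1}^{h} \mathrm{MSE}_j
```

MSEh is the special case j = h, that is, the last column of the matrix, while TMSE adds up all h columns, which is why the columns with larger errors dominate it.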

This is discussed in the paper I wrote together with Nikolaos Kourentzes and Rebecca Killick.

Some examples of application of TMSE are provided in Section 11.3 of ADAM.

Also, Peter Laurinec did an independent exploration of multistep losses and wrote this nice post.

The post Multistep loss functions: Trace MSE first appeared on Open Forecasting.

Recursive vs Direct Forecasting Strategy

Ivan Svetunkov — Sat, 25 May 2024 15:03:36 +0000

Have you heard about the recursive vs direct forecasts? There’s literature about them in the areas of both ML and statistics. What’s so special about them? Here is a short post.

The term “recursive” forecasting refers to the approach in which we produce the one-step-ahead forecast first, then use it to produce the two-steps-ahead one, then the three-steps-ahead one, and so on. This process is iterative, and the model is fitted to the data based on one-step-ahead forecasts, from the first observation to the last in the sample. This is the default approach for all the standard dynamic models for forecasting, such as ARIMA or ETS.
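As a small illustration of the recursion, here is a sketch in Python for a zero-mean AR(1) model, where each forecast is the previous one multiplied by the autoregressive parameter. The function name, the parameter `phi`, and the numbers are hypothetical and only serve the example.

```python
def recursive_forecast(phi, last_value, h):
    """Produce 1..h-steps-ahead AR(1) forecasts, feeding each forecast into the next step."""
    forecasts = []
    previous = last_value
    for _ in range(h):
        # The next forecast is built on the previous forecast, not on new data.
        previous = phi * previous
        forecasts.append(previous)
    return forecasts


forecasts = recursive_forecast(phi=0.5, last_value=8.0, h=3)
print(forecasts)  # [4.0, 2.0, 1.0]
```

The three-steps-ahead value cannot be obtained here without first producing the one- and two-steps-ahead ones.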

The “direct” forecasting means producing a specific h-steps-ahead forecast (e.g., 12 months ahead), skipping intermediate steps. To do this, when fitting the model, we calculate the error between the one-step-ahead forecast and the actual value h steps ahead. This changes how we estimate the model: our loss function is now based on the h-steps-ahead forecast error, and our one-step-ahead forecast starts acting as the h-steps-ahead one. This way we don’t need to produce forecasts recursively, but if we need all forecasts between 1 and h steps ahead, we must fit h models.

Both strategies are shown in the following image:

Recursive vs Direct forecasting strategies

The literature tells us that the direct forecasting strategy is equivalent to using the so-called multistep-ahead loss function in model estimation (e.g. Chevillon, 2007). The standard “direct” forecasting strategy will give the same results as if we apply ARIMA/ETS to the data, produce h-steps-ahead recursive forecasts in-sample, starting from the first observation till the very last, and then minimise the Mean Squared h-steps-ahead forecast error (MSEh). This strategy has some advantages and disadvantages in comparison with the conventional one-step-ahead (see the introduction of our paper):

1. The specific h-steps-ahead forecast tends to be more accurate than in the case of the standard estimation methods;
2. Although some papers show this isn’t universally true;
3. Parameter estimates tend to be less efficient than with one-step-ahead losses;
4. It’s more computationally expensive than standard estimators, especially for multiple-step forecasts.
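The computational cost in the last point can be seen in a minimal Python sketch of estimation via MSEh. It assumes simple exponential smoothing (so the h-steps-ahead forecast is just the current level), a coarse grid search instead of a proper optimiser, and a made-up series. All names here are hypothetical, and this is not the estimation routine of any particular package.

```python
def ses_mse_h(y, alpha, initial, h):
    """In-sample MSE of h-steps-ahead forecasts from simple exponential smoothing."""
    level = initial
    squared = []
    for t, value in enumerate(y):
        level = level + alpha * (value - level)  # level after observing y[t]
        if t + h < len(y):
            # The flat SES forecast for horizon h is compared with the actual value h steps ahead.
            squared.append((y[t + h] - level) ** 2)
    return sum(squared) / len(squared)


def fit_alpha(y, initial, h, grid):
    """Pick the smoothing parameter from the grid that minimises MSEh."""
    return min(grid, key=lambda alpha: ses_mse_h(y, alpha, initial, h))


# Hypothetical series and a coarse grid of candidate smoothing parameters.
y = [20.0, 23.0, 21.0, 26.0, 24.0, 28.0, 27.0, 31.0, 29.0, 33.0, 32.0, 36.0]
grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# One separate fit per horizon: covering horizons 1 to 3 takes three estimations.
alphas = {h: fit_alpha(y, y[0], h, grid) for h in range(1, 4)}
print(alphas)
```

Each horizon gets its own parameter, and the number of fits grows with the number of horizons needed.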

So, there are accuracy benefits, but they come with a computational cost. Moreover, Kourentzes et al. (2020) found that the forecasting accuracy of MSEh was higher than that of conventional loss functions, but this didn’t translate to better inventory performance.

Still, it wasn’t clear why this strategy is better, and we showed that applying MSEh to a dynamic model regularises its parameters. In ETS, this leads to parameter shrinkage toward zero proportionally to the forecast horizon used in the loss, making models more conservative and “slow.”

This is also discussed in Section 11.3 of ADAM.

The post Recursive vs Direct Forecasting Strategy first appeared on Open Forecasting.