# Multistep loss functions: Trace MSE

As we discussed last time, there are two main strategies in forecasting: recursive and direct. The latter aligns with estimating a model via a so-called multistep loss function, such as the Mean Squared Error of the h-steps-ahead forecast (MSEh). But this is not the only loss function that can be used efficiently for model estimation. Let’s discuss another popular option.

But before that, let’s take a step back and recap what we are talking about. All multistep losses imply that we fit the model to the data in the conventional way and then recursively produce 1- to h-steps-ahead point forecasts from each in-sample observation, from the very first to the very last one. We can then calculate the forecast errors and collect them in a matrix with forecast origins in rows and horizons in columns, as shown in the image below, generated using the rmultistep() function from the smooth package in R:

An example of a matrix of multistep forecast errors
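To make the construction concrete, here is a minimal Python sketch (not the smooth package's implementation) that builds such a matrix for simple exponential smoothing, where the recursive j-step-ahead forecast from any origin is flat and equal to the current level. For simplicity it keeps only the origins that have a full h future observations, whereas rmultistep() covers every in-sample origin.

```python
import numpy as np

def ses_error_matrix(y, alpha=0.3, h=5):
    """Matrix of recursive multistep forecast errors for simple
    exponential smoothing: rows are forecast origins, columns are
    the horizons 1 to h."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    level = y[0]
    rows = []
    for t in range(n - 1):
        # `level` reflects observations y[0..t]; for SES the recursive
        # j-step-ahead forecast is flat, equal to the current level.
        if t + h < n:
            rows.append(y[t + 1:t + 1 + h] - level)
        level += alpha * (y[t + 1] - level)  # one-step level update
    return np.array(rows)

rng = np.random.default_rng(1)
E = ses_error_matrix(rng.normal(loc=10, size=40), alpha=0.3, h=5)
# One row per forecast origin, one column per horizon.
```

For a series of length 40 and h = 5 this yields a 35 × 5 matrix: 35 usable origins, 5 horizons.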

After that, we can calculate any of the multistep loss functions. MSEh, for example, would simply be the mean of the squared errors in the last column of that matrix.
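In code, this is a one-liner. A small Python sketch with a stand-in error matrix (the real one would come from a fitted model, as above):

```python
import numpy as np

# Hypothetical stand-in for the multistep error matrix:
# rows are forecast origins, columns are horizons 1..h.
rng = np.random.default_rng(42)
E = rng.normal(size=(100, 5))

h = E.shape[1]
MSE_h = np.mean(E[:, h - 1] ** 2)  # mean of the squared errors in the last column
```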

One of the most straightforward modifications of MSEh is a loss function that can be called “Trace MSE” (TMSE), which is the sum of the MSEs of each of the columns in that matrix. It has some advantages and disadvantages in comparison with MSEh. Here are the main ones:

• Because we sum up the MSEs for different horizons, the terms for horizons closer to h will tend to be larger than those for horizons close to 1, simply because uncertainty typically increases with the horizon.
• The previous point means that a model estimated via TMSE will care less about short-term forecasts and will focus more on longer-term ones.
• But at least it will not be as myopic as a model estimated with a specific MSEh.
• You do not need to estimate h models; you can estimate just one, and it will be optimized for the entire horizon from 1 to h.
• This means that you save on computations, making estimation and forecasting roughly h times faster than fitting a separate model for each horizon via MSEh.
• Kourentzes et al. (2019) showed that TMSE slightly outperformed MSE1 and MSEh. In fact, in one of the early versions of that paper, Kourentzes & Trapero showed how well TMSE performs using the example of solar irradiation forecasting with ETS.
• TMSE imposes shrinkage on parameters of dynamic models, which makes them less reactive and avoids overfitting.
• But the shrinkage is not as strong as in the case of MSEh.
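To see why the longer horizons dominate the loss, here is a Python sketch computing TMSE and MSEh side by side on a hypothetical error matrix whose variance grows with the horizon, as it typically does in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
h = 5
# Hypothetical error matrix: column j holds j-step-ahead errors, and the
# error variance grows roughly linearly with the horizon.
E = rng.normal(size=(200, h)) * np.sqrt(np.arange(1, h + 1))

mse_per_horizon = np.mean(E ** 2, axis=0)  # MSE of each column
TMSE = mse_per_horizon.sum()               # Trace MSE: sum over horizons 1..h
MSE_h = mse_per_horizon[-1]                # the classical MSEh, for contrast

# The last horizons contribute the largest terms to TMSE, which is why a
# model optimised on it leans towards longer-term accuracy while still
# seeing every horizon from 1 to h.
```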

This is discussed in the paper I wrote together with Nikolaos Kourentzes and Rebecca Killick.

Some examples of application of TMSE are provided in Section 11.3 of ADAM.

Also, Peter Laurinec did an independent exploration of multistep losses and wrote this nice post.