Model vs Method – why should we care?

Image of a model discussing a method

The image above depicts a fashion model giving a presentation about a forecasting method. I like the forecast for the final period in that image…

Over the last few years, I’ve seen phrases like “LightGBM model” or “Neural Network model” on LinkedIn many times, and the statistician in me shivers every time. So, I figured it’s time to discuss the difference between a model and a method.

Some of you might remember that I wrote a post on this topic a few years ago. But it seems it is worth revisiting.

John Boylan and I came up with the following definitions in our paper:

  • A forecasting model is a mathematical representation of a real phenomenon with a complete specification of distribution and parameters;
  • A forecasting method is a mathematical procedure that generates point and/or interval forecasts, with or without a forecasting model.

If these sound too technical, here’s a simpler explanation:

  • A forecasting method is a way of generating forecasts;
  • A forecasting model is a way to describe the assumed structure of a real phenomenon.

The key difference? A method focuses on producing something specific (e.g., point forecasts) with minimal assumptions, while a model relies on assumptions but can do much more:

  1. Rigorous estimation. Models can be estimated in ways that ensure their parameter estimates are consistent and efficient.
  2. Model selection using information criteria. A powerful approach that saves computational time and typically produces reasonable forecasts.
  3. Predictive distribution. Models can generate moments (mean, variance, skewness) and quantiles, capturing uncertainty around future values.
  4. Confidence intervals for parameters. While not crucial for forecasting, this is useful in other areas to quantify uncertainty.
  5. Extendibility. Additional variables and components can be easily incorporated into a model.

All of this comes at the price of making assumptions about reality. If the assumptions don’t hold, the model won’t perform well. It might still be useful, but the risk of error increases. For example, you can apply a Random Walk model to purely random data, but you shouldn’t expect it to work well.
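To make this concrete, here is a minimal sketch (in Python, with simulated data; the specific numbers are illustrative, not from the post): on purely random, i.i.d. data, the Random Walk forecast roughly doubles the error variance compared to a simple mean forecast that matches the true structure.

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=10.0, scale=1.0, size=10_000)  # purely random (i.i.d.) data

# Random Walk / Naive forecast: next value = last observed value
naive_errors = y[1:] - y[:-1]

# Forecast with the in-sample mean, which matches the true i.i.d. structure
mean_errors = y[1:] - np.mean(y)

mse_naive = np.mean(naive_errors**2)
mse_mean = np.mean(mean_errors**2)

# On i.i.d. data the Random Walk doubles the error variance:
# Var(eps_t - eps_{t-1}) = 2 sigma^2, versus sigma^2 for the mean forecast.
print(mse_naive, mse_mean)
```

The misspecified model still produces forecasts, but systematically worse ones.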

Examples

  1. A forecasting method: Naïve, defined by the simple equation:
    \( F_t = A_{t-1} \)
    This method is easy to explain, hard to break, and provides point forecasts, but nothing more.
  2. A forecasting model: Random Walk, which underlies the Naïve method:
    \( A_t = A_{t-1} + \epsilon_t \)
    where \( \epsilon_t \) follows some distribution with zero mean and fixed variance. The Random Walk model has all the properties described above.
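The difference between the two can be sketched in a few lines of Python (simulated data; a sketch under the assumption of Normal errors, not part of the original post): the Naïve method stops at the point forecast, while the Random Walk model also delivers a predictive distribution whose variance grows linearly in the forecast horizon.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate a random walk: A_t = A_{t-1} + eps_t, eps_t ~ N(0, 1)
y = np.cumsum(rng.normal(size=200))

# The Naive *method* gives only the point forecast:
point_forecast = y[-1]

# The Random Walk *model* also gives a predictive distribution.
# ML estimate of the error variance from first differences:
sigma2 = np.mean(np.diff(y) ** 2)

# h-step-ahead predictive distribution: N(y_T, h * sigma2);
# 95% interval assuming Normal errors (z = 1.96)
h = np.arange(1, 11)
lower = point_forecast - 1.96 * np.sqrt(h * sigma2)
upper = point_forecast + 1.96 * np.sqrt(h * sigma2)
# Interval width grows with sqrt(h): uncertainty is quantified,
# which the Naive method on its own cannot provide.
```

The point forecasts coincide, but only the model tells you how uncertain they are.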

In some cases, you can derive the model underlying a method. In my opinion, this typically enhances the method, making it more powerful for the reasons explained above: once we identify the model underlying a method, we can do much more with it.

For example, when estimating a quantile regression, we typically minimize a pinball loss function, which gives us a method for generating quantiles. However, if we estimate the same linear regression model using likelihood, assuming that the error term follows the Asymmetric Laplace distribution, we arrive at exactly the same parameter estimates as in quantile regression. But now, we also gain additional benefits, such as model selection, predictive distribution, and confidence intervals for parameters – features outlined in the previous post. In a way, these benefits come “for free”, although at the cost of making explicit assumptions about the model. That said, I’d argue that assumptions exist in quantile regression anyway – they’re just not stated explicitly.
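A quick numerical illustration of this equivalence (a sketch with simulated data, intercept-only for simplicity, using a grid search rather than a proper optimiser): minimising the pinball loss and maximising the Asymmetric Laplace log-likelihood pick out the same value, the \( \tau \)-th sample quantile.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=500)
tau = 0.9
grid = np.linspace(y.min(), y.max(), 2001)

# Pinball (check) loss for an intercept-only quantile regression
def pinball(q):
    e = y - q
    return np.mean(np.where(e >= 0, tau * e, (tau - 1) * e))

q_pinball = grid[np.argmin([pinball(q) for q in grid])]

# Asymmetric Laplace log-likelihood with location mu and fixed scale sigma:
# log f(y) = log(tau * (1 - tau) / sigma) - rho_tau((y - mu) / sigma),
# where rho_tau is the same check function as in the pinball loss.
sigma = 1.0
def ald_loglik(mu):
    e = (y - mu) / sigma
    rho = np.where(e >= 0, tau * e, (tau - 1) * e)
    return np.sum(np.log(tau * (1 - tau) / sigma) - rho)

mu_mle = grid[np.argmax([ald_loglik(mu) for mu in grid])]

# Both criteria are optimised at (essentially) the same point
print(q_pinball, mu_mle, np.quantile(y, tau))
```

The log-likelihood is just a constant minus a rescaled pinball loss, which is why the two estimates coincide; the likelihood route additionally opens the door to information criteria and parameter intervals.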

And here we finally come to ML approaches. According to the definitions discussed earlier, Decision Trees, k-Nearest Neighbors, Artificial Neural Networks (ANNs), and other ML approaches are not forecasting models. They do not attempt to capture the underlying structure of the data; instead, they focus on identifying nonlinear patterns via engineered features to produce point forecasts. In other words, they are methods, not models.

This doesn’t make them inferior. Their strength lies in their flexibility, precisely because they don’t impose strong assumptions. However, treating them as forecasting models can lead to issues.

For example, plugging LightGBM’s point forecasts into a probability distribution doesn’t magically turn it into a model. It simply makes it a method that now generates quantiles, but without a solid theoretical foundation for why a specific distribution is chosen or used in a particular way.
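As a sketch of what such wrapping looks like (using the Naïve method as a hypothetical stand-in for an ML point forecaster, and an assumed Normal distribution for the residuals; none of this comes from the original post):

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=300))

# Stand-in for an ML point forecaster (e.g. LightGBM): here, the Naive method
fitted = y[:-1]
residuals = y[1:] - fitted

# "Plugging in" a distribution: assume Normal residuals and wrap the point
# forecast with quantiles.  This produces quantile forecasts, but the choice
# of distribution is ad hoc -- we still have a method, not a model.
point = y[-1]
sigma = residuals.std()
q05 = point - 1.645 * sigma  # assumed-Normal 5% quantile
q95 = point + 1.645 * sigma  # assumed-Normal 95% quantile
```

The quantiles appear, but nothing in the procedure justifies the Normal assumption; a model would state and build on that assumption explicitly.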

Another example is model selection using information criteria, which is meaningless for ML approaches. Why? Because information criteria rely on the model being estimated in a specific way (e.g., via maximum likelihood), which ensures parameter consistency and model identifiability. Some ML methods, such as ANNs, are fundamentally unidentifiable: different architectures can produce the same output, so information criteria lose their meaning in this setting.
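For contrast, here is a minimal sketch of where information criteria do make sense: two likelihood-estimated models (a Random Walk, with and without drift) compared via Gaussian AIC on simulated data. The drift model and the parameter counts are my illustrative assumptions, not from the post.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
# Random Walk with drift: A_t = 0.5 + A_{t-1} + eps_t, eps_t ~ N(0, 1)
y = np.cumsum(0.5 + rng.normal(size=n))

d = np.diff(y)  # both models act on first differences
m = d.size

def gaussian_aic(residuals, k):
    # Gaussian maximum-likelihood AIC; k = number of estimated parameters
    sigma2 = np.mean(residuals**2)
    loglik = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * loglik

aic_rw = gaussian_aic(d, k=1)                # Random Walk: sigma^2 only
aic_drift = gaussian_aic(d - d.mean(), k=2)  # with drift: drift + sigma^2

# AIC is meaningful here because both models are estimated via likelihood;
# unidentifiable ML methods offer no such justification.
print(aic_rw, aic_drift)
```

On drift-generated data the drift model wins the comparison, exactly because the likelihood and parameter counts are well defined for both candidates.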

So next time you see the term model, take a moment to consider whether it’s used correctly and whether it actually means what the author thinks it does.
