Archives Statistics - Open Forecast

Hans Levenbach’s classification scheme for trend/seasonal components

Ivan Svetunkov — Mon, 18 May 2026 08:01:28 +0000

Here is a curious idea: if we can somehow estimate the importance of trend/seasonal components for your data, you can use this in model building and forecasting. But how can we do this first step? Hans Levenbach has an answer with his simple EDA technique. Let me explain.

The core idea is simple and neat. For this example, I’ll use monthly data, like the time series in this image:

Series N2568 from the M3 dataset

You can see that the data has strong seasonality, and we can qualitatively say that capturing that seasonal component correctly will probably solve the main problem in capturing the structure. But how can we quantify this?

All you need to do is put the data in a “wide” format, with months in rows and years in columns. Then, as Hans proposed, run a two-way ANOVA with “month” and “year” to capture variability due to year (trend) and due to month (seasonality). Roughly, we take row/column means to get mean seasonal profiles and mean annual changes (trend), as in the following two images:

Seasonal profile of the data

Trend profile

The former has no trend, the latter has no seasonality, so they can be analysed separately. Then we calculate the sums of squares of these means from the global mean to estimate variation due to months (seasonality) and years (trend). We can also calculate the sum of squares of the irregular component (what is left), giving three elements that add up to the total sum of squares.
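The decomposition described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the `aov()` call in R used for the example; the function name `two_way_shares` and the synthetic 12-by-6 table are assumptions made for demonstration, not the N2568 series.

```python
import math
import random


def two_way_shares(table):
    """Split the total sum of squares of a months-by-years table into
    seasonal (rows), trend (columns) and irregular shares."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]  # seasonal profile
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]  # trend profile

    # Squared deviations of the profile means from the global mean
    ss_seasonal = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_trend = n_rows * sum((m - grand) ** 2 for m in col_means)
    # What is left after removing both profiles
    ss_irregular = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    total = ss_seasonal + ss_trend + ss_irregular
    return {
        "seasonal": ss_seasonal / total,
        "trend": ss_trend / total,
        "irregular": ss_irregular / total,
    }


# Synthetic monthly data: strong seasonality, mild trend, small noise (all made up)
random.seed(42)
table = [
    [100 * math.sin(2 * math.pi * month / 12) + 20 * year + random.gauss(0, 5)
     for year in range(6)]
    for month in range(12)
]
shares = two_way_shares(table)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With one observation per month-year cell, the three sums of squares add up to the total sum of squares exactly, which is why the shares can be read as parts of a whole.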

The next step is straightforward: calculate the share of each component in the total sum of squares. For our example, using aov() in R and then computing the total:

Seasonal:  292,307,558
Trend:     176,308,365
Irregular:  33,618,630

Total:     502,234,552

So, the seasonal contribution is 292,307,558 / 502,234,552 ≈ 58.2%, the trend contribution is 35.1%, and the irregular component is 6.7%.
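The arithmetic can be checked directly from the three sums of squares reported above. This is only a quick sketch; the dictionary name `sums_of_squares` is an illustrative choice. The three components add up to 502,234,553, one more than the reported total, presumably because the underlying values were rounded before printing.

```python
# Sums of squares as reported in the example above
sums_of_squares = {
    "Seasonal": 292_307_558,
    "Trend": 176_308_365,
    "Irregular": 33_618_630,
}

total = sum(sums_of_squares.values())
shares = {name: 100 * value / total for name, value in sums_of_squares.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```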

Why bother? This simple EDA technique tells you roughly what to focus on in forecasting. In this example, capturing seasonality correctly is roughly 60% of the story, with trend being second in importance. Hans goes further in his derivations, see his LinkedIn post. He also analysed M3 results at some point, explaining why some methods performed better (trend dominated the data).

It is worth pointing out that this approach assumes that the seasonal component does not evolve over time, which is reasonable but not always correct. And the model behind this is essentially a regression with dummy variables for year and month. Nonetheless, it is a great starting point for EDA.

P.S. Hans Levenbach passed away on 7 April 2026. I wasn’t sure whether to write about it and what to write about him, but I had several nice discussions with him, and I have admired his approach to forecasting: first explore the data, then build a model. His passing is a loss for the forecasting community.

P.P.S. You can read a bit about him on the IIF website.

CMAF had a webinar with Hans a couple of years ago. We had technical issues, but he managed to explain his idea well.

Message Hans Levenbach’s classification scheme for trend/seasonal components first appeared on Open Forecast.

Teaching Statistics and Descriptive Analytics in the world of AI

Ivan Svetunkov — Wed, 07 Jan 2026 17:32:28 +0000

Teaching statistics as a flipped classroom with the help of AI? You heard that right! That’s exactly what I tried this year – and here are the results.

Attached to this post is the student evaluation score for the module. Yes, the number of responses is quite low (only 50% of the cohort), but it should still give a sense of how students perceived Statistics and Descriptive Analytics. Of course, this reflects only their impression – coursework submissions are yet to come – but it’s still an encouraging sign that some things worked well.

I’ve taught this module since 2018, first with Dave Worthington and later with Alisa Yusupova. Normally, I focused on the second half, covering regression through lectures and workshops. But this year, I took on the full module and realised I didn’t want to teach probability theory and statistics in the traditional way – long monologues in lectures followed by awkward silence in workshops. That format, I believe, no longer works. After all, students can always ask their favourite LLM to explain concepts they don’t understand. And some don’t even do that – they just ask to solve problems without understanding them. So, what can be done in this brave new world?

I don’t yet have a definitive answer – only the results of an experiment.

Lectures. This year, I used Google NotebookLM to prepare lecture materials. I provided my existing notes, slides, and relevant texts, then asked it to produce podcasts on specific topics. This took more time than expected, as I had to review the generated content, adjust prompts, and refine focus areas, listening to the podcasts over and over again. Once ready, I uploaded the materials to Moodle and asked students to listen beforehand. In class, we skipped formal lectures and instead had whiteboard & marker discussions. I asked questions, showed derivations, and encouraged debate. With a class of 26 students, it was possible to create much more interaction than in previous years.

Workshops. We still had problem-solving sessions, but I allowed (actually encouraged) students to use LLMs to solve tasks and explain why the solutions were correct. The aim was to emphasise reasoning and assumptions over simply obtaining the right number. This worked with mixed success, and I still need to think about how it can be improved further.

Did it work overall?

I’m not entirely sure. Not all students engaged with the materials in advance, but those who did seemed to benefit and appreciated the approach. What I do know is that the “two-hour monologue while everyone tries not to fall asleep” format does not work any more. For universities, and for the (very!) expensive UK education, to remain relevant, we must innovate and rethink how we teach.

What would you change if you were teaching a technical subject at university in the era of AI? I’d love to hear your ideas.


On randomness and uncertainty

Ivan Svetunkov — Mon, 28 Apr 2025 11:05:29 +0000

Everything is random! Your data, your model, its parameter estimates, the forecasts it produces, and even the minimum of the loss function you used. There is no such thing as a “deterministic” forecast – everything is stochastic!

Whenever you work with data, you are working with a sample from a population. In some cases, this is more apparent than in others. In my statistics lectures, I typically give the following example. Consider that we are interested in the average height of students at the university. I could ask every student at the lecture to tell me their height, take the average, and get a number. Is this number random? Yes, indeed. Why? Because if a student who was late for the lecture comes in, I would need to recalculate the average, and the number would change. The average that I get depends on who specifically I have in the sample and how many observations I have. It will vary more in smaller samples and become more stable in larger ones. But this example gives you an idea about the inherent uncertainty of any estimates we deal with.
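The point about smaller and larger samples can be illustrated with a short simulation. This is a sketch under made-up assumptions: the "population" of heights (mean 170 cm, standard deviation 10 cm) and the function name `spread_of_sample_means` are hypothetical and not taken from any real data.

```python
import random
import statistics

random.seed(1)
# Hypothetical population of student heights in cm
population = [random.gauss(170, 10) for _ in range(10_000)]


def spread_of_sample_means(sample_size, n_repeats=2000):
    """Draw many samples of a given size and measure how much their means vary."""
    means = [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(n_repeats)
    ]
    return statistics.stdev(means)


small = spread_of_sample_means(10)
large = spread_of_sample_means(200)
print(f"Spread of the mean with 10 students:  {small:.2f} cm")
print(f"Spread of the mean with 200 students: {large:.2f} cm")
```

Every draw gives a different average, and the averages scatter much more when only ten students are in the room.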

In time series, the situation is somewhat similar: you are dealing with a sample of values that you have observed up until a specific moment. If, for example, you want to forecast daily admissions in the emergency department of a hospital and apply a model, its forecast will change when a new day comes and a new cohort of patients arrives. This is because your sample changes, and you receive new information about the demand.

So, the parameter estimates of a model you use will change when you get a new observation (e.g., a new record of product sales). Yes, if you estimate the model properly (e.g., using Least Squares), the parameter estimates won’t change substantially, but they will change nonetheless. And this would affect point forecasts and any other statistics produced by your model. Your standard errors, p-values, conditional means, prediction intervals, error measures, model ranking – everything will change with a new observation. In fact, if you do model selection, the structure of the model might change as well. For example, in the case of ETS, you might switch from a model without a trend to one with a trend. So, every time you estimate anything on a sample of data, you should keep in mind that it is random and will change if your sample changes or gets updated.

Why is that important? Because we need to understand this inherent uncertainty, and ideally, we should somehow take it into account. In forecasting, this means you should not draw conclusions based on one application of a model to a dataset. At the very least, you should perform a rolling origin evaluation. As Leonidas Tsaprounis says, “if you don’t roll the origin, you roll the dice”.
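A rolling origin evaluation can be sketched as follows. This is a bare-bones illustration, assuming a simulated random walk series and the Naïve forecast as the method being evaluated; the names `rolling_origin` and `naive_forecast` are chosen for this example and do not refer to any particular package.

```python
import random
import statistics


def naive_forecast(history, h):
    """Repeat the last observed value for all h steps ahead."""
    return [history[-1]] * h


def rolling_origin(series, forecaster, h=3, n_origins=10):
    """Return the mean absolute error from each forecast origin."""
    errors = []
    first_origin = len(series) - h - n_origins + 1
    for origin in range(first_origin, first_origin + n_origins):
        train = series[:origin]             # data available at this origin
        test = series[origin:origin + h]    # the next h actual values
        forecast = forecaster(train, h)
        errors.append(statistics.fmean(abs(a - f) for a, f in zip(test, forecast)))
    return errors


# Simulated random walk, for illustration only
random.seed(3)
series = [100.0]
for _ in range(59):
    series.append(series[-1] + random.gauss(0, 2))

maes = rolling_origin(series, naive_forecast, h=3, n_origins=10)
print("MAE per origin:", [round(e, 2) for e in maes])
print("Average MAE:   ", round(statistics.fmean(maes), 2))
```

The error changes from origin to origin, which is exactly why a single train/test split tells you little about how a method performs.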

So, embrace the uncertainty and learn how to deal with it.

By the way, Kandrika Pritularga and I are holding a course on Demand Forecasting starting on 6th May. There is still time to sign up for it here.


Why do zeroes happen? A model-based view on demand classification

Ivan Svetunkov — Thu, 20 Mar 2025 15:32:52 +0000

I presented our current work with Anna Sroginis during my visit to IÉSEG School of Management, Lille, France last week. It was great to see my colleague and friend Sarah Van der Auweraer, and I enjoyed the discussion we had with people in her group related to forecasting and intermittent demand. You can see details of the event here and find slides here.


Model vs Method – why should we care?

Ivan Svetunkov — Tue, 04 Feb 2025 12:14:44 +0000

The image above depicts a fashion model making a presentation about a forecasting method. I like the forecast for the final period in that image…

Over the last few years, I’ve seen phrases like “LightGBM model” or “Neural Network model” on LinkedIn many times, and the statistician in me shivers every time. So, I figured it’s time to discuss the difference between a model and a method.

Some of you might remember that I wrote a post on this topic a few years ago. But it seems it is worth revisiting.

John Boylan and I came up with the following definitions in our paper:

A forecasting model is a mathematical representation of a real phenomenon with a complete specification of distribution and parameters;
A forecasting method is a mathematical procedure that generates point and/or interval forecasts, with or without a forecasting model.

If these sound too technical, here’s a simpler explanation:

A forecasting method is a way of generating forecasts;
A forecasting model is a way to describe the assumed structure of a real phenomenon.

The key difference? A method focuses on producing something specific (e.g., point forecasts) with minimal assumptions, while a model relies on assumptions but can do much more:

Rigorous estimation. Models can be constructed in ways that ensure their estimates of parameters are efficient and consistent.
Model selection using information criteria. A powerful approach that saves computational time and typically produces reasonable forecasts.
Predictive distribution. Models can generate moments (mean, variance, skewness) and quantiles, capturing uncertainty around future values.
Confidence intervals for parameters. While not crucial for forecasting, this is useful in other areas to quantify uncertainty.
Extendibility. Additional variables and components can be easily incorporated in a model.

All of this comes at the price of making assumptions about reality. If the assumptions don’t hold, the model won’t perform well. It might still be useful, but the risk of error increases. For example, you can apply a Random Walk model to purely random data, but you shouldn’t expect it to work well.

Examples

A forecasting method: Naïve, defined by the simple equation:
\( F_t = A_{t-1} \)
This method is easy to explain, hard to break, and provides point forecasts, but nothing more.
A forecasting model: Random Walk, which underlies the Naïve method:
\( A_t = A_{t-1} + \epsilon_t \)
where \( \epsilon_t \) follows some distribution with zero mean and fixed variance. The Random Walk model has all the properties described above.
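The difference between the two can be shown in code. This is a sketch, not a reference implementation: the function names are illustrative, and the Random Walk part additionally assumes that \( \epsilon_t \) is normally distributed, which is only one possible choice of "some distribution". Under the Random Walk, the variance of the h-steps-ahead error is h times the one-step variance.

```python
import math
from statistics import NormalDist, fmean


def naive_method(history, h):
    """Naive method: point forecasts only, F_t = A_{t-1}."""
    return [history[-1]] * h


def random_walk_model(history, h, level=0.95):
    """Random Walk model: same point forecasts plus prediction intervals.

    Assumes zero-mean, normally distributed errors with fixed variance.
    """
    diffs = [b - a for a, b in zip(history, history[1:])]
    sigma = math.sqrt(fmean(d * d for d in diffs))  # one-step error scale
    z = NormalDist().inv_cdf(0.5 + level / 2)
    point = history[-1]
    forecasts = []
    for step in range(1, h + 1):
        half_width = z * sigma * math.sqrt(step)  # variance grows with horizon
        forecasts.append((point - half_width, point, point + half_width))
    return forecasts


history = [10, 12, 11, 13, 12]
print(naive_method(history, 3))
for lower, point, upper in random_walk_model(history, 3):
    print(f"{lower:.2f}  {point}  {upper:.2f}")
```

The point forecasts coincide, but only the model says how uncertain they are and how that uncertainty grows with the horizon.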

In some cases, you can derive models underlying the methods. In my opinion, this typically enhances the latter, making them more powerful due to the reasons explained above. What is interesting about this general connection is that if we can identify a model underlying a method, we can do much more with it.

For example, when estimating a quantile regression, we typically minimize a pinball loss function, which gives us a method for generating quantiles. However, if we estimate the same linear regression model using likelihood, assuming that the error term follows the Asymmetric Laplace distribution, we arrive at exactly the same parameter estimates as in quantile regression. But now, we also gain additional benefits, such as model selection, predictive distribution, and confidence intervals for parameters – features outlined in the previous post. In a way, these benefits come “for free”, although at the cost of making explicit assumptions about the model. That said, I’d argue that assumptions exist in quantile regression anyway – they’re just not stated explicitly.
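The equivalence is easy to check numerically in the simplest possible case. The sketch below uses a regression with an intercept only (so the "model" is just a constant), simulated data, and a fixed scale parameter; all of these are simplifying assumptions for illustration. The negative log-likelihood of the Asymmetric Laplace distribution is the pinball loss divided by the scale plus terms that do not depend on the location, so both criteria are minimised by the same value.

```python
import math
import random


def pinball_loss(y, mu, tau):
    """Sum of pinball (quantile) losses for location mu and quantile level tau."""
    return sum(tau * (v - mu) if v >= mu else (1 - tau) * (mu - v) for v in y)


def ald_neg_loglik(y, mu, tau, scale=1.0):
    """Negative log-likelihood of the Asymmetric Laplace distribution."""
    n = len(y)
    return (n * math.log(scale)
            - n * math.log(tau * (1 - tau))
            + pinball_loss(y, mu, tau) / scale)


random.seed(7)
y = [random.gauss(100, 15) for _ in range(201)]  # simulated data
tau = 0.9

# The pinball loss is piecewise linear, so a minimum is attained at a data point
candidates = sorted(y)
best_pinball = min(candidates, key=lambda m: pinball_loss(y, m, tau))
best_ald = min(candidates, key=lambda m: ald_neg_loglik(y, m, tau))

print("Pinball loss minimiser:", round(best_pinball, 3))
print("ALD likelihood maximiser:", round(best_ald, 3))
```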

And here we finally come to the ML approaches. According to the definitions we discussed earlier, Decision Trees, k-Nearest Neighbors, Artificial Neural Networks (ANNs) and other ML approaches are not forecasting models. They do not attempt to capture the underlying structure of the data. Instead, they focus on identifying nonlinear patterns via engineered features to produce point forecasts. In other words, they are methods, not models.

This doesn’t make them inferior. Their strength lies in their flexibility, precisely because they don’t impose strong assumptions. However, treating them as forecasting models can lead to potential issues.

For example, plugging LightGBM’s point forecasts into a probability distribution doesn’t magically turn it into a model. It simply makes it a method that now generates quantiles, but without a solid theoretical foundation for why a specific distribution is chosen or used in a particular way.

Another example is model selection using information criteria, which is meaningless for ML approaches. Why? Because information criteria rely on the assumption that the model is estimated in a specific way (e.g., via maximum likelihood estimation), ensuring parameter consistency and model identifiability. However, some ML methods, such as ANNs, are fundamentally unidentifiable, as different architectures can produce the same output. So, the information criteria become meaningless in this setting.

So next time you see the term model, take a moment to consider whether it’s used correctly and whether it actually means what the author thinks.


There is no such thing as an “assumption-free approach”

Ivan Svetunkov — Tue, 07 Jan 2025 10:50:55 +0000

One thing that bothers me when I read posts on social media or papers in peer-reviewed journals is the claim that a proposed approach is “assumption-free.” In forecasting, this is never true. Such an approach is like a spherical unicorn in a vacuum (see image above). Here’s why.

Every model is a simplification of reality, meaning that it captures only a part of it. Simplifying implies that certain aspects of reality are irrelevant and can be ignored. For example, in forecasting, regardless of the approach used, we typically assume that the model captures the structure correctly, i.e. neither omitting important elements nor overfitting the data. Different approaches address this differently: statistical models do that explicitly, while a good ML approach seeks a balance between underfitting and overfitting, often in a non-linear way. When the structure is captured correctly, the forecast reflects the essential part of reality while ignoring small random fluctuations (see a post on structure vs. noise).

Depending on the assumptions we make, we can classify approaches as parametric, semiparametric, or nonparametric.

Parametric approaches assume that the model is correctly specified, its parameters are accurately estimated, and the chosen distribution is appropriate (often the normal one, though others can be used). In this case, we fully rely on the model. A classical example is the construction of conventional prediction intervals: the conditional expectation and variance are calculated and plugged into the normal distribution to derive the necessary quantiles for a specified confidence level. Specifically, in this case we assume that the model is correct, errors are uncorrelated and homoscedastic, and that they follow a normal distribution.
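The conventional parametric interval can be written in a few lines. This is a minimal sketch: the function name `parametric_interval` and the numbers plugged into it (conditional mean 100, conditional variance 25) are made up for illustration.

```python
from statistics import NormalDist


def parametric_interval(cond_mean, cond_variance, level=0.95):
    """Prediction interval from the normal distribution, given the
    conditional expectation and variance produced by a model."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * cond_variance ** 0.5
    return cond_mean - half_width, cond_mean + half_width


lower, upper = parametric_interval(100.0, 25.0)
print(f"95% interval: ({lower:.2f}, {upper:.2f})")
```

Everything here rests on the model: if the conditional variance is wrong or the errors are not normal, the interval is wrong as well.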

Semiparametric approaches relax some of these assumptions. For example, we might calculate statistics in a more robust manner or drop the assumption of a specific distribution. For instance, instead of relying on textbook formulae, we could use in-sample multistep forecast errors to calculate conditional variances. This eliminates the need to assume uncorrelated and homoscedastic errors and allows for some flexibility in the model structure. However, in this example, we still rely on normality.
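A semiparametric interval of this kind might look as follows. This is a rough sketch under stated assumptions: the Naïve forecast stands in for whatever model is actually used, the toy series is invented, and the function names are hypothetical. The variance for each horizon is estimated directly from the in-sample h-steps-ahead errors, while the quantile still comes from the normal distribution.

```python
from statistics import NormalDist, fmean


def multistep_errors(series, h):
    """In-sample h-steps-ahead errors of the Naive forecast (illustrative choice)."""
    return [series[t + h] - series[t] for t in range(len(series) - h)]


def semiparametric_interval(series, h, level=0.95):
    """Interval using the empirical h-step error variance and a normal quantile."""
    errors = multistep_errors(series, h)
    variance = fmean(e * e for e in errors)  # zero-mean errors assumed
    z = NormalDist().inv_cdf(0.5 + level / 2)
    point = series[-1]
    half_width = z * variance ** 0.5
    return point - half_width, point + half_width


series = [10, 12, 11, 13, 12, 14, 13, 15]  # toy data
for h in (1, 2):
    lower, upper = semiparametric_interval(series, h)
    print(f"h={h}: ({lower:.2f}, {upper:.2f})")
```

No formula for the multistep variance is needed here, so the interval reflects whatever autocorrelation is left in the errors.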

Nonparametric approaches avoid most of the above assumptions but come with their own hidden ones. For instance, the method proposed by Taylor & Bunn (1999) for constructing prediction intervals fits quantile regressions to in-sample multistep forecast errors. This method does not assume a correct model, well-behaved residuals, or normality. However, it does assume the appropriateness of the chosen quantile regression function (Spoiler: they used polynomial regression, but my experiments suggest that a power function is a more robust alternative).

You might think that nonparametric approaches, with fewer assumptions, should always be preferred. But that’s not necessarily the case. It is “horses for courses”: you should select the approach that best fits your specific situation. For example, when working with small samples, introducing some assumptions might be necessary to get meaningful estimates. A nonparametric approach, while powerful, might require more data than you have available.

Finally, there is no such thing as a “best” method for every situation. As is often the case in forecasting, you need to try different approaches and choose the one that works best. Even then, remember that forecasting always rests on a fundamental assumption: the future will resemble the past. And no fancy method can guarantee that this assumption will hold.


Structure vs. Noise: A Fundamental Concept in Forecasting

Ivan Svetunkov — Tue, 13 Aug 2024 13:06:01 +0000

One of the core ideas in statistics, which extends to many other fields including forecasting, is the concept of structure versus noise. You’ve probably heard of it, but it’s often overlooked by those without a strong quantitative background. So, let’s discuss.

The core of the idea is that any data consists of two fundamental parts:

Structure, which can take various forms, and might include trend, seasonality, calendar effects, and the influence of external factors on demand (e.g., price changes, promotions, etc.).
Noise, which is inherently unpredictable.

Structure can be captured using models or methods, and this is what produces the fitted values or point forecasts. Noise, on the other hand, is unpredictable – like not knowing exactly who will visit a store and when they’ll make a purchase.

For example, consider a local Lancaster pub that has a nice selection of beers. Their sales likely follow a pattern, such as higher sales on weekends or during special events like football matches. These patterns are the structure we can capture and forecast. However, the pub can’t anticipate when my friend Yves will visit me, and when we’ll go out for drinks. This element of uncertainty forms the noise – while it’s explainable from my perspective, it’s a mystery to the pub owner.

But as I said, the idea of structure vs. noise isn’t just relevant in demand forecasting; it applies in many other areas too. Take classification, for instance. When identifying mushrooms, you might not be able to tell for sure whether you’re looking at a Rosy Brittlegill or The Sickener without a microscope. While certain characteristics (like stem shape or cap colour) make up the structure, there’s always some randomness that can make one mushroom look like another. So, in classification, you can only say that it’s more likely that we have one type of mushroom rather than the other, and you need to consider the uncertainty around this choice (the modern approach to this is to use conformal prediction).

Furthermore, we as humans are very good at finding patterns in the noise. If you look at clouds and see a mushroom, it’s not a real mushroom, just a random arrangement of vapour. So when you work with the data, remember this feature and don’t fall into the trap of finding patterns that don’t actually exist. Be critical and avoid overfitting the noise.

As you can see, the concept of structure versus noise is fundamental and shows up in many contexts. In forecasting, our job is to capture the structure and filter out the noise, so that we can then produce point forecasts (future structure) and prediction intervals (representing the size of uncertainty) and make adequate decisions.


Complex-Valued Econometrics with Examples in R

Ivan Svetunkov — Sun, 04 Aug 2024 14:33:40 +0000

Back in 2022, my father asked me to help him in amending and editing a monograph he wrote on the topic of “Complex-Valued Econometrics”. The original book focused on dynamic models, but after looking through the material and a thorough discussion, we decided to write something more fundamental. The monograph is based on the research he has done over the years, working in Saint Petersburg. I developed an R package called “complex” to support the book and then expanded the text with some derivations and examples of application. The result was then submitted to Springer and is now finally published in their “Contributions to Economics” series. Unfortunately, due to the agreement with the publisher, we cannot make the book freely available, but some of the related materials can be found in a GitHub repo, here.

We will receive royalties from selling this book, and we have decided to direct them to a charity to help Ukrainians (this one).

And here is what the cover of the book looks like:

Complex-Valued Econometrics with Examples in R

Svetunkov S., Svetunkov I. (2024). Complex-Valued Econometrics with Examples in R: Modelling, Regression and Applications. Springer, Cham. 154 pages. DOI: 10.1007/978-3-031-62608-1


ISF2024: How to Bootstrap Time Series without Attracting Attention of Statisticians

Ivan Svetunkov — Wed, 03 Jul 2024 14:11:41 +0000

On 1st July, I presented my ongoing work on time series bootstrap and its impact on prediction intervals at ISF2024 in Dijon, France.

Abstract: Bootstrap is extensively used in statistics and machine learning for cross-sectional data to account for uncertainty about the data, model form, and parameter estimates. However, conventional methods may not be suitable for time series data due to autocorrelation and specific dynamic structures. Over the years, various approaches have been developed to address this issue. Some assume specific models (e.g., STL), while others are non-parametric (e.g., Maximum Entropy Bootstrap, MEB). However, the former can be overly restrictive, while the latter may not perform well in case of outliers and external drivers. To address these issues, we propose a non-parametric bootstrap approach inspired by MEB, which does not assume any structure in the data yet creates reasonable copies of existing time series of different nature. These copies can be utilised in bagged ETS/ARIMA or any other approach involving small sample uncertainty. We demonstrate how the proposed bootstrap works using real time series examples and assess improvements it brings in terms of forecasting accuracy compared to conventional approaches.

Here are the slides of the presentation.

And here is me, trying not to attract attention of statisticians:

How to attract attention of statisticians…


Statistical tests flowchart

Ivan Svetunkov — Tue, 30 Nov 2021 15:29:12 +0000

At Lancaster University, I teach the module called “Statistics and Descriptive Analytics”, which is compulsory for master’s students of the programme “Business Analytics”. This year, the module has been delivered by Alisa Yusupova and me, and I have prepared a flowchart that should (hopefully) help students decide which of the statistical tests to use in different situations. This is in no way a full diagram, it mainly focuses on tests for the mean / median and variance, but it should be useful for the students and those who use statistical tests in their work. Here it is:

Start your journey from the green rhombus

And here is the same thing in PDF
