Archives Theory of forecasting - Open Forecasting

Hans Levenbach’s classification scheme for trend/seasonal components

Ivan Svetunkov — Mon, 18 May 2026 08:01:28 +0000

Here is a curious idea: if we can somehow estimate the importance of trend/seasonal components for your data, you can use this in model building and forecasting. But how can we do this first step? Hans Levenbach has an answer with his simple EDA technique. Let me explain.

The core idea is simple and neat. For this example, I’ll use monthly data, like the time series in this image:

Series N2568 from the M3 dataset

You can see that the data has strong seasonality, and we can qualitatively say that capturing that seasonal component correctly will probably solve the main problem in capturing the structure. But how can we quantify this?

All you need to do is put the data in a “wide” format, with months in rows and years in columns. Then, as Hans proposed, run a two-way ANOVA with “month” and “year” to capture variability due to year (trend) and due to month (seasonality). Roughly, we take row/column means to get mean seasonal profiles and mean annual changes (trend), as in the following two images:

Seasonal profile of the data

Trend profile

The former has no trend, the latter has no seasonality, so they can be analysed separately. Then we calculate the sums of squares of these means from the global mean to estimate variation due to months (seasonality) and years (trend). We can also calculate the sum of squares of the irregular component (what is left), giving three elements that add up to the total sum of squares.

Next step is trivial and straightforward: calculate the shares of each component in the total sum of squares. For our example, using aov() in R and then computing the total:

Seasonal:  292,307,558
Trend:     176,308,365
Irregular:  33,618,630

Total:     502,234,552

So, the seasonal contribution is 292,307,558 / 502,234,552 ≈ 58.2%, the trend contribution is 35.1%, and the irregular component is 6.69%.

Why bother? This simple EDA technique tells you roughly what to focus in forecasting. In this example, capturing seasonality correctly is roughly 60% of the story, with trend being second in importance. Hans goes further in his derivations, see his LinkedIn post. He also analysed M3 results at some point, explaining why some methods performed better (trend dominated the data).

It is worth pointing out that this approach assumes that the seasonal component does not evolve over time, which is reasonable but not always correct. And the model behind this is essentially a regression with dummy variables for year and month. Nonetheless, it is a great starting point for EDA.

P.S. Hans Levenbach passed away on 7 April 2026. I wasn’t sure whether to write about it and what to write about him, but I had several nice discussions with him, and I have admired his approach to forecasting: first explore the data, then build a model. His passing is a loss for the forecasting community.

P.P.S. You can read a bit about him on the IIF website.

CMAF had a webinar with Hans a couple of years ago. We had technical issues, but he managed to explain his idea well.

Message Hans Levenbach’s classification scheme for trend/seasonal components first appeared on Open Forecasting.

There’s no such thing as “deterministic forecast”

Ivan Svetunkov — Mon, 02 Mar 2026 22:45:31 +0000

Sometimes I see people referring to a “deterministic” forecast, and I have some personal issues with this. Because if you apply a model to data then there is nothing deterministic about your forecasts!

In many contexts, “deterministic” has a precise meaning: no randomness, no uncertainty. A deterministic solution to an optimisation problem (e.g. linear programming) implies that there are no random inputs or outputs once the model and its parameters are fixed. Forecasting is different. As Chatfield and many others have pointed out, forecasting has multiple sources of uncertainty, and there is essentially zero chance that the future will unfold exactly as any single number suggests.

Yes, some people use “deterministic” as a synonym for “point forecast”. But that label is still misleading, because a point forecast is not uncertainty-free – it is just one summary of a predictive distribution (often the conditional mean, sometimes the median or another functional).

Here’s a quick reality check you can do yourself. Take a dataset, apply your model, and write down the point forecast for the next few observations. Now add one new observation, re-estimate, and forecast again (the image in this post depicts exactly that, but with 50 forecasts produced on different subsamples of data). The point forecast will change unless you are dealing with an exotic situation with non-random data (e.g. every day, you sell exactly 100 units). So, which of the two was the “deterministic” forecast? If forecasts were truly deterministic in the strict sense, you would not get multiple plausible values from small, reasonable changes in the sample.

This happens because any forecasting method (statistical or ML) depends on data and on modelling choices: parameter estimation, feature selection, splitting rules, tuning, even decisions like “use α=0.1”. Those choices can be fixed across samples of data, but fixing them does not remove uncertainty – it only hides it. The randomness is still there in the data and in the fact that we only observe a sample of it.

So when you see someone mentioning “deterministic forecast”, it’s worth translating it mentally to: “a point forecast, probably a conditional mean”. If you care about decisions and risk, you should know that there is an uncertainty associated with this so called “deterministic forecast”, and that it should not be ignored. But this is a topic for another discussion in another post.

Message There’s no such thing as “deterministic forecast” first appeared on Open Forecasting.

Scaling of error measures

Ivan Svetunkov — Mon, 23 Feb 2026 13:36:12 +0000

Apparently, we need to talk about scaling of error measures because this is not as obvious as it seems.

In forecasting literature, since early days of the area, there has been a general consensus that the forecast errors from the individual time series should not be analysed and aggregated as is. This is because you can have very different time series capturing dynamics of very different processes.

Indeed, if you forecast sales of apples in kilograms, your actual value would be apples in kilograms, and your point forecast would also be in the same units. Subtracting one from another tells us how many kilograms of apples we missed with the forecast we produced. But if we then take the average between forecast errors for apples and beer, we would be aggregating things in different units, which contradicts some basic aggregating principles.

Furthermore, if the company sells thousands of kilograms of apples and jet engines, aggregating forecast errors on those (e.g. 3000 vs 3) might introduce all types of issues, because the models performance on apples might mask the performance of the model on jet engines. Still, the jet engines are much more expensive than apples and getting them accurately might be more important for the company than forecasting apples.

So, forecasting literature has agreed that the forecast errors need to be somehow scaled to make the errors unitless and not to distort performance of models on time series with different volumes. There are several ways of doing that, including the poor ones and reasonable ones. The state of the art at the moment is to divide error measures by some in-sample statistics to avoid potential holdout-sample distortion. Using mean absolute differences (MAD) for this (thus ending up with MASE or RMSSE) is considered as a standard. A couple of years ago, I have written a post about advantages and disadvantages of several scaling methods.

But there is one method that I haven’t looked at and which is not very well discussed in the forecasting literature. It relies on the monetary value of forecasts. We could multiply each individual forecast error “e” by the price of the product “p” (thus moving to the missed income per product) and then divide everything by the overall income (price times quantity) from different products. This can be written as:

\begin{equation}
\text{monetary Mean Error} = \frac{\sum_{j=1}^n (p_j \times e_j)} {\sum_{j=1}^n (p_j \times q_j)}
\end{equation}

(the above formula can be modified to have squares or absolute values of the error). This way we switch from the original units to the monetary values and each error would tell you the percentage of the missed income in the overall one. This is a useful measure because it connects models performance with some managerial decisions and it takes the value of product into account (thus we do not mask the expensive jet engines with cheap apples).

However, it might have a potential issue similar to what the MAE/Mean or wMAPE has: if the sales of the product are not stationary, the denominator would change, thus driving the proportion either up or down, irrespective of how good the forecast is. I am not sure whether this needs to be addressed, because there is an argument that if the income from a product has increased and the error hasn’t changed, then this means that the proportion of the missed income decreased, which makes sense. But if we need to address this, we can switch to the MAD multiplied by price in the denominator to address this issue. In fact, this was sort of done in M5 competition that used a weighted RMSSE, relying on the income from each product over the last 4 weeks of data.

But here is one more interesting thing about this error measure. If we assume that prices for all products are exactly the same, they will disappear from the numerator and the denominator, leaving us with just sum of errors divided by the overall sales of all products. This still maintains the original idea of the proportion of the missed income, but now has a very strong assumption, which is probably not correct in the real life (apples and engines for the same price?). Furthermore, this would mask the performance of the model for the expensive products again. I personally don’t like this measure and find the assumption unrealistic and potentially misleading. Having said that, I can see some cases where this could still be acceptable and useful (e.g. similar products with similar dynamics and similar prices).

Summarising:

If you are conducting a forecasting experiment without a specific context, I’d recommend using RMSSE or some other similar measure with scaling.
If you have prices of products, income-based scaling might be more informative.
Setting all prices to the same value does not sound appealing to me, but I understand that there is a context where this might work.

Message Scaling of error measures first appeared on Open Forecasting.

Risky business: how to select your model based on risk preferences

Ivan Svetunkov — Mon, 19 Jan 2026 11:28:04 +0000

What do you use for model selection? Do you select the best model based on its cross-validated performance, or do you use in-sample measures like AIC? If so, there is a way to improve your selection process further.

JORS recently published the paper of Nikos Kourentzes and I based on a simple but powerful idea: instead of using summary statistics (like the mean RMSE of cross-validated errors), you should consider the entire distribution and choose a specific quantile. This aligns with my previous post on error measures, but here is the core intuition:

The distribution of error measures is almost always asymmetric. If you only look at the average, you end up with a “mean temperature in the hospital” statistic, which doesn’t reflect how models actually behave. Some models perform great on most series but fail miserably on a few.

What can we do in this case? We can look at quantiles of distribution.

For example, if we use 84th quantile, we compare the models based on their “bad” performance, situations where they fail and produce less accurate forecasts. If you choose the best performing model there, you will end up with something that does not fail as much. So your preferences for the model become risk-averse in this situation.

If you focus on the lower quantile (e.g. 16th), you are looking at models that do well on the well-behaved series and ignore how they do on the difficult ones. So, your model selection preferences can be described as risk-tolerant, because you are accept that the best performing model might fail on a difficult time series.

Furthermore, the median (50th quantile, the middle of sample), corresponds to the risk-neutral situation, because it ignores the tails of the distribution.

What about the mean? This is a risk-agnostic strategy, because it says nothing about the performance on the difficult or easy time series – it takes everything and nothing in it at the same time, hiding the true risk profile.

So what?

In the paper, we show that using a risk-averse strategy tends to improve overall forecasting accuracy in day-to-day situations. Conversely, a risk-tolerant strategy can be beneficial when disruptions are anticipated, as standard models are likely to fail anyway.

So, next time you select a model, think about the measure you are using. If it’s just the mean RMSE, keep in mind that you might be ignoring the inherent risks of that selection.

P.S. While the discussion above applies to the distribution of error measures, our paper specifically focused on point AIC (in-sample performance). But it is a distance measure as well, so the logic explained above holds.

P.P.S. Nikos wrote a post about this paper here.

P.P.P.S. And here is the link to the paper.

Message Risky business: how to select your model based on risk preferences first appeared on Open Forecasting.

Several crucial steps in demand forecasting with ML

Ivan Svetunkov — Wed, 08 Oct 2025 10:32:41 +0000

When I read posts written by some ML experts, I sometimes notice that they either overlook or do not clearly explain a few crucial steps in demand forecasting. In this post, I want to highlight the three most important ones based on my personal experience.

First and foremost, stationarity (see the proper definition of the term here). Time series often exhibit either a changing level or clear trends. While some models (such as ETS) take care of that directly via specific components, others need some preliminary steps before being applied. The simplest thing one can do is to take differences of the original data if it shows any form of non-stationarity and model the rates of change instead of just demand. For ML, this is extremely important because typical approaches (such as decision trees, k-NN, neural networks) cannot extrapolate. So, getting rid of the trend and/or ensuring that the level does not change over time will help ML approaches do the job they are supposed to do. And don’t use the global trend (see image in the post), as time series rarely exhibit a constant increase or decrease. In real life, the trend is usually stochastic, implying that the average sales change at a varying rate, not a fixed one.

Second, real time series often exhibit heteroscedasticity (i.e. the variance of the data increases with the level). The simplest way to stabilise the variance is to take logarithms of the data. This is not a universal solution, but it works in many cases. This way, the error in the model should have a constant variance. A more advanced approach is to use the Box-Cox transformation, but this requires estimating the parameter lambda, which is not always straightforward. The main issue with this arises when working with intermittent demand, where some unpredictable zeroes occur (the logarithm of zero equals -infinity). In that case, you might want to take logarithms of demand sizes (non-zero values) instead of the demand itself and switch to a mixture model. Another simple but inelegant trick is to add one to every observation and then take logarithms. This works but breaks my heart.

Third, seasonality is extremely important in time series. There are many ways one can capture it. The simplest is by introducing dummy variables, but this might cause issues because, in reality, seasonality often changes over time (see this recent post). So, a better way of capturing it is either by extracting the seasonal component from STL or ETS and using it as a feature or by using the lagged values of your data. Depending on the specific situation and approach, there can be many other ways of capturing seasonality, and frankly, I struggle to come up with a universal one.

Bonus: if you work with intermittent demand, splitting it into demand sizes and demand occurrence might boost the accuracy of your approach if the underlying levels are captured correctly. Anna Sroginis and I show this improvement for LightGBM, regression, and simple local level approaches in our paper (currently under revision).

P.S. Kandrika and I will deliver a course on “Demand Forecasting Principles with examples in R” in November, where we will discuss these and other important aspects of demand forecasting. You can still sign up for it here

Message Several crucial steps in demand forecasting with ML first appeared on Open Forecasting.

Evolving seasonality

Ivan Svetunkov — Mon, 22 Sep 2025 19:21:09 +0000

Here is another fascinating aspect of the seasonal profile in your data: it can evolve over time due to changing consumer preferences. How so? Let me explain.

I’ve worked with a couple of companies where there were some examples of data with drastically changing seasonal patterns over just a few years. For example, Before Covid (BC), the consumer behaviour could be different than After the Disruption (AD): people started ordering online some products that they used to buy in shops. Some practitioners told me that the seasonal patterns in their data had changed so dramatically that historical data BC had become practically useless.

Why is this important?

If you don’t recognise that the seasonal patterns can change and simply use seasonal dummy variables (for example, in regression or decision trees), you’ll run into problems, as these approaches won’t capture the evolving profile. The same applies to classical decomposition (see Section 3.2 of ADAM), since it assumes a fixed seasonal structure. In fact, any model that assumes fixed seasonality would fail in this situation, and you may not even notice (see image in the post, where the model fails to capture the profile correctly because it assumes that it is fixed).

What can we do with that?

The solution is to use approaches that allow seasonality to evolve over time. ARIMA and ETS handle this via their parameters, while STL decomposition produces a dynamic seasonal profile. In regression or decision trees, you could incorporate lagged sales to partially account for this, or bring in the seasonal component from STL/ETS as an additional feature.

All good? Not quite, because there is a nasty small grey elephant hidden in this room.

Even if your chosen method allows seasonality to evolve, you must ensure that forecasting uncertainty reflects this properly. If the seasonal pattern in your training set changed drastically, what prevents it from shifting again in the test set? This is particularly critical if you need predictive distributions, such as for setting safety stock levels or generating prediction intervals. Ignoring the fact that the seasonality might change further could make your predictive distribution narrower than expected, leading to potential lost sales.

The good news: ARIMA and ETS handle this naturally, as components’ uncertainty translates directly to the holdout variance. In ML, it is more complicated, because you would need to invest time in proper feature engineering to explicitly capture the potential seasonality changes. Unfortunately, I haven’t done much in the latter direction, so I cannot give you a good recipe. Any thoughts what to do here?

And what do you do in the situation like that?

Message Evolving seasonality first appeared on Open Forecasting.

Review of a paper on comparison of modern machine learning techniques in retail

Ivan Svetunkov — Sun, 22 Jun 2025 21:59:19 +0000

A couple of days ago, I noticed a link to the following paper in a post by Jack Rodenberg: https://arxiv.org/abs/2506.05941v1. The topic seemed interesting and relevant to my work, so I read it, only to find that the paper contains several serious flaws that compromise its findings. Let me explain.

Introduction

But first, why am I writing this post?

There’s growing interest in forecasting among data scientists, data engineers, ML experts etc. Many of them assume that they can apply their existing knowledge directly to this new area without reading domain-specific literature. As a result, we get a lot of “hit-or-miss” work: sometimes having promising ideas, but executed in ways that violate basic forecasting principles. The main problem with that is that if your experiment is not done correctly, your results might be compromised, i.e. your claims might be simply wrong.

If you’re a researcher writing forecasting-related papers, then hopefully reading this post (and the posts and papers I refer to), will help you improve your papers. This might lead to a smoother peer-review process. Also, while I can’t speak for other reviewers, if I come across a paper with similar issues, I typically give it a hard time.

I should also say that I am not a reviewer of this paper (I would not publish a review), but I merely decided to demonstrate what issues I can see when I read papers like that. The authors are just unlucky that I picked their paper…

Let’s start.

The authors apply several ML methods to retail data, compare their forecasting accuracy, and conclude that XGBoost and LightGBM outperform N-BEATS, NHITS, and Temporal Fusion Transformer. While the finding isn’t groundbreaking, additional evidence on a new dataset is always welcome.

Major issues

So, what’s wrong? Here is a list of the major comments:

Forecast horizon vs. data frequency:

Daily data with a 365-day forecast horizon makes no practical sense (page 2, paragraph 3). I haven’t seen any company making daily-level decisions a year in advance. Stock decisions are typically made on much shorter horizons, and if you need a year ahead forecast, you definitely do not need it on the daily level. After all, there is no point in knowing that on 22nd December 2025 you will have the expected demand of 35.457 units – it is too far into the future to make any difference. Some references:

Athanasopoulos and Kourentzes (2023) paper discusses data frequency and some decisions related to them;
and there is a post on my website on a related topic

Misuse of SBC classification:

Claiming that 70% of products are “intermittent” (page 2, last paragraph) based on SBC is incorrect. Furthermore, SBC classification does not make sense in this setting, and is not used in the paper anyway, so the authors should just drop it.

Read more about it here.
And there is a post of Stephan Kolassa on exactly this point

Product elimination and introduction is unclear (page 3):

The authors say “Around 30% of products were eliminated during training and 10% are newly introduced in validation”. It’s not clear why this was done and how specifically. This needs to be explained in more detail.

“Missing values” undefined:

It is not clear what the authors mean by “missing values” (page 3, “Handling Missing Values”). How do they appear and why? Are they the same as stockouts, or were there some other issues in the data? This needs to be explained in more detail.

Figure 1 is vague:

Figure 1 is supposed to explain how the missing values were treated. But the whole imputation process is questionable, because it is not clear how well it worked in comparison with alternatives and how reasonable it is to have an imputed series that look more erratic than the original one. The discussion of that needs to be expanded with some insights from the business problem.

No stockout handling discussion:

The authors do not discuss whether the data has stockouts or not. This becomes especially important in retail, because if the stockouts are not treated correctly, you would end up forecasting sales instead of demand

For example, see this post.

Feature engineering is opaque:

“Lag and rolling-window statistics for sales and promotional indicators were created” (page 3, “Feature Engineering”) – it is not clear, what specific lags, what length of rolling windows, and what statistics (anything besides mean?) were created. These need to be explained for transparency and so that a reader could better understand what specifically was done. Without this explanation, it is not clear whether the features are sensible at all.

Training/validation setup not explained:

It is not clear how specifically the split into training and validation sets was done (page 3, last paragraph), and whether the authors used rolling origin (aka time series cross-validation). If they did random splits, that could cause some issues, because the first law of time series is not to break its structure!

Variables transformation is unclear:

It is not clear whether any transformations of the response variable were done. For example, if the data is not stationary, taking differences might be necessary to capture the trend and to do extrapolation correctly. Normalisation of variables is also important for neural networks, unless this is built-in in the functions the authors used. This is not discussed in the paper.

Forecast strategy not explained:

It is not clear whether the direct or recursive strategy was used for forecasting. If lags were not used in the model, that would not matter, but they are, so this becomes a potential issue. Also, if the authors used the lag of the actual value on observation 235 steps ahead to produce forecast for 236 steps ahead, then this is another fundamental issue, because that implies that the forecast horizon is just 1 step ahead, and not 365, as the authors claim. This needs to be explained in more detail.

I’ve written a post about the strategies.

No statistical benchmarks:

At the very least, the authors should use simple moving average and probably exponential smoothing. Even if they do not perform well, this gives an additional information about the performance of the other approaches. Without them, the claims about good performance of the used ML approaches are not supported by evidence. The authors claim that they used mean as a benchmark, but its performance is not discussed in the paper.

Issues with forecast evaluation:

The whole Table 3 with error measures is an example of what not to do. Here are some of major issues:

There is no point in reporting several error measures – each one of them is minimised by their own statistics. The error measure should align with what approaches produce.
MSE, RMSE, MAE and ME should be dropped, because they are not scaled, so the authors are adding up error measures for bricks and nails. The result is meaningless.
MASE is not needed – it is minimised by median, which could be a serious issue on intermittent demand see this post. wMAPE has similar issues because it is also based on MAE.
If the point forecasts are produced in terms of medians (like in case of NBEATS), then RMSSE should be dropped, and MASE should be used instead.
But also, comparing means with medians is not a good idea. If you assume a symmetric distribution, the two should coincide, but in general this might not hold.
R2 is not a good measure of forecast accuracy. It makes some sense in regression context for linear models, but in this one, it is pointless, and only shows that the authors don’t fully understand what they are doing. Plus, it’s not clear how specifically it was calculated.
I don’t fully understand “demand error”, “demand bias” and other measures, and the authors do not explain them in necessary detail. This needs to be added to the paper.
The split into “Individual Groups” and “Whole Category” is not well explained either: it is not clear what this means, why, and how this was done.
And in general, I don’t understand what the authors want to do with Cases A – D in Table 3. It is not clear why they are needed, and what they want to show with them. This is not explained in the paper.

I have a series of posts on forecast evaluation here.

Invalid analysis of bias measures:

Analysis of bias measures is meaningless because they were not scaled.

Disturbing bias of NBEATS in Figure 2:

The bias shown in Figure 2 is disturbing and should be dealt with prior to evaluation. It could have appeared due to the loss function used for training or because the data was not pre-processed correctly. Leaving it as is and blaming NBEATS for this does not sound reasonable to me.

No inventory implications:

The authors mention inventory management, but stop on forecasting, not showing how the specific forecasts translate to inventory decisions. If this paper was to be submitted to any operations-related journal, the inventory implications would need to be added in the discussion.

Underexplained performance gaps:

The paper also does not explain well why neural networks performed worse than gradient boosting methods. They mention that this could be due to the effect of missing values, but this is a speculation rather than an explanation, which I personally do not believe (I might be wrong). While the overall results make sense for me personally, if you want to publish a good paper, you need to provide a more detailed answer to the question “why?”.

Minor issues

I also have three minor comments:

“many product series are censored” (page 2, last paragraph) is not what it sounds like. The authors imply that the histories are short, while the usual interpretation is that the sales are lower than the demand, so the values are censored. I would rewrite this.
Figure 2 has the legend saying “Poisson” three times, not providing any useful information. This is probably just a mistake, which can easily be fixed.
There are no references to Table 2 and Figure 3 in the paper. It is not clear why they are needed. Every table and figure should be referred to and explained.

Conclusions

Overall, the paper has a sensible idea, but I feel that the authors need to learn more about forecasting principles and that they have not read forecasting literature carefully to understand how specifically the experiments should be designed, what to do, and not to do (stop using SBC!). Because they made several serious mistakes, I feel that the results of the paper are compromised and might not be correct.

P.S. If I were a reviewer of this paper, I would recommend either “reject and resubmit” or a “major revision” (if the former option was not available).

P.P.S. If the authors of the paper are reading this, I hope you find these comments useful. If you have not submitted the paper yet, I’d suggest to take some of them (if not all) into account. Hopefully, this will smooth the submission process for you.

Message Review of a paper on comparison of modern machine learning techniques in retail first appeared on Open Forecasting.

SBC is not for you!

Ivan Svetunkov — Wed, 04 Jun 2025 11:41:00 +0000

I’ve been acting as a reviewer lately, providing comments on papers about intermittent demand, and I’ve felt a bit frustrated by what some authors write. Let me explain.

Several papers I reviewed claim that demand can be either intermittent or lumpy. They then mention the Syntetos-Boylan-Croston (SBC) classification and use the thresholds from Syntetos et al. (2005: ) to do some things with ML methods. Sounds reasonable?

No! And here’s why.

Actually, I’ve already explained this in a previous post, but let me summarise the main points again.

First, intermittent demand is the demand that happens at irregular frequency. That’s the definition John Boylan and I came up with in our paper (this one). But even before that, the literature generally agreed: if you observe naturally occurring zeroes (e.g., no one wants to buy a product), then the demand is intermittent – even if there’s only one zero in the data.

Now, Syntetos et al. (2005) specifically studied intermittent demand and proposed a classification to help choose between Croston’s method and SBA. Their classification includes four types (see image in the post):

Erratic but not very intermittent
Smooth
Lumpy
Intermittent but not very erratic

The thresholds they used (ADI=1.32 and CV²=0.49) were only intended to guide the choice between Croston and SBA. And “lumpy”, as you can see, is just a special case of intermittent demand!

Yes, you can classify intermittent demand into “lumpy” and “smooth”, but this separation is not well-defined. Use a different classification (e.g., this paper) and you’ll get different results. In fact, practically speaking, your ML approach likely doesn’t need this classification at all.

So, here are a two things you should NOT DO:

Saying that demand can be “intermittent” or “lumpy” – the latter is a subset of the former.
Use ADI=1.32 and/or CV²=0.49 to categorise demand, unless you’re selecting between Croston and SBA. And let’s be honest, you’re probably not doing that. So forget about it!

And honestly, stop overusing SBC! Lately, I’ve seen more harm than good from it. If you really want to use it, make sure you’ve read carefully and understood the original paper.

But if you don’t know what you are doing, SBC is not for you!

Message SBC is not for you! first appeared on Open Forecasting.

On randomness and uncertainty

Ivan Svetunkov — Mon, 28 Apr 2025 11:05:29 +0000

Everything is random! Your data, your model, its parameter estimates, the forecasts it produces, and even the minimum of the loss function you used. There is no such thing as a “deterministic” forecast – everything is stochastic!

Whenever you work with data, you are working with a sample from a population. In some cases, this is more apparent than in others. In my statistics lectures, I typically give the following example. Consider that we are interested in the average height of students at the university. I could ask every student at the lecture to tell me their height, take the average, and get a number. Is this number random? Yes, indeed. Why? Because if a student who was late for the lecture comes in, I would need to recalculate the average, and the number would change. The average that I get depends on who specifically I have in the sample and how many observations I have. It will vary more in smaller samples and become more stable in larger ones. But this example gives you an idea about the inherent uncertainty of any estimates we deal with.

In time series, the situation is somewhat similar: you are dealing with a sample of values that you have observed up until a specific moment. If, for example, you want to forecast daily admissions in the emergency department of a hospital and apply a model, its forecast will change when a new day comes and a new cohort of patients arrives. This is because your sample changes, and you receive new information about the demand.

So, the parameter estimates of a model you use will change when you get a new observation (e.g., a new record of product sales). Yes, if you estimate the model properly (e.g., using Least Squares), the parameter estimates won’t change substantially, but they will change nonetheless. And this would affect point forecasts and any other statistics produced by your model. Your standard errors, p-values, conditional means, prediction intervals, error measures, model ranking – everything will change with a new observation. In fact, if you do model selection, the structure of the model might change as well. For example, in the case of ETS, you might switch from a model without a trend to one with a trend. So, every time you estimate anything on a sample of data, you should keep in mind that it is random and will change if your sample changes or gets updated.

Why is that important? Because we need to understand this inherent uncertainty, and ideally, we should somehow take it into account. In forecasting, this means you should not draw conclusions based on one application of a model to a dataset. At the very least, you should perform a rolling origin evaluation. As Leonidas Tsaprounis says, “if you don’t roll the origin, you roll the dice”.

So, embrace the uncertainty and learn how to deal with it.

By the way, Kandrika Pritularga and I are holding a course on Demand Forecasting starting on 6th May. There is still time to sign up for it here.

Message On randomness and uncertainty first appeared on Open Forecasting.

Challenges related to seasonal data: shifting seasonality

Ivan Svetunkov — Mon, 07 Apr 2025 12:54:49 +0000

There are many different issues with capturing seasonality in time series. In this short post, I’d like to discuss one of the most annoying ones.

I’m talking about the seasonal pattern that shifts over time. What I mean is that, for example, instead of having the standard number of observations in the cycle (e.g., 24 hours in a day), in some cases you can have more or fewer of them. How is that possible?

One of these issues is the Daylight Saving Time (DST) change. The original idea of DST was to reduce energy consumption because daylight in summer is longer than in winter (there’s a nice and long article on Wikipedia about it). Because of this, many countries introduced a time shift: in spring, the clock is moved forward by one hour, while in autumn it goes back. This idea had a reasonable motivation at the beginning of the 20th century, but I personally think that as we’ve progressed as a society, it has lost its value. While this is already extremely annoying on its own, a bit unhealthy (several studies report an increased risk of heart attacks), and a torture for parents with small kids (the little ones don’t understand that it’s not 7am yet), it also introduces a modelling challenge: two days in the year do not have 24 hours. In spring, we have 23 hours, while in autumn we have 25. Standard classical forecasting approaches (such as ETS/ARIMA, regression, STL or classical decomposition) break in this case, because by default they assume that a specific pattern repeats itself every 24 hours. The issue arises because business cycles are tuned to working hours, not to the movement of the sun – people come to work at 9am, no matter how many hours are in the day.

Another challenge is leap years. While DST is totally man-made, leap years occur because the Earth orbits the sun approximately every 365.25 days. To avoid drifting too far from reality, our calendars include one extra day every four years (29th February). This addresses the issue but also means that one year has 366 days instead of 365. Once again, conventional models relying on fixed periodicity fail.

There are several ways to handle this, all with their own advantages and disadvantages:

Fix the data. In the case of DST, this means removing one of the duplicated hours during the autumn time change and adding one during the spring shift. For leap years, it means dropping the 29th of February. This is easy to do, but breaks the structure and might cause issues when we have DST/leap year in the holdout sample.
Introduce more complex components, such as Fourier-based ones, to capture the shift in the data. This works well for leap years but doesn’t address the DST issue. Harmonic regressions and TBATS do this, for example.
Shift seasonal indices when the issue happens – for example, having two indices for 1am when the switch to winter time occurs.

In R, I’ve developed the temporaldummy() function in the greybox package to introduce correct dummy variables for data with shifting seasonality, and I’ve incorporated method (3) into the adam() function from the smooth package. You can read more about these here: https://openforecast.org/adam/MultipleFrequenciesDSTandLeap.html

Are there any other strategies? Which one do you prefer?

BTW, Kandrika Pritularga and I are running a course on Demand Forecasting Principles with Examples in R. We’ll discuss some of these aspects there. Read more about it here.

Message Challenges related to seasonal data: shifting seasonality first appeared on Open Forecasting.