Archives Stories - Open Forecast

Staying Positive: Challenges and Solutions in Using Pure Multiplicative ETS Models

Ivan Svetunkov — Wed, 10 Jan 2024 13:34:51 +0000

Authors: Ivan Svetunkov, John E. Boylan

Journal: IMA Journal of Management Mathematics

Abstract: Exponential smoothing in state space form (ETS) is a popular forecasting technique, widely used in research and practice. While the additive error ETS models have been well studied, the multiplicative error ones have received much less attention in forecasting literature. Still, these models can be useful in cases, when one deals with positive data, because they are supposed to work in such situations. Unfortunately, the classical assumption of normality for the error term might break this property and lead to non-positive forecasts on positive data. In order to address this issue we propose using Log-Normal, Gamma and Inverse Gaussian distributions, which are defined for positive values only. We demonstrate what happens with ETS(M,*,*) models in this case, discuss conditional moments of ETS with these distribution and show that they are more natural for the models than the Normal one. We conduct the simulation experiments in order to study the bias introduced by point forecasts in these models and then compare the models with different distributions. We finish the paper with an example of application, showing how pure multiplicative ETS with a positive distribution works.

DOI: 10.1093/imaman/dpad028.

Working paper.

About the paper

DISCLAIMER: This is quite a technical paper focusing on solving a small problem of the ETS model that would allow using it in specific non-standard situations. It acts as a building block for the iETS paper. But the latter does not work without this paper, so while it seems small, it is an important brick in the wall.

The conventional ETS works great for regular demand, where the volume of the data is high. In that case, a forecaster can decide which of the 30 models to select for the data, not worrying too much about the assumption of normality for the error term and about forecast trajectories from the selected model. The situation changes when one needs to work with positive low volume data. One would think that pure multiplicative ETS should work fine in that case. However, due to the normality assumption, the model might produce prediction intervals with negative bounds and, in some situations, even negative point forecasts. Trying to fix this issue, we considered several distributions for the error term in the multiplicative error ETS:

\( 1 + \epsilon_t \sim \mathcal{N}\left(1, \sigma^2\right) \) – the conventional assumption of Normality;
\( 1 + \epsilon_t \sim \mathcal{IG}\left(1, \sigma^2\right) \) – the error term follows the Inverse Gaussian distribution with the expectation of one and the variance of \(\sigma^2\);
\( 1 + \epsilon_t \sim \mathrm{log}\mathcal{N} \left(-\frac{\sigma^2}{2}, \sigma^2 \right) \) – the error term follows the Log-Normal distribution with the location of \(-\frac{\sigma^2}{2}\) and the scale of \( \sigma^2 \);
\( 1 + \epsilon_t \sim \Gamma\left(\sigma^{-2}, \sigma^2\right) \) – the error term follows the Gamma distribution with the shape parameter \(\sigma^{-2}\) and the scale \( \sigma^2 \).

The restrictions imposed on the parameters of the distributions above are necessary to make sure that the expectation of \(1 + \epsilon_t \) is one, i.e. that the error term \(\epsilon_t\) itself has zero mean. If it isn’t, then the ETS model would need to be modified to cater for the non-zero mean, otherwise the model would produce incorrect forecasts.
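As a quick check of these restrictions (a sketch based on the standard moment formulas, with the Log-Normal parameterised via the mean and variance of the logarithm and the Gamma via shape and scale, as in the list above), the expectation of \(1 + \epsilon_t\) equals one in each case:
\begin{equation*}
\mathrm{E}(1+\epsilon_t) = \exp\left(-\frac{\sigma^2}{2} + \frac{\sigma^2}{2}\right) = 1 \text{ (Log-Normal)}, \quad \mathrm{E}(1+\epsilon_t) = \sigma^{-2} \times \sigma^2 = 1 \text{ (Gamma)},
\end{equation*}
while for the Inverse Gaussian the mean parameter is set to one directly. Under Normality, the probability of a non-positive value of \(1 + \epsilon_t\) is \(\Phi\left(-\frac{1}{\sigma}\right)\), where \(\Phi\) is the standard Normal distribution function. For example, for \(\sigma = 0.5\) this is \(\Phi(-2) \approx 0.023\), which is not negligible on low volume data.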

In the paper, we show how ETS works with these assumptions, what forecasting trajectories it produces and how it can be estimated. We also demonstrate that the distribution selection can be easily automated using AIC. All these aspects of the model are already implemented and supported in the adam() function from the smooth package in R (read more here and here).

Story of the paper

John Boylan and I started working on this paper after getting a rejection from the IJF for the other paper of ours, “iETS: State space model for intermittent demand forecasting”. The rejection showed us that we needed to take a completely new look at the paper, and it became apparent that the pure multiplicative ETS is not well studied in the literature. At the same time, its discussion would be outside of the scope of the original paper, so we decided to write a separate one, focusing on the non-intermittent, but low volume demand.

We needed to discuss two points in the paper, which were then used in the iETS one:

The conventional ETS assumes that the demand follows the Normal distribution. In the case of low volume demand, this assumption may lead to negative forecasts, which makes the model inappropriate;
Point forecasts from multiplicative ETS models do not coincide with the conditional expectations. Hyndman et al. (2008) discuss this in their book, but not to the extent we needed. We thought that lots of people do not understand the implications of this, so we added that discussion to the paper.
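To illustrate the second point with a minimal sketch (not taken from the paper, and assuming the standard ETS(M,M,N) formulation with level \(l_t\), trend \(b_t\), smoothing parameters \(\alpha\) and \(\beta\), and independent errors with \(\mathrm{E}(\epsilon_t)=0\) and \(\mathrm{V}(\epsilon_t)=\sigma^2\)), consider the model:
\begin{equation*}
y_t = l_{t-1} b_{t-1} (1+\epsilon_t), \quad l_t = l_{t-1} b_{t-1} (1+\alpha\epsilon_t), \quad b_t = b_{t-1} (1+\beta\epsilon_t) .
\end{equation*}
The point forecast for two steps ahead is obtained by setting the future errors to zero, \(\hat{y}_{t+2} = l_t b_t^2\), while the conditional expectation is:
\begin{equation*}
\mathrm{E}(y_{t+2} | t) = l_t b_t^2 \mathrm{E}\left((1+\alpha\epsilon_{t+1})(1+\beta\epsilon_{t+1})\right) = l_t b_t^2 \left(1+\alpha\beta\sigma^2\right) .
\end{equation*}
So the two differ whenever \(\alpha\beta\sigma^2 \neq 0\), and the gap accumulates over longer horizons.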

The paper was written over the period of 2021 – 2023 and was ready in Spring 2023. John and I discussed it several times, and we agreed to have a final look at it in May 2023 before submitting it to the IMA Journal of Management Mathematics. When I found out that John was ill, I decided not to wait for his comments further and just submitted it. The paper went through a couple of rounds, changed its name to reflect the concerns of one of the reviewers (the new name is objectively better than the old one) and was accepted for publication in November 2023. This is the last paper that John and I wrote together.


Story of “Probabilistic forecasting of hourly emergency department arrivals”

Ivan Svetunkov — Wed, 10 May 2023 20:47:27 +0000

The paper

Back in 2020, when we were all sitting in the COVID lockdown, I had a call with Bahman Rostami-Tabar to discuss one of our projects. He told me that he had hourly data of an Emergency Department from a hospital in Wales, and suggested writing a paper for a healthcare audience to show them how forecasting can be done properly in this setting. I noted that we did not have experience in working with high frequency data, and that it would be good to have someone with relevant expertise. I knew a guy who worked in energy forecasting, Jethro Browell (we are mates in the IIF UK Chapter), so we had a chat between the three of us and formed a team to figure out better ways for ED arrival demand forecasting.

We agreed that each one of us would try their own models. Bahman wanted to try TBATS, Prophet and models from the fasster package in R (spoiler: the latter ones produced very poor forecasts on our data, so we removed them from the paper). Jethro had a pool of GAMLSS models with different distributions, including Poisson and truncated Normal. He also tried a Gradient Boosting Machine (GBM). I decided to test ETS, Poisson Regression and ADAM. We agreed that we would measure the performance of models not only in terms of point forecasts (using RMSE), but also in terms of quantiles (pinball and quantile bias) and computational time. It took us a year to do all the experiments and another one to find a journal that would not desk-reject our paper because the editor thought that it was not relevant (even though they have published similar papers in the past). It was rejected from Annals of Emergency Medicine, Emergency Medicine Journal, American Journal of Emergency Medicine and Journal of Medical Systems. In the end, we submitted to Health Systems, and after a short revision the paper got accepted. So, there is a happy ending to this story.

In the paper itself, we found that overall, in terms of quantile bias (calibration of models), GAMLSS with truncated Normal distribution and ADAM performed better than the other approaches, with the former also doing well in terms of pinball loss and the latter doing well in terms of point forecasts (RMSE). Note that the count data models did worse than the continuous ones, although one would expect Poisson distribution to be appropriate for the ED arrivals.

I don’t want to explain the paper and its findings in detail in this post, but given my relation to ADAM, I have decided to briefly explain what I included in the model and how it was used. After all, this is the first paper that uses almost all the main features of ADAM and shows how powerful it can be if used correctly.

Using ADAM in Emergency Department arrivals forecasting

Disclaimer: The explanation provided here relies on the content of my monograph “Forecasting and Analytics with ADAM”. In the paper, I ended up creating quite a complicated model that allowed capturing complex demand dynamics. In order to fully understand what I am discussing in this post, you might need to refer to the monograph.

Emergency Department Arrivals. The plots were generated using seasplot() function from the tsutils package.

The figure above shows the data that we were dealing with together with several seasonal plots (generated using seasplot() function from the tsutils package). As we see, the data exhibits hour of day, day of week and week of year seasonalities, although some of them are not very well pronounced. The data does not seem to have a strong trend, although there is a slow increase of the level. Based on this, I decided to use ETS(M,N,M) as the basis for modelling. However, if we want to capture all three seasonal patterns then we need to fit a triple seasonal model, which requires too much computational time because of the estimation of all the seasonal indices. So, I decided to use a double-seasonal ETS(M,N,M) instead, with hour of day and hour of week seasonalities, and to include dummy variables for week of year seasonality. The alternative to week of year dummies would be an hour of year seasonal component, which would then require estimating 8760 seasonal indices, potentially overfitting the data. I argue that the week of year dummy provides sufficient flexibility and there is no need to capture the detailed intra-yearly profile on a more granular level.
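To make the parameter-count argument concrete, here is a minimal sketch in base R. The timestamps and the variable names (timestamps, indicesHourOfYear etc.) are hypothetical and only illustrate the arithmetic; this is not the code used in the paper.

```r
# Hypothetical hourly timestamps covering one non-leap year
timestamps <- seq(as.POSIXct("2019-01-01 00:00", tz="UTC"),
                  by="hour", length.out=24*365)

# Number of seasonal indices needed for each possible seasonal component
indicesHourOfDay <- 24
indicesHourOfWeek <- 24*7
indicesHourOfYear <- length(timestamps)

# Week of year as a categorical variable (factor), to be expanded into dummies
weekOfYear <- factor(format(timestamps, "%U"))

c(hourOfDay=indicesHourOfDay, hourOfWeek=indicesHourOfWeek,
  hourOfYear=indicesHourOfYear, weekOfYearLevels=nlevels(weekOfYear))
```

The double seasonal model with a week of year factor needs a few hundred seasonal indices and dummy coefficients in total, against thousands of indices for the hour of year component.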

To make things more exciting, given that we deal with hourly data of a UK hospital, we had to deal with the issues of daylight saving and leap years. I know that many of us hate the idea of daylight saving, because we have to change our lifestyles twice a year just because of an old 18th century tradition. But in addition to being bad for your health, this nasty thing messes things up for my models, because one day a year has 23 hours and another one has 25. Luckily, this is taken care of by adam(), which shifts seasonal indices when the time change happens. All you need to do for this mechanism to work is to provide an object with timestamps to the function (for example, zoo). As for the leap year, it becomes less important when we model week of year seasonality instead of the day of year or hour of year one.
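The 23 and 25 hour days are easy to see in base R. The sketch below is only an illustration on generated timestamps for 2021 (it assumes that the "Europe/London" time zone is available on the system) and does not use the ED data:

```r
# Hourly timestamps for 2021 in the UK time zone (actual instants, one hour apart)
hours <- seq(as.POSIXct("2021-01-01 00:00", tz="Europe/London"),
             as.POSIXct("2021-12-31 23:00", tz="Europe/London"), by="hour")

# Number of observed hours in each local calendar day
hoursPerDay <- table(format(hours, "%Y-%m-%d"))

# Days affected by the clock change
dstDays <- hoursPerDay[hoursPerDay != 24]
dstDays
```

A vector of such timestamps can then be used as the index of the data, for example via zoo(y, order.by=hours) from the zoo package, so that the function knows when the time change happens.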

Emergency Department Daily Arrivals

Furthermore, as can be seen from the figure above, calendar events play a crucial role in ED arrivals. For example, the Emergency Department demand over Christmas is typically lower than average (the drops in the figure above), but right after Christmas it tends to go up (with all the people who injured themselves during the festivities showing up in the hospital). So these events need to be taken into account by a model in the form of additional dummy variables together with their lags (the 24 hour lags of the original variables).
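A minimal sketch of how such a variable and its 24 hour lag could be constructed in base R is shown below. The timestamps, the single "Christmas" event and the padding of the first day with "none" are assumptions made for the illustration, not the actual variables from the paper:

```r
# Hypothetical four days of hourly timestamps around Christmas
timestamps <- seq(as.POSIXct("2021-12-24 00:00", tz="UTC"),
                  by="hour", length.out=24*4)

# Categorical variable with calendar events (only one event in this example)
x <- factor(ifelse(format(timestamps, "%m-%d") == "12-25", "Christmas", "none"),
            levels=c("none", "Christmas"))

# 24 hour lag: the same labels shifted one day forward,
# with the first day padded by "none" (an assumption of this sketch)
xLag24 <- factor(c(rep("none", 24), as.character(head(x, -24))),
                 levels=levels(x))

exampleData <- data.frame(timestamps, x, xLag24)
table(day=format(timestamps, "%m-%d"), xLag24)
```

In this example, x marks the hours of 25th December, while xLag24 marks the same hours of the following day, which lets the model capture the increase in demand right after the event.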

But that’s not all. If we want to fit a multiplicative seasonal model (which makes more sense than the additive one due to changing seasonal amplitude for different times of year), we need to do something with zeroes, which happen naturally in ED arrivals overnight (see the first figure in this post with seasonal plots). They do not necessarily happen at the same time of day, but the probability of having no demand tends to increase at night. This meant that I needed to introduce the occurrence part of the model to take care of zeroes. I used a very basic occurrence model called “direct probability”, because it is more sensitive to changes in demand occurrence, making the model more responsive. I did not use a seasonal demand occurrence model (and I don’t remember why), which is one of the limitations of ADAM used in this study.

Finally, given that we are dealing with low volume data, a positive distribution needed to be used instead of the Normal one. I used the Gamma distribution because it is better behaved than the Log-Normal or the Inverse Gaussian, which tend to have much heavier tails. In the exploration of the data, I found that Gamma does better than the other two, probably because the ED arrivals have relatively slim tails.
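The difference in tails can be illustrated with a small sketch in base R, comparing the upper quantiles of the Gamma and the Log-Normal distributions with unit expectation. The value of sigma2 is arbitrary and chosen for the illustration only, and the Inverse Gaussian is left out because base R does not provide it:

```r
# Arbitrary variance parameter for the illustration
sigma2 <- 0.25
probs <- c(0.95, 0.99, 0.999)

# Gamma with shape 1/sigma2 and scale sigma2 (expectation of one)
qGamma <- qgamma(probs, shape=1/sigma2, scale=sigma2)

# Log-Normal with location -sigma2/2 and variance of logarithms sigma2
# (expectation of one)
qLogNormal <- qlnorm(probs, meanlog=-sigma2/2, sdlog=sqrt(sigma2))

round(rbind(Gamma=qGamma, LogNormal=qLogNormal), 3)
```

For this value of sigma2, the Log-Normal quantiles are higher than the Gamma ones at all three levels, and the gap grows with the probability. Note that with this parameterisation the variance of the Log-Normal is \(\exp(\sigma^2)-1\), slightly larger than \(\sigma^2\), so this is a rough comparison rather than a strictly like-for-like one.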

So, the final ADAM included the following features:

ETS(M,N,M) as the basis;
Double seasonality;
Week of year dummy variables;
Dummy variables for calendar events with their lags;
“Direct probability” occurrence model;
Gamma distribution for the residuals of the model.

This model is summarised in equation (3) of the paper.

The model was initialised using backcasting, because otherwise we would need to estimate too many initial values for the state vector. The estimation itself was done using likelihood. In R, this corresponded to roughly the following lines of code:

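library(smooth)
# Occurrence part of the model: "direct probability" with ETS(M,N,N)
# (y is assumed to be the same response variable as the one in ourData)
oesModel <- oes(y, "MNN", occurrence="direct", h=48)
# Double seasonal ETS(M,N,M) with explanatory variables, backcasting
# for the initialisation and Gamma distribution for the error term
adamModelFirst <- adam(ourData, "MNM", lags=c(24,24*7), formula=y~x+xLag24+weekOfYear,
                       h=48, initial="backcasting",
                       occurrence=oesModel, distribution="dgamma")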

Where x was the categorical variable (factor in R) with all the main calendar events. However, even with backcasting, the estimation of such a big model took an hour and 25 minutes. Given that Bahman, Jethro and I have agreed to do rolling origin evaluation, I've decided to help the function in the estimation inside the loop, providing the initials to the optimiser based on the very first estimated model. As a result, each estimation of ADAM in the rolling origin took 1.5 minutes. The code in the loop was modified to:

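# Parameters of the first estimated model, used as starting values for the optimiser
adamParameters <- coef(adamModelFirst)
oesModel <- oes(y, "MNN", occurrence="direct", h=48)
# The same model as before, but with the starting values provided via B
adamModel <- adam(ourData, "MNM", lags=c(24,24*7), formula=y~x+xLag24+weekOfYear,
                  h=48, initial="backcasting",
                  occurrence=oesModel, distribution="dgamma",
                  B=adamParameters)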

Finally, we generated mean and quantile forecasts for 48 hours ahead. I used semiparametric quantiles, because I expected violations of some of the assumptions of the model (e.g. autocorrelated residuals). The respective R code is:

testForecast <- forecast(adamModel, newdata=newdata, h=48,
                         interval="semiparametric", level=c(1:19/20), side="upper")

Furthermore, given that the data is integer-valued (how many people visit the hospital each hour) and ADAM produces fractional quantiles (because of the Gamma distribution), I decided to see how it would perform if the quantiles were rounded up. This strategy is simple and might be sensible when a continuous model is used for forecasting count data (see discussion in the paper). However, after running the experiment, the ADAM with rounded up quantiles performed very similarly to the conventional one, so we decided not to include it in the paper.
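To show what such a comparison could look like, here is a minimal sketch of the pinball loss applied to fractional and rounded up quantiles. The function pinball() and the numbers are made up for the illustration and are not the values or the code from the paper:

```r
# Pinball (quantile) loss for quantile forecasts q at the level tau
pinball <- function(actual, q, tau){
    mean(ifelse(actual >= q, tau * (actual - q), (1 - tau) * (q - actual)))
}

# Hypothetical hourly counts and fractional 0.9 quantile forecasts
actual <- c(3, 0, 5, 2, 7, 4)
quantile90 <- c(4.2, 1.3, 5.6, 3.1, 6.4, 5.0)

# Loss for the original and for the rounded up quantiles
lossOriginal <- pinball(actual, quantile90, tau=0.9)
lossRounded <- pinball(actual, ceiling(quantile90), tau=0.9)
c(original=lossOriginal, rounded=lossRounded)
```

On this toy example the rounded up quantiles get a somewhat lower loss, but this depends entirely on the made up numbers. On the real data the difference was negligible, as mentioned above.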

In the end, as stated earlier in this post, we concluded that in our experiment, there were two well performing approaches: GAMLSS with Truncated Normal distribution (called "NOtr-2" in the paper) and ADAM in the form explained above. The popular TBATS, Prophet and Gradient Boosting Machine performed poorly compared to these two approaches. For the first two, this is because of the lack of explanatory variables and inappropriate distributional assumptions (normality). As for the GBM, this is probably due to the lack of a dynamic element in it (e.g. changing level and seasonal components).

Concluding this post, as you can see, I managed to fit a decent model based on ADAM, which captured the main characteristics of the data. However, it took a bit of time to understand what features should be included, together with some experiments on the data. This case study shows that if you want to get a better model for your problem, you might need to dive into the problem and spend some time analysing what you have on your hands, experimenting with different parameters of a model. ADAM provides the flexibility necessary for such experiments.


The Long and Winding Road: The Story of Complex Exponential Smoothing

Ivan Svetunkov — Tue, 02 Aug 2022 12:26:53 +0000

About the paper.

Disclaimer

The idea of using complex variables in modelling and forecasting was originally proposed by my father, Sergey Svetunkov. Based on that, we developed several models, which were then used in some of our research. We worked together in this direction and published several articles in Russian. My father even published a monograph “Complex-Valued Modeling in Economics and Finance” based on that research.

Pre-PhD period

This story started in 2010 when I worked as an Associate Professor at the Higher School of Economics (HSE) in Saint Petersburg, Russia. By then, I had defended my candidate thesis (in Russia, this is considered an equivalent to a PhD) on the topic of “Complex Variables Production Functions”, and I was teaching Microeconomics, Econometrics and Forecasting to undergraduate students. On my way to work (which would typically take an hour), I would usually read or write something. On one of those days, I came up with the basic formula for Complex Exponential Smoothing, assigning the error term to the imaginary part of the number and using Brown’s Simple Exponential Smoothing as a basis for the new forecasting method. Just for comparison, here is the Simple Exponential Smoothing:
\begin{equation*}
\hat{y}_{t+1} = \alpha y_t + (1-\alpha) \hat{y}_{t} .
\end{equation*}
And here is what I came up with:
\begin{equation*}
\hat{y}_{t+1} + i \hat{\varsigma}_{t+1} = (\alpha_0 + i \alpha_1) (y_t + i \varsigma_t) + (1 - \alpha_0 + i - i \alpha_1) (\hat{y}_{t} + i \hat{\varsigma}_{t}) .
\end{equation*}
I’m not explaining this formula in this post (you can read about it here). It is here just for demonstration. It was and still is a complicated forecasting method to understand, but the idea itself excited me. When I returned home, I continued the derivations and did some basic experiments in Excel. I developed the method further in 2010 and presented it in April 2011 at a conference on Business Informatics in Kharkiv, Ukraine (this is one of the cities that the Russian army has been bombing in the war that Putin started with Ukraine on 24th February 2022). The idea was well received, and I had encouraging feedback. The first paper on CES was then published in Russian in the proceedings of the conference (it is available in Russian here and here, p.11; back then, I used to call the method “Complex Exponentially Weighted Moving Average”, CEWMA).

After that, I started thinking of preparing a paper in English and submitting it to an international peer-reviewed journal. HSE had an excellent service, where people outside your department would read your paper and provide feedback. So I used that service after preparing the first draft in English in 2012 and got a review with several comments. One of them was helpful. It said that my paper lacked proper motivation and that, in its current state, it could not be published in a peer-reviewed international journal. However, the other comment was that my research area was uninteresting, nobody did anything like that in the academic world, and thus I should find a different area of research.

I disagreed with the latter point and, after minor modifications, submitted the paper to the International Journal of Forecasting (IJF). As expected, Rob Hyndman (back then, editor-in-chief of the journal) replied that the paper could not be published because it lacked motivation and because I failed to show that the approach worked. At that time, I did not know how to motivate the paper or how to modify it to make it publishable, so that was a dead end for that version of the paper. But I did not want to give up, so in 2012, I applied for a PhD in Management Science at Lancaster University, writing a proposal about my model.

PhD period

I was admitted as a PhD student in 2013 with a scholarship from the Lancaster University Management School, and I started my work under the supervision of Nikolaos Kourentzes and Robert Fildes on the topic “Complex Exponential Smoothing”. After preparing a proper experiment, I received good results and wrote the first version of the R function ces(). The results of this work were presented in my first International Symposium on Forecasting (ISF) in Rotterdam in 2014. Nobody noticed my presentation, and nobody seemed to care.

I then focused on rewriting the paper, and Nikos helped me with writing up the motivation. After collecting feedback about the paper from our colleagues, we decided to submit it to a statistical journal. That was very arrogant of us: we did not understand how to write papers for such journals, and nobody in our group had ever published there. As a result, we got a desk rejection from the Journal of the American Statistical Association in 2015, saying that they do not publish forecasting papers.

In parallel, I started working on an extension of the CES for the seasonal time series, which I then presented at ISF2015 at Riverside, US. I then managed to discuss my research with Keith Ord, who expressed his interest in it and provided support and guidance for some parts of it. He even helped me with some derivations, which I included in the first paper.

To make things even more complicated, I continued work on my PhD and wrote a second paper, extending CES for seasonal time series. At the end of 2015, I resubmitted the first paper to Operations Research journal, where it got desk-rejected, and then to EJOR (European Journal of Operational Research). After a short discussion with Nikos, we decided to submit the second paper to IJF, hoping that the first will progress fast and that the two of them can be done in parallel. That was a fatal mistake, which impacted my academic career and mental well-being for the next several years.

Unfortunately, the first paper got rejected from EJOR after the second round of revision, with the second reviewer saying that it could not be published because we did not use the Diebold-Mariano test (yes, that was the reason. Note: we used Nemenyi instead). As for the second one, it got stuck in IJF. In the first round, the second reviewer said that the model has a fatal flaw and cannot be used in practice (he concluded that because he misunderstood how the model worked). In the second round, when we explained the model in more detail, the reviewer looked more carefully at CES and started criticising the first paper, which by then was published as a working paper. We placed ourselves in a challenging situation: we had to defend the first paper in the revision of the second one.

This process led us to the third and then to the fourth round without significant progress. We were discussing the meaning of complex variables in the model and whether the imaginary part of the model makes sense instead of discussing the seasonal extension of CES. It was apparent that the model works (it performed better than ETS and ARIMA on the M competition data), but the reviewers had questions about the interpretation of the original model. In the fourth round, an Associate Editor of IJF wrote that “I still maintain view and so does reviewer 2 that there is an interesting paper lurking under this paper but we are yet to see it and evaluate it on its own merits”. It became clear that we were not moving forward and that the only way out of this dead end would be to merge the two papers and restart the submission process. By then, we were discussing a completely different paper than the one submitted initially to IJF. I was not ready for this serious step, so I decided not to continue the revision process in IJF and put the paper on hold.

By then, my publishing experience had been very disappointing and demotivating, and I struggled to continue doing anything in that research direction. Whenever I would open the paper, it would spoil my mood for the rest of the day, as I would think that it was unpublishable and that nobody needed my work (as I’ve been told repeatedly by many different people starting from 2010).

Nonetheless, somewhere in the middle of the IJF revision, at the end of 2016, I had my viva. I got my PhD in Management Science, defending the thesis on the topic “Complex Exponential Smoothing”.

Post-PhD period

At the end of 2017, Fotios Petropoulos suggested that I participate in the M4 competition. His idea was to submit a combination of forecasts from several models: ETS, ARIMA, Theta and CES. After trying out several options, we used the median for the combination (I must confess that we weren’t the first ones to do that: this was investigated, for example, by Jose & Winkler, 2008). This approach got to 6th place in the competition. We were invited to submit a paper explaining our approach, which was then published in IJF (Petropoulos & Svetunkov, 2020). That paper is the first paper published in a peer-reviewed journal discussing CES.

In 2018, during the ISF in Boulder, Nikos and I invited Keith Ord to join our paper, given that he had supported me during my PhD and made a substantial contribution to the paper. We decided to clean the paper up, rewrite some parts, and submit it to a peer-reviewed journal as a paper from three co-authors. It took us some time to return to the original text, revive the R code and update the paper. In the middle of 2019, Nikos, Keith and I submitted the CES paper to the Journal of Time Series Analysis. It was a desk rejection with a comment that the Associate Editor “…argues that your paper is a relatively straightforward extension of smoothing via a state space model” and thus the paper “is not appropriate for publication in this journal in terms of substantive content”. We rewrote the motivation to align the paper with an OR-related journal and submitted it to Omega, to get another desk rejection saying that it is too mathematical for them and that the paper “is quite technical and would likely be best served by targeting a journal in the time series or forecasting field instead”.

Finally, at the end of 2019, we submitted the paper to Naval Research Logistics (NRL). By then, I did not have any expectations about the paper and was sure that it would either be a desk rejection or a rejection from reviewers. I had seen this outcome so many times that it would be naive to expect anything else to happen. However, this time we got an Associate Editor who liked the idea and supported us from the first revision. In fact, they pointed out that CES had already been used in the M4 competition and showed that it brought value. On 24th February 2021, we got our first round of revision, after which I decided to move some parts of paper 2 (seasonal CES) to the first one, merging the two. It made sense because the paper would now look complete. While one of the reviewers was sceptical about the paper, the Associate Editor provided colossal support and guided us in what to change in the paper so that it could be accepted in NRL. After two rounds and some additional rewrites of the paper, on 18th June 2022, it was accepted for publication in Naval Research Logistics, and then published online on 2nd August 2022.

Conclusions

Complex Exponential Smoothing is a complex idea, something that people are not used to. It stands out and does things differently, not the way the researchers typically do. This is what makes it interesting, and this is what made it extremely difficult to publish. Over the years, I questioned the correctness and usefulness of my idea many times. Some days I would be dancing around, singing “it works, it works” after a successful experiment; on others, I would throw it away, saying “never again” when the experiments failed. This is all part of academic life. However, the most challenging experience for me was the publication of the paper. Over the years, I have met a lot of resistance from the academic world.

I have not included here comments from my former Higher School of Economics colleagues or comments from some journal reviewers. They were rarely pleasant and supportive. Some people did not understand the idea, others did not want to understand it. But there were always several people around me who helped and guided me. I would not have been able to publish the paper in the end if it was not for the support from Nikos Kourentzes, Keith Ord, Sergey Svetunkov (my father) and Anna Sroginis (my wife). They believed in the idea and supported me even when it looked like it wouldn’t work. So, I am immensely grateful for their support. It has been a long and winding road… and I’m glad that it’s finally over.

As for the lessons to learn from this, I have several for you:

Do not try publishing dependent papers in parallel: if your second paper depends on the first one, do not submit it before the first one is at least accepted.
If you want to publish in a journal in which your group does not typically publish, find a person who does and work with them. That became apparent to me when I worked on a different paper with a colleague from a statistics department. Statistical journals have a completely different style than the OR ones, and we had no chance to publish the CES paper there.
As a reviewer, you might not understand the paper you are reviewing. This is okay. We cannot know and understand everything instantaneously. But that does not mean that the paper is not good. It only means that you need to invest more time in understanding the paper and then help to improve it (yes, paper revision is a serious job, not a box-ticking process). I had many comments of the style “I did not understand it, so reject”. This is not how revisions should be done.

Last but not least, be critical of your ideas, but if you believe in something, stick with it and be patient. It might take a lot of time for other people to start appreciating what you have been trying to show them.
