Archives Regression - Open Forecast

Story of “Probabilistic forecasting of hourly emergency department arrivals”

Ivan Svetunkov — Wed, 10 May 2023 20:47:27 +0000

Back in 2020, when we were all sitting in the COVID lockdown, I had a call with Bahman Rostami-Tabar to discuss one of our projects. He told me that he had hourly data from an Emergency Department of a hospital in Wales, and suggested writing a paper for a healthcare audience to show them how forecasting can be done properly in this setting. I noted that we did not have experience in working with high frequency data, and it would be good to have someone with relevant expertise. I knew a guy who worked in energy forecasting, Jethro Browell (we are mates in the IIF UK Chapter), so we had a chat between the three of us and formed a team to figure out better ways for ED arrival demand forecasting.

We agreed that each one of us will try their own models. Bahman wanted to try TBATS, Prophet and models from the fasster package in R (spoiler: the latter ones produced very poor forecasts on our data, so we removed them from the paper). Jethro had a pool of GAMLSS models with different distributions, including Poisson and truncated Normal. He also tried a Gradient Boosting Machine (GBM). I decided to test ETS, Poisson Regression and ADAM. We agreed that we will measure performance of models not only in terms of point forecasts (using RMSE), but also in terms of quantiles (pinball and quantile bias) and computational time. It took us a year to do all the experiments and another one to find a journal that would not desk-reject our paper because the editor thought that it was not relevant (even though they have published similar papers in the past). It was rejected from Annals of Emergency Medicine, Emergency Medicine Journal, American Journal of Emergency Medicine and Journal of Medical Systems. In the end, we submitted to Health Systems, and after a short revision the paper got accepted. So, there is a happy end in this story.

In the paper itself, we found that overall, in terms of quantile bias (calibration of models), GAMLSS with truncated Normal distribution and ADAM performed better than the other approaches, with the former also doing well in terms of pinball loss and the latter doing well in terms of point forecasts (RMSE). Note that the count data models did worse than the continuous ones, although one would expect Poisson distribution to be appropriate for the ED arrivals.
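For readers who have not met these measures before, here is a minimal sketch of how pinball loss and quantile bias can be computed in base R. The function names and the toy numbers are made up for this illustration. The quantile bias is assumed here to be the difference between the empirical and the nominal coverage, which might differ in detail from the definition used in the paper.

```r
# Pinball loss for one quantile level: under-forecasts are penalised with
# weight "level", over-forecasts with weight "1 - level"
pinballLoss <- function(actual, quantileForecast, level) {
  error <- actual - quantileForecast
  mean(ifelse(error >= 0, level * error, (level - 1) * error))
}

# Quantile bias (assumed definition): share of actuals at or below the
# forecasted quantile minus the nominal level; zero means perfect calibration
quantileBias <- function(actual, quantileForecast, level) {
  mean(actual <= quantileForecast) - level
}

# Toy example: two hourly counts and their 0.9 quantile forecasts
actual <- c(10, 12)
quantileForecast <- c(11, 11)
pinballLoss(actual, quantileForecast, level=0.9)
quantileBias(actual, quantileForecast, level=0.9)
```

A lower pinball loss means sharper and better placed quantiles, while a quantile bias close to zero means that the quantiles cover the nominal share of observations.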

I don’t want to explain the paper and its findings in detail in this post, but given my relation to ADAM, I have decided to briefly explain what I included in the model and how it was used. After all, this is the first paper that uses almost all the main features of ADAM and shows how powerful it can be if used correctly.

Using ADAM in Emergency Department arrivals forecasting

Disclaimer: The explanation provided here relies on the content of my monograph “Forecasting and Analytics with ADAM“. In the paper, I ended up creating a quite complicated model that allowed capturing complex demand dynamics. In order to fully understand what I am discussing in this post, you might need to refer to the monograph.

Emergency Department Arrivals. The plots were generated using seasplot() function from the tsutils package.

The figure above shows the data that we were dealing with together with several seasonal plots (generated using seasplot() function from the tsutils package). As we see, the data exhibits hour of day, day of week and week of year seasonalities, although some of them are not very well pronounced. The data does not seem to have a strong trend, although there is a slow increase of the level. Based on this, I decided to use ETS(M,N,M) as the basis for modelling. However, if we want to capture all three seasonal patterns then we need to fit a triple seasonal model, which requires too much computational time, because of the estimation of all the seasonal indices. So, I have decided to use a double-seasonal ETS(M,N,M) instead with hour of day and hour of week seasonalities and to include dummy variables for week of year seasonality. The alternative to week of year dummies would be an hour of year seasonal component, which would then require estimating 8760 seasonal indices, potentially overfitting the data. I argue that the week of year dummy provides sufficient flexibility and there is no need to capture the detailed intra-yearly profile on a more granular level.
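To make the trade-off more tangible, here is a small base R sketch that builds a week of year factor from hourly timestamps and compares the number of seasonal indices in the two specifications. The timestamps are artificial and the simple "seven days per week starting on 1 January" rule is an assumption made for the illustration, not necessarily the definition we used in the paper.

```r
# Two non-leap years of artificial hourly timestamps (UTC keeps the arithmetic simple)
timestamps <- seq(as.POSIXct("2018-01-01 00:00", tz="UTC"),
                  by="hour", length.out=2*365*24)

# Week of year: days 1-7 form week 1, days 8-14 form week 2 and so on
weekOfYear <- factor(as.POSIXlt(timestamps)$yday %/% 7 + 1)
numberOfWeeks <- nlevels(weekOfYear)

# Seasonal indices to estimate: hour of day plus hour of week vs hour of year
seasonalIndices <- c(doubleSeasonal=24 + 24*7, hourOfYear=24*365)

numberOfWeeks
seasonalIndices
```

With this rule the factor has 53 levels (the last one covers a single day), so the dummies add a few dozen parameters instead of the thousands needed for the hour of year component.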

To make things more exciting, given that we deal with hourly data of a UK hospital, we had to deal with issues of daylight saving and leap year. I know that many of us hate the idea of daylight saving, because we have to change our lifestyles twice a year just because of an old 18th century tradition. But in addition to being bad for your health, this nasty thing messes things up for my models, because once a year we have a day with 23 hours and once a year a day with 25 hours. Luckily, this is taken care of by adam(), which shifts the seasonal indices when the time change happens. All you need to do for this mechanism to work is to provide an object with timestamps to the function (for example, zoo). As for the leap year, it becomes less important when we model week of year seasonality instead of the day of year or hour of year one.
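If you want to see the problem with your own eyes, the following base R sketch counts the number of hours in each day of 2019 in the UK time zone. The year is picked arbitrarily for the illustration, and the code assumes that the "Europe/London" time zone is available on your system.

```r
# Hourly timestamps for 2019 in UK local time
hours <- seq(as.POSIXct("2019-01-01 00:00", tz="Europe/London"),
             as.POSIXct("2019-12-31 23:00", tz="Europe/London"), by="hour")

# Number of hourly observations in each calendar day
hoursPerDay <- table(format(hours, "%Y-%m-%d"))

# Days that do not have 24 hours: one in spring and one in autumn
hoursPerDay[hoursPerDay != 24]
```

The two days returned are exactly the ones where a fixed lag of 24 would misalign the hour of day indices.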

Emergency Department Daily Arrivals

Furthermore, as can be seen from the figure above, calendar events play a crucial role in ED arrivals. For example, the Emergency Department demand over Christmas is typically lower than average (the drops in the figure above), but right after Christmas it tends to go up (with all the people who injured themselves during the festivities showing up in the hospital). So these events need to be taken into account by the model in the form of additional dummy variables, together with their lags (the 24 hour lags of the original variables).
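Here is a minimal base R sketch of how such an event variable and its 24 hour lag can be prepared. The dates and the single "Christmas" event are artificial, and filling the first 24 values of the lagged variable with "none" is a simplification made for the illustration.

```r
# Ten days of artificial hourly timestamps around Christmas
n <- 24 * 10
timestamps <- seq(as.POSIXct("2019-12-22 00:00", tz="UTC"), by="hour", length.out=n)

# Categorical variable with the calendar events
x <- factor(ifelse(format(timestamps, "%m-%d") == "12-25", "Christmas", "none"),
            levels=c("none", "Christmas"))

# The same variable lagged by 24 hours; the first day has no history
lagIndex <- c(rep(NA, 24), seq_len(n - 24))
xLag24 <- x[lagIndex]
xLag24[is.na(xLag24)] <- "none"

# Days on which the lagged variable is switched on
lagDays <- unique(format(timestamps[xLag24 == "Christmas"], "%m-%d"))
lagDays
```

The lagged variable is switched on during the day after the event, which is what lets the model pick up the increase in demand right after Christmas.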

But that’s not all. If we want to fit a multiplicative seasonal model (which makes more sense than the additive one due to changing seasonal amplitude for different times of year), we need to do something with zeroes, which happen naturally in ED arrivals overnight (see the first figure in this post with seasonal plots). They do not necessarily happen at the same time of day, but the probability of having no demand tends to increase at night. This meant that I needed to introduce the occurrence part of the model to take care of zeroes. I used a very basic occurrence model called “direct probability”, because it is more sensitive to changes in demand occurrence, making the model more responsive. I did not use a seasonal demand occurrence model (and I don’t remember why), which is one of the limitations of ADAM used in this study.

Finally, given that we are dealing with low volume data, a positive distribution needed to be used instead of the Normal one. I used Gamma distribution because it is better behaved than the Log Normal or the Inverse Gaussian, which tend to have much heavier tails. In the exploration of the data, I found that Gamma does better than the other two, probably because the ED arrivals have relatively slim tails.
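To illustrate the point about the tails, here is a small base R sketch comparing the upper quantiles of Gamma and Log-Normal distributions with the same mean and variance. The mean of 1 and the variance of 0.25 are arbitrary values chosen for the illustration (they are not estimated from the ED data), and the Inverse Gaussian is left out because base R does not have a quantile function for it.

```r
# Both distributions are set to have mean 1 and variance 0.25
targetVariance <- 0.25

# Gamma: mean = shape * scale, variance = shape * scale^2
gammaShape <- 1 / targetVariance
gammaScale <- targetVariance

# Log-Normal: mean = exp(meanlog + sdlog^2 / 2), variance = (exp(sdlog^2) - 1) * mean^2
sdlog <- sqrt(log(1 + targetVariance))
meanlog <- -sdlog^2 / 2

# Upper quantiles of the two distributions
probs <- c(0.99, 0.999)
tailQuantiles <- rbind(Gamma=qgamma(probs, shape=gammaShape, scale=gammaScale),
                       LogNormal=qlnorm(probs, meanlog=meanlog, sdlog=sdlog))
colnames(tailQuantiles) <- probs
tailQuantiles
```

In this setting the extreme quantiles of the Log-Normal distribution lie further out than those of the Gamma one, which is what I mean by heavier tails.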

So, the final ADAM included the following features:

ETS(M,N,M) as the basis;
Double seasonality;
Week of year dummy variables;
Dummy variables for calendar events with their lags;
“Direct probability” occurrence model;
Gamma distribution for the residuals of the model.

This model is summarised in equation (3) of the paper.

The model was initialised using backcasting, because otherwise we would need to estimate too many initial values for the state vector. The estimation itself was done using likelihood. In R, this corresponded to roughly the following lines of code:

library(smooth)
oesModel <- oes(y, "MNN", occurrence="direct", h=48)
adamModelFirst <- adam(ourData, "MNM", lags=c(24,24*7), formula=y~x+xLag24+weekOfYear,
                       h=48, initial="backcasting",
                       occurrence=oesModel, distribution="dgamma")

Where x was the categorical variable (factor in R) with all the main calendar events. However, even with backcasting, the estimation of such a big model took an hour and 25 minutes. Given that Bahman, Jethro and I have agreed to do rolling origin evaluation, I've decided to help the function in the estimation inside the loop, providing the initials to the optimiser based on the very first estimated model. As a result, each estimation of ADAM in the rolling origin took 1.5 minutes. The code in the loop was modified to:

adamParameters <- coef(adamModelFirst)
oesModel <- oes(y, "MNN", occurrence="direct", h=48)
adamModel <- adam(ourData, "MNM", lags=c(24,24*7), formula=y~x+xLag24+weekOfYear,
                  h=48, initial="backcasting",
                  occurrence=oesModel, distribution="dgamma",
                  B=adamParameters)

Finally, we generated mean and quantile forecasts for 48 hours ahead. I used semiparametric quantiles, because I expected violation of some of the assumptions of the model (e.g. autocorrelated residuals). The respective R code is:

testForecast <- forecast(adamModel, newdata=newdata, h=48,
                         interval="semiparametric", level=c(1:19/20), side="upper")

Furthermore, given that the data is integer-valued (how many people visit the hospital each hour) and ADAM produces fractional quantiles (because of the Gamma distribution), I decided to see how it would perform if the quantiles were rounded up. This strategy is simple and might be sensible when a continuous model is used for forecasting count data (see discussion in the paper). However, after running the experiment, the ADAM with rounded up quantiles performed very similarly to the conventional one, so we have decided not to include it in the paper.

In the end, as stated earlier in this post, we concluded that in our experiment, there were two well performing approaches: GAMLSS with Truncated Normal distribution (called "NOtr-2" in the paper) and ADAM in the form explained above. The popular TBATS, Prophet and Gradient Boosting Machine performed poorly compared to these two approaches. For the first two, this is because of the lack of explanatory variables and inappropriate distributional assumptions (normality). As for the GBM, this is probably due to the lack of dynamic element in it (e.g. changing level and seasonal components).

Concluding this post, as you can see, I managed to fit a decent model based on ADAM, which captured the main characteristics of the data. However, it took a bit of time to understand what features should be included, together with some experiments on the data. This case study shows that if you want to get a better model for your problem, you might need to dive into the problem and spend some time analysing what you have on your hands, experimenting with different parameters of a model. ADAM provides the flexibility necessary for such experiments.

The post Story of “Probabilistic forecasting of hourly emergency department arrivals” first appeared on Open Forecast.

smooth v3.2.0: what’s new?

Ivan Svetunkov — Mon, 30 Jan 2023 13:06:47 +0000

The smooth package has reached version 3.2.0 and is now on CRAN. While the version change from 3.1.7 to 3.2.0 looks small, it introduces several substantial changes and represents a first step in moving to the new C++ code in the core of the functions. In this short post, I will outline the main new features of smooth 3.2.0.

New engines for ETS, MSARIMA and SMA

The first and one of the most important changes is the new engine for the ETS (Error-Trend-Seasonal exponential smoothing model), MSARIMA (Multiple Seasonal ARIMA) and SMA (Simple Moving Average), implemented respectively in es(), msarima() and sma() functions. The new engine was developed for adam() and the three models above can be considered as special cases of it. You can read more about ETS in ADAM monograph, starting from Chapter 4; MSARIMA is discussed in Chapter 9, while SMA is briefly discussed in Subsection 3.3.3.

The es() function now implements the ETS close to the conventional one, assuming that the error term follows normal distribution. It still supports explanatory variables (discussed in Chapter 10 of ADAM monograph) and advanced estimators (Chapter 11), and it has the same syntax as the previous version of the function had, but now acts as a wrapper for adam(). This means that it is now faster, more accurate and requires less memory than it used to. msarima(), being a wrapper of adam() as well, is now also faster and more accurate than it used to be. In addition to that, both functions now support the methods that were developed for adam(), including vcov(), confint(), summary(), rmultistep(), reapply(), plot() and others. So, now you can do more thorough analysis and improve the models using all these advanced instruments (see, for example, Chapter 14 of ADAM).

The main reason why I moved the functions to the new engine was to clean up the code and remove the old chunks that were developed when I only started learning C++. A side effect, as you see, is that the functions have now been improved in a variety of ways.

And to be on the safe side, the old versions of the functions are still available in smooth under the names es_old(), msarima_old() and sma_old(). They will be removed from the package if it ever reaches v4.0.0.

New methods for ADAM

There are two new methods for adam() that can be used in a variety of cases. The first one is simulate(), which will generate data based on the estimated ADAM, whatever the original model is (e.g. mixture of ETS, ARIMA and regression on the data with multiple frequencies). Here is how it can be used:

adam(BJsales, "AAdN") |>
     simulate() |>
     plot()

which will produce a plot similar to the following:

Simulated data based on adam() applied to Box-Jenkins sales data

This can be used for research, when a more controlled environment is needed. If you want to fine tune the parameters of ADAM before simulating the data, you can save the output in an object and amend its parameters. For example:

testModel <- adam(BJsales, "AAdN")
testModel$persistence <- c(0.5, 0.2)
simulate(testModel)

The second new method is xtable() from the respective xtable package. It produces a LaTeX version of the table from the summary of ADAM. Here is an example of a summary from ADAM ETS:

adam(BJsales, "AAdN") |>
     summary()

Model estimated using adam() function: ETS(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 256.1516
Coefficients:
      Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha   0.9514     0.1292     0.6960      1.0000 *
beta    0.3328     0.2040     0.0000      0.7358  
phi     0.8560     0.1671     0.5258      1.0000 *
level 203.2835     5.9968   191.4304    215.1289 *
trend  -2.6793     4.7705   -12.1084      6.7437  

Error standard deviation: 1.3623
Sample size: 150
Number of estimated parameters: 6
Number of degrees of freedom: 144
Information criteria:
     AIC     AICc      BIC     BICc 
524.3032 524.8907 542.3670 543.8387

As you can see in the output above, the function generates the confidence intervals for the parameters of the model, including the smoothing parameters, dampening parameter and the initial states. This summary can then be used to generate the LaTeX code for the main part of the table:

adam(BJsales, "AAdN") |>
     xtable()

which will look something like this:

Summary of adam()

Other improvements

First, one of the major changes in smooth functions is the new backcasting mechanism for adam(), es() and msarima() (this is discussed in Section 11.4 of ADAM monograph). The main difference with the old one is that now it does not backcast the parameters for the explanatory variables and estimates them separately via optimisation. This feature appeared to be important for some users who wanted to try MSARIMAX/ETSX (models with explanatory variables) but wanted to use backcasting as the initialisation. These users then wanted to get a summary, analysing the uncertainty around the estimates of parameters for exogenous variables, but could not because the previous implementation would not estimate them explicitly. This is now available. Here is an example:

cbind(BJsales, BJsales.lead) |>
    adam(model="AAdN", initial="backcasting") |>
    summary()

Model estimated using adam() function: ETSX(AAdN)
Response variable: BJsales
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 255.1935
Coefficients:
             Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha          0.9724     0.1108     0.7534      1.0000 *
beta           0.2904     0.1368     0.0199      0.5607 *
phi            0.8798     0.0925     0.6970      1.0000 *
BJsales.lead   0.1662     0.2336    -0.2955      0.6276  

Error standard deviation: 1.3489
Sample size: 150
Number of estimated parameters: 5
Number of degrees of freedom: 145
Information criteria:
     AIC     AICc      BIC     BICc 
520.3870 520.8037 535.4402 536.4841

As you can see in the output above, the initial level and trend of the model are not reported, because they were estimated via backcasting. However, we get the value of the parameter BJsales.lead and the uncertainty around it. The old backcasting approach is now called "complete", implying that all values of the state vector are produced via backcasting.

Second, forecast.adam() now has a parameter scenarios, which when TRUE will return the simulated paths from the model. This only works when interval="simulated" and can be used for the analysis of possible forecast trajectories.

Third, the plot() method now can also produce ACF/PACF for the squared residuals for all smooth functions. This becomes useful if you suspect that your data has ARCH elements and want to see if they need to be modelled separately. This can also be done using adam() and sm() and is discussed in Chapter 17 of the monograph.

Finally, the sma() function now has the fast parameter, which, when TRUE, will use a modified ternary search for the best order based on information criteria. It might not find the global minimum, but it works much faster than the exhaustive search.
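To give an idea of why this is faster, here is a sketch of a plain ternary search over integer orders in base R. This is not the code from sma() (the function there uses a modified version of the search), and both the ternarySearch() function and the toy criterion are made up for the illustration. The search finds the minimum only if the criterion is unimodal in the order, which is exactly why the global minimum is not guaranteed in practice.

```r
# Ternary search for the integer that minimises f on [lower, upper],
# assuming that f decreases up to the minimum and increases afterwards
ternarySearch <- function(f, lower, upper) {
  while (upper - lower > 2) {
    step <- (upper - lower) %/% 3
    left <- lower + step
    right <- upper - step
    if (f(left) < f(right)) {
      # The minimum cannot be at "right" or beyond it
      upper <- right - 1
    } else {
      # The minimum cannot be at "left" or before it
      lower <- left + 1
    }
  }
  # At most three candidates are left, so check them directly
  candidates <- lower:upper
  candidates[which.min(sapply(candidates, f))]
}

# Toy criterion with the minimum at the order 17
toyCriterion <- function(order) (order - 17)^2
bestOrder <- ternarySearch(toyCriterion, 1, 100)
bestOrder
```

Each iteration discards roughly a third of the remaining orders, so the criterion is evaluated a few dozen times instead of once for every order from 1 to 100.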

Conclusions

These are the main new features in the package. I feel that the main job in smooth is already done, and all I can do now is just tune the functions and improve the existing code. I want to move all the functions to the new engine and ditch the old one, but this requires much more time than I have. So, I don't expect to finish this any time soon, but I hope I'll get there someday. On the other hand, I'm not sure that spending much time on developing an R package is a wise idea, given that nowadays people tend to use Python. I would develop a Python analogue of the smooth package, but currently I don't have the necessary expertise and time to do that. Besides, there already exist great libraries, such as statsforecast from Nixtla and sktime. I am not sure that another library implementing ETS and ARIMA is needed in Python. What do you think?

The post smooth v3.2.0: what’s new? first appeared on Open Forecast.

The first draft of “Forecasting and Analytics with ADAM”

Ivan Svetunkov — Mon, 11 Apr 2022 15:30:26 +0000

Forecasting and Analytics with ADAM

After working on this for more than a year, I have finally prepared the first draft of my online monograph “Forecasting and Analytics with ADAM“. This is a monograph on the model that unites ETS, ARIMA and regression and introduces advanced features in univariate modelling, including:

1. ETS in a new State Space form;
2. ARIMA in a new State Space form;
3. Regression;
4. TVP regression;
5. Combinations of (1), (2) and either (3), or (4);
6. Automatic selection/combination for ETS;
7. Automatic orders selection for ARIMA;
8. Variables selection for regression part;
9. Normal and non-normal distributions;
10. Automatic selection of most suitable distribution;
11. Multiple seasonality;
12. Occurrence part of the model to handle zeroes in data (intermittent demand);
13. Modelling scale of distribution (GARCH and beyond);
14. Handling uncertainty of estimates of parameters.

The model and all its features are already implemented in adam() function from smooth package for R (you need v3.1.6 from CRAN for all the features listed above). The function supports many options that allow one to experiment with univariate forecasting and to build complex models, combining elements from the list above. The monograph explaining the models underlying ADAM and how to work with them is available online, and I plan to produce several physical copies of it after refining the text. Furthermore, I have already asked two well-known academics to act as reviewers of the monograph to collect the feedback and improve the monograph, and if you want to act as a reviewer as well, please let me know.

Examples in R

Just to give you a flavour of ADAM, I decided to provide a couple of examples on time series AirPassengers (included in datasets package in R). The first one is the ADAM ETS.

Building and selecting the most appropriate ADAM ETS comes to running the following line of code:

adamETSAir <- adam(AirPassengers, h=12, holdout=TRUE)

In this case, ADAM will select the most appropriate ETS model for the data, creating a holdout of the last 12 observations. We can see the details of the model by printing the output:

adamETSAir

Time elapsed: 0.75 seconds
Model estimated using adam() function: ETS(MAM)
Distribution assumed in the model: Gamma
Loss function type: likelihood; Loss function value: 467.2981
Persistence vector g:
 alpha   beta  gamma 
0.7691 0.0053 0.0000 

Sample size: 132
Number of estimated parameters: 17
Number of degrees of freedom: 115
Information criteria:
      AIC      AICc       BIC      BICc 
 968.5961  973.9646 1017.6038 1030.7102 

Forecast errors:
ME: 9.537; MAE: 20.784; RMSE: 26.106
sCE: 43.598%; Asymmetry: 64.8%; sMAE: 7.918%; sMSE: 0.989%
MASE: 0.863; RMSSE: 0.833; rMAE: 0.273; rRMSE: 0.254

The output above provides plenty of detail on what was estimated and how. Some of these elements have been discussed in one of my previous posts on es() function. The new thing is the information about the assumed distribution for the response variable. By default, ADAM works with Gamma distribution in case of multiplicative error model. This is done to make the model more robust in cases of low volume data, where the Normal distribution might produce negative numbers (see my presentation on this issue). In case of high volume data, the Gamma distribution will perform similarly to the Normal one. The pure multiplicative ADAM ETS is discussed in Chapter 6 of ADAM monograph. If Gamma is not suitable, then another distribution can be selected via the distribution parameter. There is also an automated distribution selection approach in the function auto.adam():

adamETSAutoAir <- auto.adam(AirPassengers, h=12, holdout=TRUE)
adamETSAutoAir

Time elapsed: 3.86 seconds
Model estimated using auto.adam() function: ETS(MAM)
Distribution assumed in the model: Normal
Loss function type: likelihood; Loss function value: 466.0744
Persistence vector g:
 alpha   beta  gamma 
0.8054 0.0000 0.0000 

Sample size: 132
Number of estimated parameters: 17
Number of degrees of freedom: 115
Information criteria:
      AIC      AICc       BIC      BICc 
 966.1487  971.5172 1015.1564 1028.2628 

Forecast errors:
ME: 9.922; MAE: 21.128; RMSE: 26.246
sCE: 45.36%; Asymmetry: 65.4%; sMAE: 8.049%; sMSE: 1%
MASE: 0.877; RMSSE: 0.838; rMAE: 0.278; rRMSE: 0.255

As we see from the output above, the Normal distribution is more appropriate for the data in terms of AICc than the other ones tried out by the function (by default the list includes Normal, Laplace, S, Generalised Normal, Gamma, Inverse Gaussian and Log Normal distributions, but this can be amended by providing a vector of names via distribution parameter). The selection of ADAM ETS and distributions is discussed in Chapter 15 of the monograph.
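If you are curious where the information criteria in the output come from, they can be reproduced by hand from the reported loss function value. The sketch below takes the numbers from the auto.adam() output above and assumes that the loss value is the negative log-likelihood. The variable names are made up for the illustration.

```r
# Numbers reported in the auto.adam() output above
negativeLogLik <- 466.0744
nParameters <- 17
nObservations <- 132

# Standard formulae for the information criteria
aic <- 2 * nParameters + 2 * negativeLogLik
aicc <- aic + 2 * nParameters * (nParameters + 1) / (nObservations - nParameters - 1)
bic <- log(nObservations) * nParameters + 2 * negativeLogLik

round(c(AIC=aic, AICc=aicc, BIC=bic), 4)
```

The values agree with the reported AIC, AICc and BIC up to rounding in the last digit. The correction term in AICc shows how the 17 estimated parameters are penalised on the sample of 132 observations.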

Having obtained the model, we can diagnose it using plot.adam() function:

par(mfcol=c(3,3))
plot(adamETSAutoAir,which=c(1,4,2,6,7,8,10,11,13))

The which parameter specifies what type of plots to produce, you can find the list of plots in the documentation for plot.adam(). The code above will result in:

Diagnostics plots for ADAM ETS on AirPassengers data

The diagnostic plots are discussed in Chapter 14 of ADAM monograph. The plot above does not show any serious issues with the model.

Just for the comparison, we could also try fitting the most appropriate ADAM ARIMA to the data (this model is discussed in Chapter 9). The code in this case is slightly more complicated, because we need to switch off ETS part of the model and define the maximum orders of ARIMA to try:

adamARIMAAir <- adam(AirPassengers, model="NNN", h=12, holdout=TRUE,
                     orders=list(ar=c(3,2),i=c(2,1),ma=c(3,2),select=TRUE))

This results in the following automatically selected ARIMA model:

Time elapsed: 3.54 seconds
Model estimated using auto.adam() function: SARIMA(0,1,1)[1](0,1,1)[12]
Distribution assumed in the model: Normal
Loss function type: likelihood; Loss function value: 491.7117
ARMA parameters of the model:
MA:
 theta1[1] theta1[12] 
   -0.1952    -0.0720 

Sample size: 132
Number of estimated parameters: 16
Number of degrees of freedom: 116
Information criteria:
     AIC     AICc      BIC     BICc 
1015.423 1020.154 1061.548 1073.097 

Forecast errors:
ME: -13.795; MAE: 16.65; RMSE: 21.644
sCE: -63.064%; Asymmetry: -79.4%; sMAE: 6.343%; sMSE: 0.68%
MASE: 0.691; RMSSE: 0.691; rMAE: 0.219; rRMSE: 0.21

Given that ADAM ETS and ADAM ARIMA are formulated in the same framework, they are directly comparable using information criteria. Comparing AICc of the models adamETSAutoAir and adamARIMAAir, we can conclude that the former is more appropriate to the data than the latter. However, the default ARIMA works with the Normal distribution, which might not be appropriate for the data, so we can revert to the auto.adam() to select the better one:

adamAutoARIMAAir <- auto.adam(AirPassengers, model="NNN", h=12, holdout=TRUE,
                              orders=list(ar=c(3,2),i=c(2,1),ma=c(3,2),select=TRUE))

This will take more computational time, but will result in a different model with a lower AICc (which is still higher than the one in ADAM ETS):

Time elapsed: 25.46 seconds
Model estimated using auto.adam() function: SARIMA(0,1,1)[1](0,1,1)[12]
Distribution assumed in the model: Log-Normal
Loss function type: likelihood; Loss function value: 472.923
ARMA parameters of the model:
MA:
 theta1[1] theta1[12] 
   -0.2785    -0.5530 

Sample size: 132
Number of estimated parameters: 16
Number of degrees of freedom: 116
Information criteria:
      AIC      AICc       BIC      BICc 
 977.8460  982.5764 1023.9708 1035.5197 

Forecast errors:
ME: -12.968; MAE: 13.971; RMSE: 19.143
sCE: -59.285%; Asymmetry: -91.7%; sMAE: 5.322%; sMSE: 0.532%
MASE: 0.58; RMSSE: 0.611; rMAE: 0.184; rRMSE: 0.186

Note that although the AICc is higher for ARIMA than for ETS, the former has lower error measures than the latter. So, the higher AICc does not necessarily mean that the model is not good. But if we rely on the information criteria, then we should stick with ADAM ETS and we can then produce the forecasts for the next 12 observations (see Chapter 18):

adamETSAutoAirForecast <- forecast(adamETSAutoAir, h=12, interval="prediction",
                                   level=c(0.9,0.95,0.99))
par(mfcol=c(1,1))
plot(adamETSAutoAirForecast)

Forecast from ADAM ETS

Finally, if we want to do a more in-depth analysis of parameters of ADAM, we can also produce the summary, which will create the confidence intervals for the parameters of the model:

summary(adamETSAutoAir)

Model estimated using auto.adam() function: ETS(MAM)
Response variable: data
Distribution used in the estimation: Normal
Loss function type: likelihood; Loss function value: 466.0744
Coefficients:
            Estimate Std. Error Lower 2.5% Upper 97.5%  
alpha         0.8054     0.0864     0.6343      0.9761 *
beta          0.0000     0.0203     0.0000      0.0401  
gamma         0.0000     0.0382     0.0000      0.0755  
level        96.2372     6.8596    82.6496    109.7919 *
trend         2.0901     0.3955     1.3068      2.8716 *
seasonal_1    0.9145     0.0077     0.9003      0.9372 *
seasonal_2    0.8999     0.0081     0.8857      0.9227 *
seasonal_3    1.0308     0.0094     1.0165      1.0535 *
seasonal_4    0.9885     0.0077     0.9743      1.0112 *
seasonal_5    0.9856     0.0072     0.9713      1.0083 *
seasonal_6    1.1165     0.0093     1.1023      1.1392 *
seasonal_7    1.2340     0.0115     1.2198      1.2568 *
seasonal_8    1.2254     0.0105     1.2112      1.2481 *
seasonal_9    1.0668     0.0094     1.0526      1.0896 *
seasonal_10   0.9256     0.0087     0.9113      0.9483 *
seasonal_11   0.8040     0.0075     0.7898      0.8268 *

Error standard deviation: 0.0367
Sample size: 132
Number of estimated parameters: 17
Number of degrees of freedom: 115
Information criteria:
      AIC      AICc       BIC      BICc 
 966.1487  971.5172 1015.1564 1028.2628

Note that the summary() function might complain about the Observed Fisher Information. This is because the covariance matrix of parameters is calculated numerically and sometimes the likelihood is not maximised properly. I have not been able to fully resolve this issue yet, but hopefully will do at some point. The summary above shows, for example, that the smoothing parameters $\beta$ and $\gamma$ are not significantly different from zero (at the 5% level), while $\alpha$ is expected to vary between 0.6343 and 0.9761 in 95% of the cases. You can read more about the uncertainty of parameters in ADAM in Chapter 16 of the monograph.

As for the other features of ADAM, here is a brief guide:

If you work with multiple seasonal data, then you might need to specify the seasonality via the lags parameter, for example as lags=c(24,7*24) in case of hourly data. This is discussed in Chapter 12;
If you have intermittent data, then you should read Chapter 13, which explains how to work with the occurrence parameter of the function;
Explanatory variables are discussed in Chapter 10 and are handled in the adam() function via the formula parameter;
In the cases of heteroscedasticity (time varying or induced by some explanatory variables), there is a scale model (which is discussed in Chapter 17 and implemented as sm() method for the adam class).

You can also experiment with advanced estimators (Chapter 11, including custom loss functions) via the loss parameter and forecast combinations (Section 15.4).

Long story short, if you are interested in univariate forecasting, then do give ADAM a try - it might have the flexibility you need for your experiments. If you are worried about its accuracy, check out this post, where I compared ADAM with other models.

And, as a friend of mine says, "Happy forecasting!"

The post The first draft of “Forecasting and Analytics with ADAM” first appeared on Open Forecast.

Introducing scale model in greybox

Ivan Svetunkov — Sun, 23 Jan 2022 18:04:33 +0000

At the end of June 2021, I released the greybox package version 1.0.0. This was a major release, introducing new functionality, but I did not have time to write a separate post about it because of the teaching and lack of free time. Finally, Christmas has arrived, and I could spend several hours preparing the post about it. In this post, I want to tell you about the new major feature in the greybox package.

Scale Model

The Scale Model is the regression-like model focusing on capturing the relation between the scale of distribution (for example, variance in Normal distribution) and a set of explanatory variables. It is implemented in sm() method in the greybox package. The motivation for this comes from GAMLSS, the Generalised Additive Model for Location, Scale and Shape. While I have decided not to bother with the “GAM” part of this (there are gam and gamlss packages in R that do that), I liked the idea of being able to predict the scale (for example, variance) of a distribution. This becomes especially useful when one suspects heteroscedasticity in the model but does not think that variable transformations are appropriate.
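To see what kind of data calls for such a model, here is a small base R simulation in which the variance of the error term grows with an explanatory variable. It is only a rough check based on the residuals of a conventional regression, not the sm() method itself, and all the numbers in it are arbitrary values chosen for the illustration.

```r
set.seed(41)
n <- 1000
x <- runif(n, 0, 2)

# The variance of the error term is exp(x), so the standard deviation is sqrt(exp(x))
y <- 10 + 2 * x + sqrt(exp(x)) * rnorm(n)

# Conventional regression for the mean, ignoring the changing variance
locationModel <- lm(y ~ x)

# Rough check: do the logarithms of the squared residuals depend on x?
scaleCheck <- lm(log(residuals(locationModel)^2) ~ x)
scaleSlope <- coef(scaleCheck)[["x"]]
scaleSlope
```

A clearly positive slope signals that the variance increases with x. The scale model estimates such a relation properly via likelihood instead of relying on this two-step shortcut.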

To understand what the function does, it is necessary first to discuss the underlying model. We will start the discussion with an example of a linear regression model with two explanatory variables, assuming Normally distributed residuals $\xi_t$ with zero mean and a fixed variance $\sigma^2$, $\xi_t \sim \mathcal{N}(0,\sigma^2)$, which can be formulated as:
\begin{equation} \label{eq:model1}
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \xi_t ,
\end{equation}
where $y_t$ is the response variable, $x_{1,t}$ and $x_{2,t}$ are the explanatory variables on observation $t$, $\beta_0$, $\beta_1$ and $\beta_2$ are the parameters of the model and $\xi_t \sim \mathcal{N}\left(0, \sigma^2 \right)$. Recalling the basic properties of Normal distribution, we can rewrite the same model as a model with standard normal residuals $\epsilon_t \sim \mathcal{N}\left(0, 1 \right)$ by inserting $\xi_t = \sigma \epsilon_t$ in \eqref{eq:model1}:
\begin{equation} \label{eq:model2}
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \sigma \epsilon_t .
\end{equation}
Now if we suspect that the variance of the model might not be constant, we can substitute the standard deviation $\sigma$ with some function, transforming the model into:
\begin{equation} \label{eq:model3}
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + f\left(\gamma_0 + \gamma_2 x_{2,t} + \gamma_3 x_{3,t}\right) \epsilon_t ,
\end{equation}
where $x_{2,t}$ and $x_{3,t}$ are the explanatory variables (as you see, not necessarily the same as in the first part of the model) and $\gamma_0$, $\gamma_2$ and $\gamma_3$ are the parameters of the scale part of the model. The idea here is that there is a regression model for the conditional mean of the distribution $\beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t}$, and that there is another one that will regulate the standard deviation via $f\left(\gamma_0 + \gamma_2 x_{2,t} + \gamma_3 x_{3,t}\right)$. The main thing to keep in mind about the latter is that the function $f(\cdot)$ needs to be strictly positive because the standard deviation cannot be zero or negative. The simplest way to guarantee this is to use the exponent as $f(\cdot)$. Furthermore, in our example with the Normal distribution, the scale corresponds to the variance, so we should be introducing the model for variance: $\sigma^2_t = \exp\left(\gamma_0 + \gamma_2 x_{2,t} + \gamma_3 x_{3,t}\right)$. This leads to the following model:
\begin{equation} \label{eq:model4}
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \sqrt{\exp\left(\gamma_0 + \gamma_2 x_{2,t} + \gamma_3 x_{3,t}\right)} \epsilon_t .
\end{equation}
The model above would not only have the conditional mean depending on the values of explanatory variables (the conventional regression) but also the conditional variance, which would change depending on the values of variables. Note that this model assumes linearity in the conditional mean: an increase of $x_{1,t}$ by one leads to an increase of $y_t$ by $\beta_1$ on average. At the same time, it assumes non-linearity in the variance: an increase of $x_{2,t}$ by one changes the variance by $\left(\exp(\gamma_2)-1\right)\times 100$%. If we want a non-linear change in the conditional mean, we can use a model in logarithms. Alternatively, we could assume a different distribution for the response variable $y_t$. To understand how the latter would work, we need to represent the same model \eqref{eq:model4} in a more general form. For the Normal distribution, the same model \eqref{eq:model4} can be rewritten as:
\begin{equation} \label{eq:model5}
y_t \sim \mathcal{N}\left(\beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t}, \exp\left(\gamma_0 + \gamma_2 x_{2,t} + \gamma_3 x_{3,t}\right)\right).
\end{equation}
This representation allows introducing scale model for many other distributions, such as Laplace, Generalised Normal, Gamma, Inverse Gaussian etc. All that we need to do in those cases is to substitute the distribution $\mathcal{N}(\cdot)$ with a distribution of interest. The sm() function supports the same list of distributions as alm() (see the vignette for the function on CRAN or in R using the command vignette()). Each specific formula for scale would differ from one distribution to another, but the principles will be the same.
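To make the likelihood behind \eqref{eq:model4} more tangible, here is a minimal base R sketch that estimates the location and scale parts jointly by minimising the negative Normal log-likelihood with optim(). It only illustrates the principle and is not how sm() is implemented in greybox: the names negLogLik, negLogLikGrad and startValues are made up for this example, the seed is arbitrary, and the data is simulated with the same coefficients as in the demonstration that follows.

```r
# Simulate data from the location-scale model (arbitrary seed, for reproducibility)
set.seed(41)
n <- 100
x <- matrix(rnorm(3 * n, 10, 3), n, 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y <- 1000 - 0.75 * x[, "x1"] + 1.75 * x[, "x2"] +
    sqrt(exp(0.3 + 0.5 * x[, "x2"] - 0.4 * x[, "x3"])) * rnorm(n)

# Design matrices for the location (mean) and the scale (log-variance) parts
X <- cbind("(Intercept)" = 1, x[, c("x1", "x2")])
Z <- cbind("(Intercept)" = 1, x[, c("x2", "x3")])

# Negative log-likelihood of N(X beta, exp(Z gamma))
negLogLik <- function(theta) {
    res <- as.vector(y - X %*% theta[1:3])
    eta <- as.vector(Z %*% theta[4:6])
    0.5 * sum(log(2 * pi) + eta + res^2 * exp(-eta))
}

# Analytical gradient, which helps BFGS with badly scaled parameters
negLogLikGrad <- function(theta) {
    res <- as.vector(y - X %*% theta[1:3])
    eta <- as.vector(Z %*% theta[4:6])
    weighted <- res * exp(-eta)
    c(-crossprod(X, weighted), 0.5 * crossprod(Z, 1 - res * weighted))
}

# Start from the homoscedastic solution: OLS for the mean, constant variance
olsFit <- lm.fit(X, y)
startValues <- c(unname(olsFit$coefficients),
                 log(mean(olsFit$residuals^2)), 0, 0)

fit <- optim(startValues, negLogLik, negLogLikGrad,
             method = "BFGS", control = list(maxit = 1000))

# Estimated parameters: three for the location, three for the scale
estimates <- setNames(fit$par,
                      c(paste0("beta_", colnames(X)), paste0("gamma_", colnames(Z))))
print(round(estimates, 4))
```

Starting from the homoscedastic solution means that any improvement in the likelihood comes from letting the variance depend on x2 and x3.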

Demonstration in R

For demonstration purposes, we will use an example with artificial data, generated according to the model \eqref{eq:model4}:

# Load the greybox package, which provides alm(), sm() and spread()
library(greybox)

xreg <- matrix(rnorm(300,10,3),100,3)
xreg <- cbind(1000-0.75*xreg[,1]+1.75*xreg[,2]+
              sqrt(exp(0.3+0.5*xreg[,2]-0.4*xreg[,3]))*rnorm(100,0,1),xreg)
colnames(xreg) <- c("y",paste0("x",c(1:3)))

The scatterplot of the generated data will look like this:

spread(xreg)

Scatterplot matrix for the generated data

We can then fit a model, specifying the location and scale parts of it in alm(). In this case, alm() will call sm() and will estimate both parts via likelihood maximisation. To make things closer to a forecasting task, we will withhold the last 10 observations for the test set:

ourModel <- alm(y~x1+x2+x3, scale=~x2+x3, xreg, subset=c(1:90), distribution="dnorm")

The returned model contains both parts. The scale part of the model can be accessed via ourModel$scale. It is an object of class "scale", supporting several methods, such as actuals(), residuals(), fitted(), summary() and plot() (and several others). Here is how the summary of the model looks in my case:

summary(ourModel)

Response variable: y
Distribution used in the estimation: Normal
Loss function used in estimation: likelihood
Coefficients:
             Estimate Std. Error Lower 2.5% Upper 97.5%  
(Intercept) 1000.2850     2.9698   994.3782   1006.1917 *
x1            -0.8350     0.1435    -1.1204     -0.5497 *
x2             1.8656     0.1714     1.5246      2.2065 *
x3            -0.0228     0.1776    -0.3761      0.3305  

Coefficients for scale:
            Estimate Std. Error Lower 2.5% Upper 97.5%  
(Intercept)   0.0436     0.7012    -1.3510      1.4382  
x2            0.4705     0.0413     0.3883      0.5527 *
x3           -0.3355     0.0487    -0.4324     -0.2385 *

Error standard deviation: 4.52
Sample size: 90
Number of estimated parameters: 7
Number of degrees of freedom: 83
Information criteria:
     AIC     AICc      BIC     BICc 
391.0191 392.3849 408.5177 411.5908

The summary above shows parameters for both parts of the model. They are not far from the ones used in the generation of the data, which indicates that the implemented model works as intended. The only issue here is that the standard errors in the location part of the model (first four coefficients) do not take the heteroscedasticity into account and thus are biased. The HAC standard errors are not yet implemented in alm().
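Because the variance is modelled via the exponent, the scale coefficients are easier to read as percentage changes in variance. The short snippet below does this arithmetic for the two estimates printed in the summary; the vector name gammaHat is only for illustration.

```r
# Scale coefficients for x2 and x3, copied from the summary output
gammaHat <- c(x2 = 0.4705, x3 = -0.3355)

# Percentage change in variance when the variable increases by one
varianceChange <- (exp(gammaHat) - 1) * 100
print(round(varianceChange, 1))
```

In my case, this gives roughly a 60% increase in variance for a unit increase in x2 and roughly a 28.5% decrease for a unit increase in x3.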

Just to see the effect of the scale model, here are the diagnostic plots for the original model (which returns the $\xi_t$ residuals) and for the scale model ($\epsilon_t$ residuals):

par(mfcol=c(1,2))
plot(ourModel, 5)
plot(ourModel$scale, 5)

Diagnostics plots for sm

The Figure above shows squared residuals vs fitted values for the location (the plot on the left) and the scale (the plot on the right) models. The former is agnostic of the scale model and demonstrates that there is heteroscedasticity of residuals (the variance increases with the increase of the fitted values). The latter shows that the scale model managed to resolve the issue. While the LOWESS line demonstrates some non-linearity, the distribution of residuals conditional on fitted values looks random.

Finally, we can produce forecasts from such a model, similarly to how it is done for any other model estimated with alm():

ourForecast <- predict(ourModel,xreg[-c(1:90),],interval="pred")
plot(ourForecast)

Forecast from the model

In this case, the function will first predict the scale part of the model, then it will use the predicted variance and the covariance matrix of parameters to calculate the prediction intervals, shown in Figure above. Given the independence of location and scale parts of the model, the conditional expectation (point forecast) will not change if we drop the scale model. It is all about variance.
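To see how the two parts feed into an interval, here is a rough back-of-the-envelope sketch in base R. It plugs the estimates from the summary into the formulae and ignores the uncertainty of parameters, so it only approximates what predict() does. The new observation newX is hypothetical, as are the variable names.

```r
# Estimates copied from the summary: location (intercept, x1, x2, x3)
betaHat <- c(1000.2850, -0.8350, 1.8656, -0.0228)
# ... and scale (intercept, x2, x3)
gammaHat <- c(0.0436, 0.4705, -0.3355)

# A hypothetical new observation
newX <- c(x1 = 10, x2 = 10, x3 = 10)

# Point forecast from the location part
pointForecast <- sum(betaHat * c(1, newX))

# Predicted variance from the scale part (exponent of the linear predictor)
predictedVariance <- exp(sum(gammaHat * c(1, newX[c("x2", "x3")])))

# Approximate 95% prediction interval, ignoring parameter uncertainty
interval <- pointForecast + qnorm(c(0.025, 0.975)) * sqrt(predictedVariance)
print(c(lower = interval[1], point = pointForecast, upper = interval[2]))
```

Changing x2 or x3 in newX changes the width of the interval, while the point forecast is driven only by the location part.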

Finally, if you do not want to use alm() function, you can use lm() instead and then apply the sm():

lmModel <- lm(y~x1+x2+x3, as.data.frame(xreg), subset=c(1:90))
smModel <- sm(lmModel, formula=~x2+x3, xreg)

In this case, the sm() will assume that the error term follows Normal distribution, and we will end up with two models that are not connected with each other (e.g., the predict() method applied to lmModel will not use predictions from the smModel). Nonetheless, we could still use all the R methods discussed above for the analysis of the smModel.

As a final word, the scale model is a new feature. While it already works, there might be bugs in it. If you find any, please let me know by submitting an issue on GitHub.

P.S.

There is a danger that the greybox package will soon be removed from CRAN together with 88 other packages (including my smooth and legion) because the nloptr package that it relies on has not passed some of the new checks recently introduced by CRAN. This is beyond my control, and I do not have time or power to influence this, but if this happens, you might need to switch to the installation from GitHub via the remotes package, using the command:

remotes::install_github("config-i1/greybox")

My apologies for the inconvenience. I might be able to remove the dependence on nloptr at some point, but it will not happen before March 2022.

Message Introducing scale model in greybox first appeared on Open Forecast.

An Integrated Method for Estimation and Optimisation

Ivan Svetunkov — Fri, 03 Sep 2021 15:47:15 +0000

My PhD student, Congzheng Liu (co-supervised with Adam Letchford), has written a paper entitled “Newsvendor Problems: An Integrated Method for Estimation and Optimisation”. This paper has recently been published in EJOR. In this paper we build upon the existing Ban & Rudin (2019) approach for the newsvendor problem, showing that in the case of the linear model, it becomes equivalent to quantile regression. We then extend it to non-linear newsvendor problems, testing it on simulated and real life data. In order to understand what specifically we propose, we need to discuss the typical process in the case of the newsvendor problem.

Newsvendor is a class of problems where the product can only be sold for one day, after which it goes to waste. So this is appropriate, for example, for perishable products in retail. Typically, in this situation we would have historical sales data for our product $y_t$ and we would try forecasting it using regression / ETS / ARIMA or any other model. After doing that and obtaining the estimates of parameters, we would produce a quantile of the assumed distribution, which then tells us how much to order ($q_t$). If we order more than needed, we will have holding costs. In the opposite case, we will have shortage costs. Based on these costs and the price of the product, we can find the optimal order that will give the maximum profit.

As you can already spot, the forecasting stage is detached from the optimisation one in this situation. The idea of the proposed integrated approach (IMEO) is simple: instead of optimising the model via MSE or any other conventional loss and then solving the optimisation problem, we could estimate the model via maximisation of the specific profit function, thus obtaining the required orders directly. This is not a new idea on its own, but using profit function rather than the cost (as Ban & Rudin, 2019 did) allows applying IMEO to wider set of problems.

For example, if we know the price of the product $p$, the costs for production $v$, holding $c_h$ and shortage costs $c_s$, we can then calculate profit as (for a linear newsvendor problem):
\begin{equation}
\pi(q_t,y_t)=
\begin{cases}
p y_t -v q_t -c_h (q_t -y_t),& \text{for } q_t \geq y_t\\
p q_t -v q_t -c_s (y_t -q_t),& \text{for } q_t< y_t,
\end{cases}
\end{equation}
where $q_t$ is the order quantity and $y_t$ is the actual sales. This profit function can be used for the estimation of a model of your choosing. Congzheng has written a separate R code for the experiments for the paper. Inspired by his example, I have implemented custom losses in alm() and adam() functions from respective greybox and smooth packages for R. At the moment, only the regression model works properly with custom losses – ETS / ARIMA need additional modifications, which we will hopefully resolve in the next paper. So, here is an example with linear newsvendor problem and alm():

# Load the greybox package, which provides alm()
library(greybox)

# Generate artificial data
x1 <- rnorm(100,100,10)
x2 <- rbinom(100,2,0.05)
y <- 10 + 1.5*x1 + 5*x2 + rnorm(100,0,10)
ourData <- cbind(y=y,x1=x1,x2=x2)

# Define price and costs
price <- 50
costBasic <- 5
costShort <- 15
costHold <- 1

# Define profit function for the linear case
lossProfit <- function(actual, fitted, B, xreg){
    # Minus sign is needed here, because we need to minimise the loss
    profit <- -ifelse(actual >= fitted,
                     (price - costBasic) * fitted - costShort * (actual - fitted),
                     price * actual - costBasic * fitted - costHold * (fitted - actual));
    return(sum(profit));
}

# Estimate the model
model1 <- alm(y~x1+x2, ourData, loss=lossProfit)

# Print summary of the model
summary(model1, bootstrap=TRUE)

Response variable: y
Distribution used in the estimation: Normal
Loss function used in estimation: custom
Bootstrap was used for the estimation of uncertainty of parameters
Coefficients:
            Estimate Std. Error Lower 2.5% Upper 97.5%  
(Intercept)  36.5177    14.2840     2.7783     51.4844 *
x1            1.3622     0.1622     1.1909      1.7528 *
x2            3.3423     2.7810    -6.5997      5.9101  

Error standard deviation: 17.2266
Sample size: 100
Number of estimated parameters: 3
Number of degrees of freedom: 97

The resulting model is easy to work with: it provides meaningful parameters, showing how on average the order should change if a variable changes by one. For example, we see that with an increase of the variable x1 by one, the orders should increase on average by 1.36.

Note that in this specific case, as shown in our paper, the model would be equivalent to the quantile regression, estimated for the quantile $\left( \frac{c_u}{c_o+c_u} \right)$, where $c_u= p-v+c_s$ is the "underage" cost and $c_o = v+c_h$ is the "overage" cost. In our example it corresponds to approximately 0.9091 quantile. We can compare the output of this model with the one from the quantile regression in alm (which is estimated as an Asymmetric Laplace model):

model2 <- alm(y~x1+x2, ourData, distribution="dalaplace", alpha=0.9091)
summary(model2, bootstrap=TRUE)

Response variable: y
Distribution used in the estimation: Asymmetric Laplace with alpha=0.9091
Loss function used in estimation: likelihood
Bootstrap was used for the estimation of uncertainty of parameters
Coefficients:
            Estimate Std. Error Lower 2.5% Upper 97.5%  
(Intercept)  36.6688    11.6686     3.8674     51.1987 *
x1            1.3611     0.1338     1.1920      1.7454 *
x2            3.1259     2.5424    -6.2518      5.4703  

Error standard deviation: 17.3379
Sample size: 100
Number of estimated parameters: 4
Number of degrees of freedom: 96
Information criteria:
     AIC     AICc      BIC     BICc 
826.4622 826.8833 836.8829 837.8524

The differences between the estimates of parameters of the two models are due to the optimisation procedure, which would converge to slightly different points in these two cases. Still, the values of parameters are close to each other and would converge asymptotically, which supports our finding.
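The critical ratio mentioned above is easy to verify numerically without any regression. The base R sketch below uses the same price and costs, simulates demand with no explanatory variables (an arbitrary Normal distribution and seed chosen just for this check), and finds the single order quantity that maximises the total profit. The function and variable names are made up for this illustration.

```r
# Price and costs, as in the example above
price <- 50
costBasic <- 5
costShort <- 15
costHold <- 1

# Underage and overage costs and the resulting critical ratio
costUnder <- price - costBasic + costShort
costOver <- costBasic + costHold
criticalRatio <- costUnder / (costUnder + costOver)
print(round(criticalRatio, 4))

# Profit for a given order and realised demand (linear newsvendor)
profit <- function(order, demand) {
    ifelse(demand >= order,
           (price - costBasic) * order - costShort * (demand - order),
           price * demand - costBasic * order - costHold * (order - demand))
}

# Simulated demand without explanatory variables (arbitrary seed)
set.seed(1)
demand <- rnorm(1000, 160, 10)

# Total profit is piecewise linear and concave in the order,
# so it is enough to check the observed demand values as candidates
candidates <- sort(demand)
totalProfit <- vapply(candidates, function(q) sum(profit(q, demand)), numeric(1))
bestOrder <- candidates[which.max(totalProfit)]

# Compare with the empirical quantile at the critical ratio
empiricalQuantile <- unname(quantile(demand, criticalRatio, type = 1))
print(c(bestOrder = bestOrder, empiricalQuantile = empiricalQuantile,
        shareCovered = mean(demand <= bestOrder)))
```

The profit-maximising order should coincide (or nearly so) with the empirical quantile, and the share of cases in which demand is covered should be close to the critical ratio.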

And here is how the orders over time look in the case of our custom loss:

plot(model1, 7)

Dynamics of orders from alm model

The purple line in the Figure above corresponds to the orders and would cover roughly 90.91% of cases, so that we would run out of product in approximately 10% of cases, which would still be more profitable than any other option.

Finally, the approach also works well in the case of the non-linear newsvendor problem (see the paper for details), where quantile regression is not suitable and the conventional approach fails. The only thing that would change is the loss function, where the prices and costs would depend non-linearly on the order quantity and sales.

You can read the published paper on EJOR website or the working paper on ResearchGate.

Message An Integrated Method for Estimation and Optimisation first appeared on Open Forecast.

The creation of ADAM – next step in statistical forecasting

Ivan Svetunkov — Wed, 13 Jan 2021 11:24:18 +0000

Good news everyone! The future of statistical forecasting is finally here :). Have you ever struggled with ETS and needed explanatory variables? Have you ever needed to unite ARIMA and ETS? Have you ever needed to deal with all those zeroes in the data? What about the data with multiple seasonalities? All of this and more can now be solved by adam() function from smooth v3.0.1 package for R (on its way to CRAN now). ADAM stands for “Augmented Dynamic Adaptive Model” (I will talk about it in the next CMAF Friday Forecasting Talk on 15th January). Now, what is ADAM? Well, something like this:

The Creation of ADAM by Arne Niklas Jansson with my adaptation

ADAM is the next step in time series analysis and forecasting. Remember exponential smoothing and functions like es() and ets()? Remember ARIMA and functions like arima(), ssarima(), msarima() etc? Remember your favourite linear regression function, e.g. lm(), glm() or alm()? Well, now these three models are implemented in a unified framework. Now you can have exponential smoothing with ARIMA elements and explanatory variables in one box: adam(). You can do ETS components and ARIMA orders selection, together with explanatory variables selection in one go. You can estimate ETS / ARIMA / regression using either likelihood of a selected distribution or using conventional losses like MSE, or even using your own custom loss. You can tune parameters of the optimiser and experiment with initialisation and estimation of the model. The function can deal with multiple seasonalities and with intermittent data in one place. In fact, there are so many features that it is just easier to list the major ones:

1. ETS;
2. ARIMA;
3. Regression;
4. TVP regression;
5. Combination of (1), (2) and either (3), or (4);
6. Automatic selection / combination of states for ETS;
7. Automatic orders selection for ARIMA;
8. Variables selection for regression part;
9. Normal and non-normal distributions;
10. Automatic selection of most suitable distributions;
11. Advanced and custom loss functions;
12. Multiple seasonality;
13. Occurrence part of the model to handle zeroes in data (intermittent demand);
14. Model diagnostics using plot() and other methods;
15. Confidence intervals for parameters of models;
16. Automatic outliers detection;
17. Handling missing data;
18. Fine tuning of persistence vector (smoothing parameters);
19. Fine tuning of initial values of the state vector (e.g. level / trend / seasonality / ARIMA components / regression parameters);
20. Two initialisation options (optimal / backcasting);
21. Provided ARMA parameters;
22. Fine tuning of optimiser (select algorithm and convergence criteria);
…

All of this is based on the Single Source of Error state space model, which makes ETS, ARIMA and regression directly comparable via information criteria and opens a variety of modelling and forecasting possibilities. In addition, the code is much more efficient than the code of already existing smooth functions, so hopefully this will be a convenient function to use. I do not promise that everything will work 100% efficiently from scratch, because this is a new function, which implies that inevitably there are bugs and there is room for improvement. But I intend to continue working on it, improving it further, based on the provided feedback (you can submit an issue on GitHub if you have ideas).

Keep in mind that starting from smooth v3.0.0 I will not be introducing new features in es(), ssarima() and other conventional functions for univariate variables in smooth – I will only fix bugs in them and possibly optimise some parts of the code, but there will be no innovations in them, given that the main focus from now on will be on adam(). To that extent, I have removed some experimental and not fully developed parameters from those functions (e.g. occurrence, oesmodel, updateX, persistenceX and transitionX).

Now, I realise that ADAM is something completely new and contains just too much information to cover in one post. As a result, I have started the work on an online textbook. This is work in progress, missing some chapters, but it already covers many important elements of ADAM. If you find any mistakes in the text or formulae, please, use the “Open Review” functionality in the textbook to give me feedback or send me a message. This will be highly appreciated, because, working on this alone, I am sure that I have made plenty of mistakes and typos.

Example in R

Finally, it would be boring just to announce things and leave it like that. So, I’ve decided to come up with an R experiment on M, M3 and tourism competitions data, similar to how I’ve done it in 2017, just to show how the function compares with the other conventional ones, measuring their accuracy and computational time:

Huge chunk of code in R

# Load the packages. If the packages are not available, install them from CRAN
library(Mcomp)
library(Tcomp)
library(smooth)
library(forecast)

# Load the packages for parallel calculation
# This package is available for Linux and MacOS only
# Comment out this line if you work on Windows
library(doMC)

# Set up the cluster on all cores / threads.
## Note that the code that follows might take around 500Mb per thread,
## so the issue is not in the number of threads, but rather in the RAM availability
## If you do not have enough RAM,
## you might need to reduce the number of threads manually.
## But this should not be greater than the number of threads your processor can do.
registerDoMC(detectCores())

##### Alternatively, if you work on Windows (why?), uncomment and run the following lines
# library(doParallel)
# cl <- makeCluster(detectCores())
# registerDoParallel(cl)
#####

# Create a small but neat function that will return a vector of error measures
errorMeasuresFunction <- function(object, holdout, insample){
    return(c(measures(holdout, object$mean, insample),
             mean(holdout < object$upper & holdout > object$lower),
             mean(object$upper-object$lower)/mean(insample),
             pinball(holdout, object$upper, 0.975)/mean(insample),
             pinball(holdout, object$lower, 0.025)/mean(insample),
             sMIS(holdout, object$lower, object$upper, mean(insample),0.95),
             object$timeElapsed))
}

# Create the list of datasets
datasets <- c(M1,M3,tourism)
datasetLength <- length(datasets)
# Give names to competing forecasting methods
methodsNames <- c("ADAM-ETS(ZZZ)","ADAM-ETS(ZXZ)","ADAM-ARIMA",
                  "ETS(ZXZ)","ETSHyndman","AutoSSARIMA","AutoARIMA");
methodsNumber <- length(methodsNames);
# Run adam on one of time series from the competitions to get names of error measures
test <- adam(datasets[[125]]);
# The array with error measures for each method on each series.
## Here we calculate a lot of error measures, but we will use only few of them
testResults <- array(NA,c(methodsNumber,datasetLength,length(test$accuracy)+6),
                             dimnames=list(methodsNames, NULL,
                                           c(names(test$accuracy),
                                             "Coverage","Range",
                                             "pinballUpper","pinballLower","sMIS",
                                             "Time")));

#### ADAM(ZZZ) ####
j <- 1;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- adam(datasets[[i]],"ZZZ");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ADAM(ZXZ) ####
j <- 2;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- adam(datasets[[i]],"ZXZ");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ADAMARIMA ####
j <- 3;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- adam(datasets[[i]], "NNN",
                 order=list(ar=c(3,2),i=c(2,1),ma=c(3,2),select=TRUE));
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="pred");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ES(ZXZ) ####
j <- 4;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- es(datasets[[i]],"ZXZ");
    testForecast <- forecast(test, h=datasets[[i]]$h, interval="parametric");
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### ETS from forecast package ####
j <- 5;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- ets(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### AUTO SSARIMA ####
j <- 6;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="smooth") %dopar% {
    startTime <- Sys.time()
    test <- auto.ssarima(datasets[[i]]);
    testForecast <- forecast(test, h=datasets[[i]]$h, interval=TRUE);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

#### AUTOARIMA ####
j <- 7;
result <- foreach(i=1:datasetLength, .combine="cbind", .packages="forecast") %dopar% {
    startTime <- Sys.time()
    test <- auto.arima(datasets[[i]]$x);
    testForecast <- forecast(test, h=datasets[[i]]$h, level=95);
    testForecast$timeElapsed <- Sys.time() - startTime;
    return(errorMeasuresFunction(testForecast, datasets[[i]]$xx, datasets[[i]]$x));
}
testResults[j,,] <- t(result);

# If you work on Windows, don't forget to shutdown the cluster via the following command:
# stopCluster(cl)

After running this code, we will get the big array (7x5315x21), which would contain many different error measures for point forecasts and prediction intervals. We will not use all of them, but instead will extract MASE and RMSSE for point forecasts and Coverage, Range and sMIS for prediction intervals, together with computational time. Although it might be more informative to look at distributions of those variables, we will calculate mean and median values overall, just to get a feeling about the performance:

A much smaller chunk of code in R

round(apply(testResults[,,c("MASE","RMSSE","Coverage","Range","sMIS","Time")],
            c(1,3),mean),3)
round(apply(testResults[,,c("MASE","RMSSE","Range","sMIS","Time")],
            c(1,3),median),3)

This will result in the following two tables (boldface shows the best performing functions):

Means:
               MASE RMSSE Coverage Range  sMIS  Time
ADAM-ETS(ZZZ) 2.415 2.098    0.888 1.398 2.437 0.654
ADAM-ETS(ZXZ) 2.250 1.961    0.895 1.225 2.092 0.497
ADAM-ARIMA    2.551 2.203    0.862 0.968 3.098 5.990
ETS(ZXZ)      2.279 1.977    0.862 1.372 2.490 1.128
ETSHyndman    2.263 1.970    0.882 1.200 2.258 0.404
AutoSSARIMA   2.482 2.134    0.801 0.780 3.335 1.700
AutoARIMA     2.303 1.989    0.834 0.805 3.013 1.385

Medians:
               MASE RMSSE Range  sMIS  Time
ADAM-ETS(ZZZ) 1.362 1.215 0.671 0.917 0.396
ADAM-ETS(ZXZ) 1.327 1.184 0.675 0.909 0.310
ADAM-ARIMA    1.476 1.300 0.769 1.006 3.525
ETS(ZXZ)      1.335 1.198 0.616 0.931 0.551
ETSHyndman    1.323 1.181 0.653 0.925 0.164
AutoSSARIMA   1.419 1.271 0.577 0.988 0.909
AutoARIMA     1.310 1.182 0.609 0.881 0.322

Some things to note from this:

ADAM ETS(ZXZ) is the most accurate model in terms of mean MASE and RMSSE, it has the coverage closest to 95% (although none of the models achieved the nominal value because of the fundamental underestimation of uncertainty) and has the lowest sMIS, implying that it did better than the other functions in terms of prediction intervals;
The ETS(ZZZ) did worse than ETS(ZXZ) because the former considers the multiplicative trend, which sometimes becomes unstable, producing exploding trajectories;
ADAM ARIMA is not performing well yet, because of the implemented order selection algorithm and it was the slowest function of all. I plan to improve it in future releases of the function;
While ADAM ETS(ZXZ) did not beat ETS from forecast package in terms of computational time, it was faster than the other functions;
When it comes to medians, auto.arima(), ets() and auto.ssarima() seem to be doing better than ADAM, but not by a large margin.
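For readers unfamiliar with the interval measures in the tables above, here is a small base R sketch of coverage, pinball loss and the mean interval score on a toy example. These are textbook definitions written from scratch for illustration: the function names are made up, and pinball() and sMIS() from greybox may aggregate or scale the values differently (in the experiment above they are additionally divided by the in-sample mean).

```r
# Share of holdout values that fall inside the interval
coverage <- function(holdout, lower, upper) {
    mean(holdout > lower & holdout < upper)
}

# Pinball (quantile) loss for a quantile forecast of a given level
pinballLoss <- function(holdout, quantileForecast, level) {
    error <- holdout - quantileForecast
    mean(ifelse(error >= 0, level * error, (level - 1) * error))
}

# Mean interval score: interval width plus penalties for misses
meanIntervalScore <- function(holdout, lower, upper, level = 0.95) {
    alpha <- 1 - level
    mean((upper - lower) +
         2 / alpha * (lower - holdout) * (holdout < lower) +
         2 / alpha * (holdout - upper) * (holdout > upper))
}

# Toy example: three holdout values, the last one lies above the interval
holdout <- c(10, 12, 20)
lower <- rep(8, 3)
upper <- rep(15, 3)

toyMeasures <- c(Coverage = coverage(holdout, lower, upper),
                 Range = mean(upper - lower),
                 pinballUpper = pinballLoss(holdout, upper, 0.975),
                 pinballLower = pinballLoss(holdout, lower, 0.025),
                 MIS = meanIntervalScore(holdout, lower, upper, 0.95))
print(round(toyMeasures, 4))
```

The single miss dominates the interval score, which is why narrow intervals with poor coverage are punished by this measure.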

In order to see if the performance of functions is statistically different, we run the RMCB test for MASE, RMSSE and sMIS. Note that RMCB compares the median performance of functions. Here is the R code:

A smaller chunk of code in R for the MCB test

# Load the package with the function
library(greybox)
# Run it for each separate measure, automatically producing plots
rmcbResultMASE <- rmcb(t(testResults[,,"MASE"]))
rmcbResultRMSSE <- rmcb(t(testResults[,,"RMSSE"]))
rmcbResultsMIS <- rmcb(t(testResults[,,"sMIS"]))

And here are the figures that we get by running that code:

RMCB test for MASE

RMCB test for RMSSE

As we can see from the two figures above, ADAM-ETS(Z,X,Z) performs better than the other functions, although statistically not different than ETS implemented in es() and ets() functions. ADAM-ARIMA is the worst performing function for the moment, as we have already noticed in the previous analysis. The ranking is similar for both MASE and RMSSE.

And here is the sMIS plot:

RMCB test for sMIS

When it comes to sMIS, the leader in terms of medians is auto.arima(), doing quite similar to ets(), but this is mainly because they have lower ranges, incidentally resulting in lower than needed coverage (as seen from the summary performance above). ADAM-ETS does similar to ets() and es() in this aspect (the intervals of the three intersect).
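Rank-based comparisons of the MCB type start from the ranks of methods within each series rather than from the error values themselves. The toy base R sketch below shows only this first step on a made-up matrix of errors (four series, three hypothetical methods A, B and C). It does not reproduce the regression and the confidence intervals that rmcb() builds on top of it.

```r
# Made-up error measures: rows are series, columns are methods
errors <- matrix(c(1.2, 1.0, 1.5,
                   0.8, 0.9, 1.1,
                   2.0, 1.7, 1.9,
                   1.4, 1.3, 1.6),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, c("A", "B", "C")))

# Rank the methods within each series (1 is the most accurate)
ranks <- t(apply(errors, 1, rank))

# Average rank of each method across the series
meanRanks <- colMeans(ranks)
print(meanRanks)
```

Working with ranks makes the comparison insensitive to the scale of individual series, which is why the conclusions can differ from those based on mean values.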

Obviously, we could provide a more detailed analysis of the performance of functions on different types of data and see how they compare in each category, but the aim of this post is just to demonstrate how the new function works, so I do not intend to investigate this in detail.

Finally, I will present ADAM with several case studies in CMAF Friday Forecasting Talk on 15th January. If you are interested to hear more and have some questions, please register on MeetUp or via LinkedIn and join us online.

Message The creation of ADAM – next step in statistical forecasting first appeared on Open Forecast.

International Symposium on Forecasting 2019

Ivan Svetunkov — Wed, 03 Jul 2019 09:12:25 +0000

The ISF2019 took place in Thessaloniki, Greece. This time I presented a spin-off of my research on intermittent demand in retail, entitled “What about those sweet melons? Using mixture models for demand forecasting in retail”. The idea is quite simple: use mixture distribution regressions (e.g. logistic and log-normal distributions) in order to predict the seasonally-intermittent sales in retail. The model is easy to implement in practice. The main problem that I’ve faced so far is the absence of proper data. I only had 24 series of weekly sales of tomatoes provided by a small company, but I need more in order to see which of the approaches works best. For this research, I need data like this:

Retail sales of tomatoes

Until I have the data, I cannot write a paper on that topic…

Anyway, here are the slides if anyone wants to have a look.

Message International Symposium on Forecasting 2019 first appeared on Open Forecast.