
10.5 Examples of application

Building upon the example with the AirPassengers data from the previous section, we will construct multiplicative ARIMA models and see which one of them is the most appropriate for the data. As a reminder, the best additive ARIMA model was SARIMA(0,2,2)(0,2,2)\(_{12}\), which had an AICc of 1029.975. We will do something similar here, but using the Log-Normal distribution, thus working with logARIMA. In order to understand what model can be used in this case, we can take the logarithm of the data and see what happens with the components of the time series:
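The inspection described above can be sketched in base R (the book may use a different decomposition function; `decompose()` is an assumption here):

```r
# Take logarithms of the series and inspect its components:
# trend, seasonality and the remainder of the classical decomposition
plot(decompose(log(AirPassengers)))
```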

We still have the trend in the data, and the seasonality now corresponds to the additive one rather than the multiplicative (as expected). While we might still need second differences for the non-seasonal part of the model, taking first differences for the seasonal part should suffice. So we can test several models with different options for the ARIMA orders:
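A minimal sketch of the fitting step, using `adam()` from the smooth package (the list name `adamLogSARIMA` and the 12-observation holdout are assumptions made for illustration; `distribution="dlnorm"` gives the Log-Normal models, matching the outputs below):

```r
library(smooth)
# Fit the three candidate logARIMA models on the same data and holdout
adamLogSARIMA <- vector("list", 3)
adamLogSARIMA[[1]] <- adam(AirPassengers, "NNN", lags=c(1,12),
                           orders=list(ar=c(0,0), i=c(1,1), ma=c(1,1)),
                           distribution="dlnorm", h=12, holdout=TRUE)
adamLogSARIMA[[2]] <- adam(AirPassengers, "NNN", lags=c(1,12),
                           orders=list(ar=c(0,0), i=c(2,1), ma=c(2,1)),
                           distribution="dlnorm", h=12, holdout=TRUE)
adamLogSARIMA[[3]] <- adam(AirPassengers, "NNN", lags=c(1,12),
                           orders=list(ar=c(1,0), i=c(1,1), ma=c(2,1)),
                           distribution="dlnorm", h=12, holdout=TRUE)
names(adamLogSARIMA) <- c("logSARIMA(0,1,1)(0,1,1)[12]",
                          "logSARIMA(0,2,2)(0,1,1)[12]",
                          "logSARIMA(1,1,2)(0,1,1)[12]")
```

`model="NNN"` switches off the ETS part of ADAM, leaving a pure ARIMA, while `lags=c(1,12)` together with the `orders` list defines the non-seasonal and seasonal parts of each model.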

What differs between the models is the non-seasonal part. Using the connection with ETS, the first model should work on local level data, the second should be optimal for local trend series, and the third lies somewhere between the two. We can compare the models based on AICc:
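Assuming the fitted models are stored in a named list called `adamLogSARIMA` (a hypothetical name used for illustration), the comparison reduces to one line:

```r
# Extract the corrected AIC from each fitted adam() model
sapply(adamLogSARIMA, AICc)
```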

## logSARIMA(0,1,1)(0,1,1)[12] logSARIMA(0,2,2)(0,1,1)[12] 
##                    982.5764                   1096.8616 
## logSARIMA(1,1,2)(0,1,1)[12] 
##                    993.0431

It looks like logSARIMA(0,1,1)(0,1,1)\(_{12}\) is the most appropriate of the three for the data. In order to make sure that we have not missed anything, we analyse the residuals of this model:
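A sketch of the diagnostics step, assuming the selected model is the first element of the hypothetical `adamLogSARIMA` list; the `which` values for the ACF/PACF panels of `plot.adam()` are an assumption and may differ between package versions:

```r
# Side-by-side ACF and PACF of the residuals of the selected model
par(mfcol=c(1,2))
plot(adamLogSARIMA[[1]], which=10:11)
```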

We can see that there are no significant coefficients on either the ACF or the PACF, so there is nothing else to improve in this model. We can then produce forecasts from the model and see how it performs on the holdout sample:
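With the model already fitted on the sample excluding the holdout (again assuming it sits in `adamLogSARIMA[[1]]`), the forecast and the summary below can be produced as:

```r
# Forecast over the holdout with prediction intervals, then print the model,
# which reports the information criteria and the holdout error measures
plot(forecast(adamLogSARIMA[[1]], h=12, interval="prediction"))
adamLogSARIMA[[1]]
```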

## Time elapsed: 0.45 seconds
## Model estimated using adam() function: SARIMA(0,1,1)[1](0,1,1)[12]
## Distribution assumed in the model: Log Normal
## Loss function type: likelihood; Loss function value: 472.923
## ARMA parameters of the model:
## MA:
##  theta1[1] theta1[12] 
##    -0.2785    -0.5530 
## 
## Sample size: 132
## Number of estimated parameters: 16
## Number of degrees of freedom: 116
## Information criteria:
##       AIC      AICc       BIC      BICc 
##  977.8460  982.5764 1023.9708 1035.5197 
## 
## Forecast errors:
## ME: -12.968; MAE: 13.971; RMSE: 19.143
## sCE: -59.285%; sMAE: 5.322%; sMSE: 0.532%
## MASE: 0.58; RMSSE: 0.611; rMAE: 0.184; rRMSE: 0.186

The ETS model closest to the logSARIMA(0,1,1)(0,1,1)\(_{12}\) would probably be ETS(M,M,M):
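A sketch of the corresponding call (the variable name is an assumption; the distribution is left at its default, which for the multiplicative error model resulted in the Inverse Gaussian in the output below):

```r
# Fit ETS(M,M,M) on the same data with the same holdout for comparability
adamETSMMM <- adam(AirPassengers, "MMM", h=12, holdout=TRUE)
adamETSMMM
```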

## Time elapsed: 0.16 seconds
## Model estimated using adam() function: ETS(MMM)
## Distribution assumed in the model: Inverse Gaussian
## Loss function type: likelihood; Loss function value: 468.5996
## Persistence vector g:
##  alpha   beta  gamma 
## 0.7723 0.0176 0.0001 
## 
## Sample size: 132
## Number of estimated parameters: 17
## Number of degrees of freedom: 115
## Information criteria:
##       AIC      AICc       BIC      BICc 
##  971.1992  976.5676 1020.2068 1033.3133 
## 
## Forecast errors:
## ME: -4.422; MAE: 15.626; RMSE: 21.726
## sCE: -20.217%; sMAE: 5.953%; sMSE: 0.685%
## MASE: 0.649; RMSSE: 0.693; rMAE: 0.206; rRMSE: 0.211

Comparing the information criteria, ETS(M,M,M) should be preferred to logARIMA, but in terms of accuracy on the holdout, logARIMA does better on this data.