13.3 Dealing with daylight saving and leap years

This book is in Open Review. I want your feedback to make the book better for you and other readers. To add your annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the button in the upper right hand corner of the page

Another problem that arises in case of data with high frequency is the change of local time due to daylight saving (DST). This happens in some countries two times a year: in Spring the time is moved one hour forward (typically at 1am to 2am), while in the Autumn it is moved back one hour. The implications of this are terrifying from forecasting point of view, because one day of year has 23 hours, while the other one has 25 hours, while all the business processes are aligned to the local time. This means that if the conventional seasonal ETS model with \(m=24\) is fit to the data, it will only work correctly in a half of year. Well, it will adapt to the new patterns after some times, but this implies that the smoothing parameter \(\gamma\) will be higher than needed.

There are two solutions to this problem: 1. Shift the periodicity for one day, when the time changes from 24 to either 23, or 25, depending on the time of year; 2. Introduce categorical variables for factors, which will mark specific hours of day;

The former is more difficult to formalise mathematically and implement in software, but the latter relies on the already discussed mechanism of ETSX{D} with categorical variables and should be more straightforward. Given the connection between seasonality in the conventional ETS model and the ETSX{D} with categorical variables for seasonality, both approaches should be equivalent in terms of parameters estimation and final forecasts.

Similarly, the problem with leap years can be solved either using the shift from \(m=365\) to \(m=366\) on 29th February in a spirit of the option (1), or using the categorical variables, approach (2). There is a difference, however: the former would be suitable for the data with only one leap year, where the estimation of the seasonal index for 29th February might be difficult, while the latter assumes the separate estimation of the parameter (so it has one more parameter to estimate). However, given the discussion in the previous section, maybe we should not bother with \(m=365\) in the first place and rethink the problem, if possible. Having 52 / 53 weeks in a year has similar difficulties, but at least does not involve the estimation of so many initial seasonal states.