12.4 Dealing with daylight saving and leap years
One of the problems that arises with high-frequency data is the change of local time due to daylight saving time (DST). In some countries, this happens twice a year: in spring, the clocks are moved one hour forward (e.g. from 1 a.m. to 2 a.m.), while in autumn, they are moved one hour back. From a forecasting point of view, the implications are severe, because one day of the year has 23 hours, while another has 25. This leads to modelling difficulties because business processes are typically aligned with the local time. If the conventional seasonal ETS model with \(m=24\) is fitted to such data, it will only work correctly for half of the year. If the smoothing parameter \(\gamma\) is high enough, then after the DST change the model will start updating the states and will eventually adapt to the new pattern, but this implies that \(\gamma\) will be higher than needed, introducing unnecessary reactivity in the model and thus leading to wider prediction intervals.
There are two solutions to this problem:
- Change the periodicity for the one day on which the time changes, from 24 to either 23 or 25, depending on the time of year;
- Introduce categorical variables (factors) that mark the specific hours of the day.
The first option is more challenging to formalise mathematically and to implement in software, but it does not require the estimation of additional parameters: we only need to change the seasonality lag from 24 to either 23 or 25 for a specific day, depending on the direction of the time change. This approach for seasonal ETS is implemented in adam() if the data has appropriate timestamps and is framed as a zoo object or something similar. The second option relies on the already discussed mechanism of ETSX{D} with categorical variables (Section 10.5) and is in general simpler. Given the connection between seasonality in the conventional ETS model and ETSX{D} with categorical variables for seasonality, both approaches should be equivalent in terms of the final forecasts.
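To make the two options more concrete, here is a minimal sketch in plain Python. It is not how adam() handles DST internally; it only illustrates the idea on hypothetical hourly labels for a 24-hour day, a 23-hour day (assuming the clock jumps from 1 a.m. to 2 a.m.), and another 24-hour day. The function names `local_hours`, `seasonal_lags`, and `hour_dummies` are made up for this illustration.

```python
def local_hours(day_lengths, skipped_hour=1):
    """Return local hour-of-day labels for consecutive days of 24 or 23 hours.

    A 23-hour day stands for the spring DST change: the clock jumps over
    `skipped_hour` (an assumption of this sketch: 1 a.m. -> 2 a.m.).
    """
    hours = []
    for length in day_lengths:
        if length == 24:
            hours.extend(range(24))
        elif length == 23:
            hours.extend(h for h in range(24) if h != skipped_hour)
        else:
            raise ValueError("this sketch only handles 24- and 23-hour days")
    return hours


def seasonal_lags(hours):
    """Option 1: lag to the most recent observation with the same local hour."""
    last_seen = {}
    lags = []
    for t, hour in enumerate(hours):
        # None marks observations that have no earlier counterpart.
        lags.append(t - last_seen[hour] if hour in last_seen else None)
        last_seen[hour] = t
    return lags


def hour_dummies(hours):
    """Option 2: one 0/1 indicator per hour of the day (one-hot encoding)."""
    return [[1 if hour == h else 0 for h in range(24)] for hour in hours]


hours = local_hours([24, 23, 24])
lags = seasonal_lags(hours)
dummies = hour_dummies(hours)

print(lags[24:27])  # [24, 23, 23]: the lag drops to 23 right after the jump
print(lags[47:50])  # [23, 47, 24]: the skipped hour looks back two days
```

In this toy setting, the lag equals 23 for exactly 23 observations after the jump and then returns to 24, while the dummy variables follow the local clock automatically, whatever the length of the day.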
The second problem with high-frequency data is leap years. It can also be solved either by shifting the periodicity from \(m=365\) to \(m=366\) on 29th February, in the spirit of the first option above, or by using categorical variables, as in the second one. There is a difference, however: the latter requires the estimation of an additional parameter, the seasonal index for 29th February, which might be difficult if the data contains only one leap year, while the former does not and would thus be suitable in that case. However, given the discussion in Section 12.3, maybe we should not bother with \(m=365\) in the first place and should rethink the problem, if possible. Data with 52 or 53 weeks in a year poses similar difficulties, but at least it does not involve the estimation of so many initial seasonal states.
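The effect of a leap year on the seasonal lag of daily data can be illustrated with a short Python sketch. It aligns each day with the same calendar date one year earlier, which is one possible reading of the lag-shifting idea and not a description of any specific implementation; the function name `yearly_lag` is hypothetical.

```python
from datetime import date


def yearly_lag(day):
    """Days back to the same calendar date one year earlier.

    Returns None for 29th February, which has no counterpart
    in the previous year.
    """
    if (day.month, day.day) == (2, 29):
        return None
    return (day - day.replace(year=day.year - 1)).days


for day in [date(2023, 3, 1), date(2024, 2, 28), date(2024, 2, 29),
            date(2024, 3, 1), date(2025, 2, 28), date(2025, 3, 1)]:
    print(day, yearly_lag(day))
# 2023-03-01 365
# 2024-02-28 365
# 2024-02-29 None
# 2024-03-01 366
# 2025-02-28 366
# 2025-03-01 365
```

Under this alignment, the lag becomes 366 only for the observations that have 29th February between them and their counterpart from the previous year, and the extra day itself is the one that would need a separate seasonal index in the categorical variables approach.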
Alternatively, De Livera (2010) proposed to tackle the problem of leap years by introducing fractional seasonality via Fourier series. The model that implements this is called TBATS (an exponential smoothing state space model with Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components; De Livera et al., 2011). While this resolves the aforementioned problem with leap years, the approach introduces additional complexity, because we now need to select the number of harmonics to use, which in general is not straightforward.
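The idea of a fractional seasonal period can be sketched with plain trigonometric terms. The code below is not the TBATS model, only an illustration of why sines and cosines do not need an integer period. The value 365.25 (the average length of a year including leap days) and the function name `fourier_terms` are assumptions of this sketch.

```python
import math


def fourier_terms(t, period, harmonics):
    """Sine and cosine terms for observation t and a possibly fractional period.

    Returns [sin_1, cos_1, ..., sin_K, cos_K] for K = harmonics.
    """
    terms = []
    for k in range(1, harmonics + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


# Daily data with an assumed average year length of 365.25 days.
start = fourier_terms(0, period=365.25, harmonics=2)
four_years_later = fourier_terms(1461, period=365.25, harmonics=2)

print(start)  # [0.0, 1.0, 0.0, 1.0]
# After 4 * 365.25 = 1461 days the terms return to (almost exactly)
# the same values, so the cycle stays aligned across leap years.
print(all(math.isclose(a, b, abs_tol=1e-9)
          for a, b in zip(start, four_years_later)))  # True
```

The number of harmonics (two here, chosen arbitrarily) controls how flexible the seasonal shape is, and this is exactly the quantity that needs to be selected in practice.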
Summarising, when trying to resolve the problems with DST and leap years, there are several possible solutions, each with its own advantages and disadvantages. To decide which one to use, it makes sense to try out several of them and select the one that works best (e.g. produces the lowest forecast errors on a holdout sample).
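Such a comparison can be sketched on toy data. The example below is hypothetical in every respect: the series is artificial (demand depends only on the local hour), the holdout is the 23-hour day itself, and the two candidates are simple seasonal naive forecasts rather than ETS models. It only shows the mechanics of selecting an approach by its holdout error.

```python
def mean_absolute_error(actuals, forecasts):
    """Average absolute difference between actual values and forecasts."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


# Toy data: a 24-hour day followed by a 23-hour day, assuming the clock
# jumps from 1 a.m. to 2 a.m. on the second day.
hours = list(range(24)) + [h for h in range(24) if h != 1]
demand = [100 + 10 * h for h in hours]  # artificial, noise-free series

holdout_start = 24  # the 23-hour day is used as the holdout
actuals = demand[holdout_start:]

# Candidate A: seasonal naive with a fixed lag of 24 observations.
fixed_lag = [demand[t - 24] for t in range(holdout_start, len(demand))]

# Candidate B: seasonal naive aligned on the local hour of the day.
last_value = {hours[t]: demand[t] for t in range(holdout_start)}
aligned = [last_value[h] for h in hours[holdout_start:]]

errors = {
    "fixed lag of 24": mean_absolute_error(actuals, fixed_lag),
    "aligned on local hour": mean_absolute_error(actuals, aligned),
}
best = min(errors, key=errors.get)
print(best)  # aligned on local hour
```

With real data, neither error would be zero and the ranking is not guaranteed, which is why the comparison needs to be done on the data at hand.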