There are many different issues with capturing seasonality in time series. In this short post, I’d like to discuss one of the most annoying ones.
I’m talking about a seasonal pattern that shifts over time. What I mean is that instead of the standard number of observations in a cycle (e.g., 24 hours in a day), some cycles have more or fewer of them. How is that possible?
One cause is the Daylight Saving Time (DST) change. The original idea of DST was to reduce energy consumption, because there is more daylight in summer than in winter (there’s a nice and long article on Wikipedia about it). Because of this, many countries introduced a time shift: in spring, the clock is moved forward by one hour, while in autumn it goes back. This idea had a reasonable motivation at the beginning of the 20th century, but I personally think that as we’ve progressed as a society, it has lost its value.
While this is already extremely annoying on its own, a bit unhealthy (several studies report an increased risk of heart attacks), and a torture for parents with small kids (the little ones don’t understand that it’s not 7am yet), it also introduces a modelling challenge: two days in the year do not have 24 hours. In spring, we have 23 hours, while in autumn we have 25. Standard classical forecasting approaches (such as ETS/ARIMA, regression, STL or classical decomposition) break in this case, because by default they assume that a specific pattern repeats itself every 24 hours, so after the change every seasonal index is off by one hour. The issue arises because business cycles are tuned to working hours, not to the movement of the sun: people come to work at 9am, no matter how many hours are in the day.
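To make the break concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the change is UK-style (01:00 is skipped in spring and repeated in autumn), and the helper `wall_clock_hours` is a made-up name, not part of any package. It compares the hour of day that a fixed period of 24 implies with the hour shown on the wall clock.

```python
def wall_clock_hours(days, short_day=None, long_day=None):
    """Wall-clock hour label of every hourly observation over `days` days.

    Assumes a UK-style change: 01:00 is skipped on `short_day` (23 hours)
    and repeated on `long_day` (25 hours). Both arguments are day positions.
    """
    labels = []
    for day in range(days):
        hours = list(range(24))
        if day == short_day:
            hours.remove(1)       # 00:00 is followed directly by 02:00
        elif day == long_day:
            hours.insert(1, 1)    # 00:00, 01:00, 01:00, 02:00, ...
        labels.extend(hours)
    return labels


# Four days of hourly data; the clocks go forward on the second day.
labels = wall_clock_hours(days=4, short_day=1)

# What a model with a fixed period of 24 believes the hour of day to be.
fixed_period = [t % 24 for t in range(len(labels))]

# Observations where that belief disagrees with the wall clock.
mismatches = [t for t, (w, f) in enumerate(zip(labels, fixed_period)) if w != f]

print(len(labels))       # 95 observations rather than 96
print(mismatches[0])     # 25, the first observation after the change
print(len(mismatches))   # 70, i.e. every observation from then on
```

The misalignment does not correct itself: once one hour is lost, the 9am peak sits in the 8am slot of the fixed cycle until the autumn change shifts it back.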
Another challenge is leap years. While DST is totally man-made, leap years occur because the Earth orbits the sun approximately every 365.25 days. To avoid drifting too far from reality, our calendars include one extra day (29th February) roughly every four years. This addresses the drift, but it also means that some years have 366 days instead of 365. Once again, conventional models relying on a fixed periodicity fail.
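The same drift can be shown for daily data with a short Python sketch. The start date and the helper `fixed_position` are hypothetical, chosen only for illustration. The sketch asks which position in the cycle a fixed period of 365 days assigns to 1st January of each year.

```python
from datetime import date
import calendar

START = date(2023, 1, 1)  # first observation of a hypothetical daily series


def fixed_position(day, period=365):
    """Position in the cycle that a fixed yearly period assigns to `day`."""
    return (day - START).days % period


for year in (2023, 2024, 2025):
    n_days = 366 if calendar.isleap(year) else 365
    print(year, n_days, fixed_position(date(year, 1, 1)))
# 2023 365 0
# 2024 366 0
# 2025 365 1
```

After the leap year 2024, 1st January no longer falls on position 0, so every calendar date is matched with the seasonal index of the previous day.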
There are several ways to handle this, all with their own advantages and disadvantages:
- Fix the data. In the case of DST, this means removing one of the duplicated hours during the autumn time change and adding one during the spring shift. For leap years, it means dropping the 29th of February. This is easy to do, but breaks the structure and might cause issues when we have DST/leap year in the holdout sample.
- Introduce more complex components, such as Fourier-based ones, to capture the shift in the data. This works well for leap years but doesn’t address the DST issue. Harmonic regressions and TBATS do this, for example.
- Shift seasonal indices when the issue happens – for example, having two indices for 1am when the switch to winter time occurs.
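The first two options can be sketched in a few lines of Python. This is a rough illustration rather than a recommended implementation: the function names are made up, keeping the first copy of a repeated hour and filling the skipped hour with the mean of its neighbours are my assumptions, and the Fourier terms simply use a non-integer period of 365.25 days.

```python
import math


def regularise_day(hours, values):
    """Option 1: force one day of hourly data onto a regular 24-hour grid.

    `hours` are wall-clock hour labels (0-23), `values` the observations.
    """
    by_hour = {}
    for h, v in zip(hours, values):
        by_hour.setdefault(h, v)   # keep the first copy of a repeated hour
    fixed = []
    for h in range(24):
        if h in by_hour:
            fixed.append(by_hour[h])
        else:
            # Assumption: fill the skipped hour with the mean of its neighbours.
            # Only valid for a single gap away from the ends of the day.
            fixed.append((by_hour[h - 1] + by_hour[h + 1]) / 2)
    return fixed


def fourier_terms(t, period=365.25, K=2):
    """Option 2: K sine/cosine pairs for time index t and a non-integer period."""
    terms = []
    for k in range(1, K + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


autumn_hours = [0, 1, 1] + list(range(2, 24))     # 25 observations
autumn_values = [float(i) for i in range(25)]
print(len(regularise_day(autumn_hours, autumn_values)))    # 24

spring_hours = [0] + list(range(2, 24))           # 23 observations
spring_values = [float(h) for h in spring_hours]
print(regularise_day(spring_hours, spring_values)[:3])     # [0.0, 1.0, 2.0]

print(fourier_terms(0))                           # [0.0, 1.0, 0.0, 1.0]
```

The Fourier terms would be used as regressors. Because the period does not have to be an integer, the extra day of a leap year is absorbed smoothly, but nothing in them reacts to a one-hour jump on a specific date.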
In R, I’ve developed the temporaldummy() function in the greybox package to introduce correct dummy variables for data with shifting seasonality, and I’ve incorporated the third approach above (shifting seasonal indices) into the adam() function from the smooth package. You can read more about these here: https://openforecast.org/adam/MultipleFrequenciesDSTandLeap.html
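For readers who want to see the idea outside R, here is a small Python sketch of dummy variables keyed to the wall clock. It is not the greybox implementation and makes no claim about how temporaldummy() or adam() work internally. The function name `hour_dummies` is hypothetical, and the autumn day assumes that 01:00 is the repeated hour.

```python
def hour_dummies(wall_clock_hours):
    """One row per observation, one 0/1 column per wall-clock hour (0-23)."""
    return [[1 if h == col else 0 for col in range(24)]
            for h in wall_clock_hours]


# Day of the autumn change: 01:00 occurs twice, so the day has 25 rows.
autumn_day = [0, 1, 1] + list(range(2, 24))
X = hour_dummies(autumn_day)

print(len(X))                      # 25 rows
print(X[1].index(1), X[2].index(1))  # 1 1, both rows point to the 01:00 column
```

Because the columns follow the clock rather than the position in the series, the 9am column still marks 9am on the day after the change.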
Are there any other strategies? Which one do you prefer?
BTW, Kandrika Pritularga and I are running a course on Demand Forecasting Principles with Examples in R. We’ll discuss some of these aspects there. Read more about it here.