You have probably already noticed that we are in the middle of a COVID-19 pandemic (breaking news: the UK has just announced a lockdown due to the virus). The amount of news, memes and noise on the topic coming from around the world is astonishing! What is also astonishing is the number of posts on data analysis and forecasting of the pandemic. Many data scientists, statisticians, machine learning experts, and even people who don’t know much about data analysis, feel that they have to make a difference. All of a sudden they have become experts in forecasting epidemics. They tell you how many cases of COVID-19 we will have next week, they predict how many people will die, they forecast how many COVID-19 cases will be reported in the US by 31 March 2021, and so on, and so forth. These experts use simulations, exponential smoothing, ARIMA, Bayesian methods, neural networks, judgment and whatever else they know in order to predict these things (I’m not giving links here; use Google if you want to find them). My head aches because of all this noise around us, and I don’t think that what these people do is helpful at all. Here is why.
Why?
First, without a fundamental background in epidemiology, such forecasts are often just exercises in model fitting. Some of these experts think that they can filter out the noise, extract a structure or construct a fancy model that describes something, and that this is enough. But all of that should be done on the basis of theory and domain knowledge. One cannot just fit an ARIMA model to the data, produce forecasts and claim to have done something reasonable or useful. Without an understanding of the problem, this becomes an exercise in using R / Python. This does not make anyone an expert in the area, nor does it mean that they have done something sensible. To add to this point, Rob Hyndman has recently summarised the reasons why time series models are not really useful in this context, but in my opinion the problem is wider and applies to many other analytical tools as well, if they are used without proper expertise.
Second, we don’t really know the real situation at the moment. The data we use is probably incorrect and incomplete (again, see Rob’s post for a discussion of this). For example, some countries seem to have stopped testing people in order not to spread the virus further, and even when they do test, there is no reliable way of telling how many people really have the virus. It looks like for the majority of the population the infection passes without serious issues, so only a small proportion of cases is visible on the surface. If we construct models using such data, then the conclusions we draw will inevitably be incorrect and incomplete as well. That is, unless we use proper models and have the necessary domain knowledge (see the first point of this post)…
Third, these analyses and forecasts do not help decision making: they are done just out of curiosity, without any specific purpose. For instance, one expert predicted that there would be a total of between 53 and 530 million reported COVID-19 cases worldwide by 31 March 2021. So what? What do we do with that? Does this help decision makers? No. Does this help people understand what they should do? Again, no. This is just forecasting for the sake of forecasting. COVID-19 is the current hype in analytics, and one can get attention and potentially scientific publications by doing anything in this direction. But the contribution of such analytics / forecasting exercises to science and to society at large is limited (if they are useful at all).
What can we do?
Instead of producing “actionless” forecasts, we should focus on what we expect to happen to society and the economy. The lockdowns and self-isolation are hurting the economy, but they are inevitable: this is a trade-off between public health and public prosperity. Some questions to consider:
- Can we predict how the virus will spread in different scenarios (early lockdown / late lockdown / no lockdown)?
- Can we predict what will happen to businesses during the lockdown period?
- How many bankruptcies will we see?
- What types of companies will go down first?
- How will this impact the prices of products?
- How many people will lose their jobs because of the hit on the economy?
All these are important questions that can help decide what we should do now, and how we should support business and society. Most probably, we cannot use time series methods and simple data analysis to answer these questions, for the reasons outlined above. So we need to use either domain-specific models or judgmental methods, and this needs to be done with the support of experts in the area.
Another interesting example is the panic buying we have observed, which has already damaged supply chains. People have suddenly started buying, on average, 200% more than they usually buy. We can formulate a few important operational research questions:
- How will the supply chain react?
- What are the effects of panic buying in the long term?
- When will this stop and when will demand normalise again?
- What will happen to demand after the panic ends?
These are also important questions that help in making decisions here and now for different groups of people. They are not simple to answer and require some effort, but at least the answers are useful.
Summary
I will not give you any predictions about COVID-19, because I am not an expert in the area. But I can tell you that there is too much hype, noise and panic around the topic. We (forecasters, statisticians, data scientists, etc.) need to help society, not create even more noise and panic. So if you want to analyse something or create forecasts of something related to COVID-19, make sure that your results will help other people. If they won’t, then don’t!