Data Science Community Reacts to COVID-19 Pandemic

The data-science, AI and machine-learning communities are publishing numerous data-oriented articles and blog posts on the COVID-19 pandemic in both industry and mainstream publications.

Content includes suggestions and grassroots efforts to provide access to data and to use ML techniques to help deal with the crisis, and several Kaggle competitions have been organized to pose challenges based on COVID-19 data.

The situation has also drawn attention to AI and ML-powered companies developing products or services in the space of pandemic predictions, medical diagnosis and treatment discovery, with some hoping to see these companies play an impactful role in the ongoing crisis. Naturally, research teams and labs in academia are also reacting to the crisis, with rapid publication of research papers on the topic.

It is not only tech companies and academia, however, that are attempting to apply data science techniques to the fight against the coronavirus. The US Centers for Disease Control and Prevention (CDC) is working with researchers at the machine-learning department of Carnegie Mellon University to forecast the spread of the coronavirus. The team built a machine-learning model that processes data from several sources, such as flu-related Google searches, Twitter activity, and web traffic, to predict the spread of the virus.

Significant efforts by the scientific community as a whole also offer a unique opportunity to the data science community. One example is the COVID-19 Open Research Dataset (CORD-19), an extensive machine-readable collection of coronavirus literature with over 29,000 articles, available for data and text mining. Requested by the White House Office of Science and Technology Policy, the dataset was created by researchers from the Allen Institute for AI, the Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM) at the National Institutes of Health. Such a dataset enables, among other things, the use of data mining, automated knowledge discovery, and insight extraction techniques that might help the scientific community answer high-priority questions related to COVID-19.

Amid the considerable coverage these efforts have received in the tech media, some critical voices have expressed concern over the expectations and hype surrounding the role the AI and data science communities can play in the COVID-19 crisis. They pointed out that daily predictions by companies such as BlueDot and Metabiota, which specialise in infectious disease outbreak prediction, did not surpass those made by human experts and became significantly less accurate after the first two weeks. Furthermore, AI-powered efforts to automate and improve diagnosis and treatment discovery are months or even years away from playing a significant role in these processes, owing to the limited data available and the partial understanding of the disease. Finally, the large volume of opinionated articles analysing the outbreak and the response to it, written by data scientists and ML practitioners with little to no background in epidemiology, has generated a significant backlash from professionals, who advocate more responsible writing and the avoidance of baseless speculation and conclusions.
