Making data on COVID-19 available to the scientific community

The team developed software to extract the color code for each municipality in the epidemiological reports.

During the coronavirus crisis there is a high demand for statistics, graphs, and maps of the COVID-19 outbreak. The National Institute for Public Health and the Environment (RIVM) collects data and publishes daily statistics on COVID-19 infections in the Netherlands. Utrecht University’s Research Data Management Support spoke with research engineer Jonathan de Bruin about the data behind those graphs and about his initiative, CoronaWatchNL. The project aims to organize the data reported by the RIVM and make them available to the scientific community and the public.

Time series

In the Netherlands, the RIVM is responsible for the national coordination of infectious disease control. “Every day around 2 p.m., RIVM reports the latest statistics on the COVID-19 outbreak in the Netherlands,” says De Bruin. “The numbers are published in their news releases, in an interactive chart, and in a daily epidemiological report. Once these daily updates are collected and combined, they form a time series.”
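As a rough illustration of how daily updates become a time series, here is a minimal Python sketch. It assumes each update is available as a (report date, cumulative count) pair; the function name `build_time_series` and the rule that a later entry for the same date replaces an earlier one (treated as a correction) are hypothetical choices for this example, not CoronaWatchNL's actual code.

```python
from datetime import date


def build_time_series(snapshots):
    """Combine daily snapshots into a date-sorted time series.

    Hypothetical helper. `snapshots` is an iterable of
    (report_date, cumulative_count) pairs, one per daily update.
    If a date occurs more than once, the last entry seen wins
    (an assumption: later entries are treated as corrections).

    Returns a list of (report_date, cumulative, new) tuples, where
    `new` is the increase since the previous report in the series.
    """
    # Keep one cumulative value per report date.
    latest = {}
    for report_date, cumulative in snapshots:
        latest[report_date] = cumulative

    series = []
    previous = 0  # the first day's increase is its full cumulative count
    for report_date in sorted(latest):
        cumulative = latest[report_date]
        series.append((report_date, cumulative, cumulative - previous))
        previous = cumulative
    return series


if __name__ == "__main__":
    # Illustrative values only, not RIVM figures.
    updates = [(date(2020, 3, 2), 18), (date(2020, 3, 1), 10), (date(2020, 3, 3), 24)]
    for row in build_time_series(updates):
        print(row)
```

Updates may arrive in any order, since the series is sorted by date. Note that dates without a report are simply absent from the result; filling such gaps would require a further assumption about what a missing day means.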


“For the scientific community as well as the public, it is important to have a time series of COVID-19 cases and fatalities. Researchers and policymakers use these series for modeling, discussion, and raising awareness. Making these data findable, accessible, interoperable, and reusable (FAIR) is more important than ever, so that researchers can build on the results of others.”

Worldwide 

Universities, developers, and volunteers worldwide are taking the lead in collecting, organizing, and structuring COVID-19 outbreak numbers at global and national levels. “One of the most prominent projects is the Coronavirus Resource Center of Johns Hopkins University (JHU). JHU collects officially reported case numbers from around the world at the national level and provides an interactive dashboard to report and map them. It is currently one of the most reliable sources of global outbreak numbers.”


COVID-19 infections by province during the past month

CoronaWatchNL 

Where the JHU stops, CoronaWatchNL continues. With this project, De Bruin aims to compile a FAIR dataset of the numbers reported by the RIVM. He began collecting all numbers, datasets, and changes reported by the RIVM as soon as the first numbers were published. CoronaWatchNL is now supported by researchers from Utrecht University and by volunteers. “We use the same methods as the JHU, but we also collect data at the regional level. We have several structured datasets on patients who tested positive, covering their municipality of residence, age, and sex.”

“We update the data on GitHub and publish a persistent version on Zenodo every day. Date-based versioning gives a clear overview of our datasets. These persistent publications are important for the research community because they make work based on the data reproducible.” The RIVM has added the project to its overview of COVID-19 data sources.

Open Science

De Bruin, who also leads the Open Software work package in Utrecht University’s Open Science program, praises the contributions of the researchers and volunteers working on CoronaWatchNL. “Volunteers have added new datasets, developed software to extract statistics from PDFs, and made visualizations available. Open Science, Citizen Science, and FAIR datasets make research on COVID-19 more effective, reliable, and fast. This project relies on the collaboration between our researchers and volunteers.”