Labelled data for benchmarking


Although data sharing is part of the PRISMA checklist, sharing the data underlying a systematic review, including all labeling decisions, is not yet standard practice. This project is therefore devoted to developing a new data-sharing protocol. The digital object identifiers (DOIs) of the studies found should be made available, together with the meta-data used for making the decisions (title and abstract). To make the entire selection process reproducible, all labeling decisions on each record should also be made available for every stage of the process (title, abstract, and full-text inclusion). Only with this combined information is a review completely reproducible.
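As a minimal sketch of what one shared record could look like under such a protocol, the Python snippet below stores a DOI together with the labeling decision at each screening stage and writes the records as CSV. The class name, field names, and CSV layout are illustrative assumptions, not a published format, and the DOIs in the usage example are made-up placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class LabeledRecord:
    """One record in a shared screening dataset (hypothetical schema)."""

    doi: str
    # None means the record never reached that screening stage.
    title_included: Optional[bool] = None
    abstract_included: Optional[bool] = None
    full_text_included: Optional[bool] = None


def to_csv(records):
    """Serialize records to CSV text: 1 = included, 0 = excluded, empty = not screened."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(LabeledRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        row = {}
        for key, value in asdict(record).items():
            if value is None:
                row[key] = ""
            elif isinstance(value, bool):
                row[key] = int(value)
            else:
                row[key] = value
        writer.writerow(row)
    return buffer.getvalue()
```

For example, `to_csv([LabeledRecord("10.1000/example.1", True, True, False), LabeledRecord("10.1000/example.2", False)])` yields one header line and one line per record, with empty cells for stages a record never reached. Keeping one column per stage makes it possible to replay the selection process stage by stage.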

However, simply publishing a dataset containing all meta-data might not be possible. The abstract is a creative work protected by copyright, so third parties are not permitted to republish it unless the license under which it was originally published permits republication. We therefore developed a pipeline that processes labeled datasets using transformed abstracts.
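The text does not say which transformation the pipeline applies. Purely as an illustration of the idea, the sketch below replaces an abstract with its word counts, which discards word order so the original wording cannot be read back from the output. The function name `transform_abstract` and the choice of a bag-of-words representation are assumptions made for this example, not a description of the project's actual pipeline.

```python
import re
from collections import Counter


def transform_abstract(abstract: str) -> dict:
    """Return alphabetically sorted word counts for an abstract.

    Hypothetical transformation: lowercases the text, keeps only
    alphanumeric tokens, and drops word order entirely.
    """
    tokens = re.findall(r"[a-z0-9]+", abstract.lower())
    return dict(sorted(Counter(tokens).items()))
```

A representation like this can still feed text classifiers that rely on term frequencies, which is what simulation and benchmark studies on labeled screening data typically need.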

This research line also includes building and maintaining large-scale databases of pre-labeled data for use in simulation studies and benchmark testing. The work covers acquiring and curating data, as well as developing methods for ensuring data quality and consistency.

People involved