SYNERGY

SYNERGY is a free and open dataset on study selection in systematic reviews, comprising 169,288 academic works from 26 systematic reviews. Only 2,834 (1.67%) of the academic works in this binary-labelled dataset are included in the systematic reviews. This makes SYNERGY a unique resource for developing information retrieval algorithms, especially for settings with sparse labels. Because many variables are available per record (e.g. titles, abstracts, authors, references, topics), the dataset is useful for researchers in NLP, machine learning, network analysis, and more. In total, the dataset contains 82,668,134 trainable data points.
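To make the label sparsity concrete, the short Python sketch below recomputes the inclusion rate from the two counts reported above, along with the number of excluded records per included record. It works only from those stated totals, not from the dataset files, and the helper name `label_stats` is hypothetical.

```python
# Counts as reported in the SYNERGY description above (not loaded from the data).
TOTAL_RECORDS = 169_288
INCLUDED_RECORDS = 2_834


def label_stats(total: int, included: int) -> dict:
    """Hypothetical helper: simple class-balance statistics for binary labels."""
    excluded = total - included
    return {
        "excluded": excluded,
        "inclusion_rate": included / total,             # share of positive labels
        "excluded_per_included": excluded / included,   # negatives per positive
    }


stats = label_stats(TOTAL_RECORDS, INCLUDED_RECORDS)
print(f"Inclusion rate: {stats['inclusion_rate']:.2%}")
print(f"Excluded records per included record: {stats['excluded_per_included']:.1f}")
```

With these counts there are roughly 59 excluded records for every included one.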

Progress

The first version of SYNERGY was published on March 24th 2023 and has been widely used. As of April 2025, the dataset has been downloaded over 3 million times!

We are currently developing a new version called SYNERGY Plus, in which we aim to increase the number of reviews from 26 to over 100. We are also gathering additional metadata, such as the eligibility criteria, the labels assigned during title/abstract screening, and various indicators of review quality. We aim to release this version, together with a data paper describing the additions to the dataset, in 2025.

Funding

This project is funded as part of prof. dr. Rens van de Schoot’s VICI project, titled Transparent and Reproducible AI-aided systematic reviewing for the Social Sciences (TRASS), funded by the Dutch Research Council (VI.C.231.102).

People involved