Datahugger

Datahugger is a tool designed to simplify and automate the downloading of scientific datasets, software, and code from a wide range of repositories, using either their DOI (Digital Object Identifier) or direct URL. Developed in response to the growing emphasis on open science and research reproducibility, Datahugger addresses the common challenge of locating and reliably accessing research artifacts that are distributed across different platforms.

Progress

Researchers often encounter fragmented or inconsistent data-sharing practices, which make it difficult to reproduce results or integrate datasets into new studies. Datahugger was created to streamline this process by letting researchers retrieve relevant materials programmatically, in a consistent and efficient manner. It offers both a Python interface and a command-line interface (CLI), so it fits into data pipelines and workflows and makes it easier to build reproducible and transparent research projects.
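As a rough illustration of how the Python interface might slot into a pipeline, the sketch below downloads several DOIs or URLs, each into its own subfolder. It assumes that `datahugger.get(identifier, output_folder)` is the download entry point, as shown in the project's documentation. The helper name `fetch_all`, its `download` parameter, and the folder-naming scheme are hypothetical and are not part of Datahugger.

```python
from pathlib import Path


def fetch_all(identifiers, base_dir, download=None):
    """Download each DOI or URL into its own subfolder of base_dir.

    `download` is a hypothetical hook taking (identifier, output_folder).
    When omitted, it falls back to datahugger.get (assumed signature).
    """
    if download is None:
        # Imported lazily so this module loads without datahugger installed.
        import datahugger

        download = datahugger.get

    targets = {}
    for identifier in identifiers:
        # Derive a filesystem-safe folder name from the identifier.
        folder_name = identifier.replace("/", "_").replace(":", "_")
        folder = Path(base_dir) / folder_name
        download(identifier, str(folder))
        targets[identifier] = folder
    return targets
```

For example, `fetch_all(["10.5061/dryad.mj8m0"], "data")` would request that dataset and place it under `data/10.5061_dryad.mj8m0`, provided Datahugger is installed and the repository is reachable. Passing a custom `download` function makes it possible to test the pipeline step without network access.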

Datahugger offers support for more than 377 generic and specific (scientific) repositories. See the documentation for more information! We are still expanding Datahugger with support for more repositories.

People involved