Publishing and sharing data

This guide helps you choose the appropriate route for publishing and sharing your data.

 

1. Publishing and sharing data in a data repository

If you want to make your data (openly) available to the general public, make it findable, accessible and citable, and/or comply with funder requirements, consider publishing it in a public data repository. It helps if your chosen repository:

  • Assigns a persistent identifier: a permanent link which points to the data, making your data findable and citable;
  • Provides a license or lets you choose one, creating clarity and certainty for potential users of your data.

In some repositories, it is possible to restrict access to the data itself. Be aware that this creates an extra hurdle for potential reusers. If your data is (privacy) sensitive, it is best not to publish it openly in a public repository; very sensitive data should not be deposited there even under restricted access. You can still publish a description of the data, though. Add an explanation of how, and under what circumstances, the data itself is shared.

Choosing a data repository

A wide variety of data repositories exists. Most have the option to publish your dataset using a persistent identifier and some provide the service of long-term preservation. Some repositories host data from various disciplines and others are domain or discipline specific. When choosing a repository for your data, be sure to check if the repository meets your criteria or the criteria set by your funder or journal editors.

Criteria for selecting a repository can include:

  • Is long-term preservation guaranteed or not?
    Some repositories will guarantee the legibility of the data, even if the hardware and software become obsolete.
  • What are the costs per dataset or gigabyte?
    Repositories differ in their cost model; some allow free deposits up to a certain amount of storage.
  • What is the physical storage location of data?
    The location of your data determines under which data protection law it falls. Some repositories store data in the US and others in the EU.
  • What is the default license?
    Some repositories allow for open or restricted access, or you can specify which license for use you want for your data.
  • Is the repository certified?
    Repositories with a Data Seal of Approval or CoreTrustSeal are recognised in the community as a trustworthy source of data.

WHERE TO PRESERVE AND SHARE YOUR RESEARCH DATA

Whichever public repository you choose, ensure that it has a quality mark, such as a certificate, and that the data receives a persistent identifier for permanent referral to the data.

Some options: 

  • Do you wish to keep your data available permanently? The online archiving system EASY offered by DANS is a good option for data in the social sciences and humanities. For more technical data, geospatial data and engineering data, 4TU.ResearchData is a good option.
  • Do you, as a group or project, wish to keep your data together, manage it and keep complete control over who has access to it? Opt for DataverseNL. A Dataverse can be applied for through the library.
  • There are discipline-specific and even datatype-specific repositories to upload your data to. Re3data.org helps you to find a suitable repository. You can search for repositories that provide a 'persistent identifier' to the data. You can also search for data repositories which are certified. 

Other well-known and more generic repositories: 

  • B2Share – for European scientists and researchers to store and share small-scale research data;
  • Zenodo – a repository that enables researchers, scientists, EU projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of the existing institutional or subject-based repositories of the research communities. It offers a variety of licenses and access levels, integrates with GitHub and provides storage of up to 50 GB per dataset (funded by the EU and CERN);
  • Dryad – a curated general-purpose repository that makes the data underlying scientific publications discoverable, freely reusable and citable. Dryad has integrated data submission for a growing list of journals;
  • Open Science Framework (OSF) – a scholarly commons to connect the entire research cycle. It is part network of research materials, part version control system, and part collaboration software;
  • Figshare – a repository that allows researchers to publish all of their research outputs in an easily citable, sharable and discoverable manner.

 

DECISION TOOL

Research Data Management Support has developed a beta version of a decision aid to help you choose the data repository that best fits your data's needs. The tool includes (DANS) EASY, 4TU.ResearchData, DataverseNL, B2Share, Open Science Framework, Zenodo, Dryad and Figshare.

Citing data with a persistent identifier

When your data goes public, one very important piece of metadata is how to cite it. The advantage of a persistent identifier (PID) over a normal web address is that the PID always points to the data, even if the data itself has changed location.

Several types of PID exist, such as DOI, Handle, URN, ARK and PURL. It doesn't matter which one you use for your data citation, although DOI is currently the best integrated in automatic citation-counting algorithms.
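
As a rough illustration of how a PID becomes a clickable link, the sketch below prefixes a DOI or a Handle with the address of its public resolver (doi.org and hdl.handle.net). The function name `pid_to_url` and the small prefix table are assumptions made for this example, not part of any repository's tooling; the other PID types (URN, ARK, PURL) are deliberately left out.

```python
# Illustrative sketch: turn a DOI or Handle into a resolvable web address.
# The function name and the prefix table are assumptions made for this example.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def pid_to_url(pid: str) -> str:
    """Return a resolver URL for a PID written as 'doi:...' or 'hdl:...'."""
    scheme, sep, value = pid.strip().partition(":")
    if not sep or not value or scheme.lower() not in RESOLVERS:
        raise ValueError(f"unsupported or malformed PID: {pid!r}")
    return RESOLVERS[scheme.lower()] + value


print(pid_to_url("doi:10.1594/PANGAEA.726855"))
print(pid_to_url("hdl:10411/UWAU3K"))
```

Because the resolver looks up the current location each time, the resulting link keeps working when the repository moves the data.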

Citations to your data can add to your academic impact. Indicate in your (Creative Commons) license or user agreement that you want your data cited when reused. 

Data citations work just like book or journal article citations and can include the following information:

  • Author;
  • Year;
  • Dataset title;
  • Repository;
  • Version;
  • Persistent identifier (PID), preceded by the URL of its resolver (for example https://doi.org/ for a DOI).

Examples

  • Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855.
  • Fraaije, Rob, 2017, "Replication Data for: Spatial Patterns of Water-dispersed Seed Deposition along Stream Riparian Gradients", hdl:10411/UWAU3K, DataverseNL Dataverse, V2.
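
The elements listed above can be assembled mechanically. The following is a minimal sketch, assuming one possible layout (author, year, title, repository, version, PID); the class `DataCitation` and the function `format_citation` are hypothetical names, and journals or repositories may prescribe a different order or punctuation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the citation elements listed in this guide."""
    authors: str
    year: int
    title: str
    repository: str
    pid_url: str                   # PID written as a resolvable URL
    version: Optional[str] = None  # not every dataset is versioned


def format_citation(c: DataCitation) -> str:
    """Join the elements in one fixed order; the version is skipped when absent."""
    parts = [f"{c.authors} ({c.year}): {c.title}.", f"{c.repository}."]
    if c.version:
        parts.append(f"{c.version}.")
    parts.append(c.pid_url)
    return " ".join(parts)


example = DataCitation(
    authors="Fraaije, Rob",
    year=2017,
    title=("Replication Data for: Spatial Patterns of Water-dispersed "
           "Seed Deposition along Stream Riparian Gradients"),
    repository="DataverseNL",
    pid_url="https://hdl.handle.net/10411/UWAU3K",
    version="V2",
)
print(format_citation(example))
```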

Tips

  • Tip 1: Get a PID at the data repository of your choice.
  • Tip 2: Is your PID a DOI and do you want to cite it in the format of a specific journal? Use the DOI formatter from CrossCite.
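
For those who prefer scripting to the web form, formatted citations can also be requested through DOI content negotiation: a request to doi.org with an `Accept: text/x-bibliography` header asks for a citation in a given style. The sketch below only builds such a request; the function names are assumptions for this example, the style name ("apa" here) is expected to be a citation style identifier that the service recognises, and the text returned depends on the metadata registered for the DOI.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare a content-negotiation request for a formatted citation."""
    url = f"https://doi.org/{doi}"
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Example call, left commented out because it needs an internet connection:
# print(fetch_citation("10.1594/PANGAEA.726855"))
```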

Licensing data

To publish and share your data and meet the R of reusable in FAIR data management, you need a license. A license states what a user is allowed to do with your data and creates clarity and certainty for potential users.

If you deposit your data in a public data repository, you will be guided in choosing the appropriate license for your data. A license is not an option for all data; some of it may be too confidential or privacy-sensitive to be published.

Creative Commons licenses

Licenses such as Creative Commons (CC) replace 'all rights reserved' copyright with 'some rights reserved'. There are six standard CC licenses, plus the CC0 public domain dedication. CC-BY is the most commonly used license, in which attribution is mandatory when using data. You can also choose restrictions like non-commercial, no derivatives, or share alike.
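
To show how these restrictions combine into a license name, here is a small sketch. The function `cc_license` is a hypothetical helper, not an official Creative Commons tool. It assumes attribution (BY) is always included and that 'no derivatives' and 'share alike' cannot be combined, since share alike only applies to derivative works.

```python
def cc_license(non_commercial: bool = False,
               no_derivatives: bool = False,
               share_alike: bool = False) -> str:
    """Compose the short name of an attribution-based CC license."""
    if no_derivatives and share_alike:
        # Share alike governs derivatives, so it makes no sense without them.
        raise ValueError("'no derivatives' and 'share alike' are mutually exclusive")
    parts = ["CC", "BY"]
    if non_commercial:
        parts.append("NC")
    if no_derivatives:
        parts.append("ND")
    if share_alike:
        parts.append("SA")
    return "-".join(parts)


print(cc_license())                                      # attribution only
print(cc_license(non_commercial=True, share_alike=True))
```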

The license you are allowed to apply may be determined or limited by the data repository of your choice. Licenses are static and do not change with the quick developments in the field of research data. Therefore, some data repositories work with a CC0 license, whereby no rights are reserved. Instructions regarding use are then supplemented by codes of conduct, which can be adapted more easily.

Choosing a license

Some tools: 

2. Publishing in a data journal 

Consider publishing your dataset in a peer-reviewed data journal. Data journals are publications whose primary purpose is to expose datasets. They enable you as an author to focus on the data itself, rather than producing the extensive analysis of the data that the traditional journal model requires. Typically, a publication in a data journal consists of an abstract, an introduction, a data description with methods and materials, and a short conclusion on reuse opportunities.

Fundamentally, data journals seek to:

  • promote scientific accreditation and reuse;
  • improve transparency of scientific methods and results;
  • support good data management practices;
  • provide an accessible and permanent route to the dataset.

The benefits of publishing in a data journal

Publishing in a data journal may be of interest to researchers and data producers for whom data is a primary research output. In some cases, the publication cycle may be quicker than that of traditional journals, and where there is a requirement to deposit data in an approved repository, long-term curation of and access to the data are assured.

Publishing a data paper may be regarded as best practice in data management as it:

  • includes an element of peer review of the dataset;
  • maximises opportunities for reuse of the dataset;
  • provides academic accreditation for data scientists as well as for front-line researchers.

(source: ANDS Guide)

Examples of general and disciplinary data journals

There are general and disciplinary data journals. 

Examples of generic data journals:

Examples of disciplinary data journals: