Publishing and sharing data
At Utrecht University we aim to make data publicly available whenever possible. To maximise the visibility and reusability of data, we encourage researchers to follow the FAIR principles.
Publishing data in a repository
If you want to make your data (openly) available to the general public, make it FAIR (findable, accessible, interoperable, reusable) and citable, and comply with funder requirements, consider publishing it in a public data repository.
RDM Support has created the Data Repository Finder to help you choose a generic data repository that fits your needs when sharing and publishing your research data.
Whatever public repository you choose, ensure that it has a quality mark, such as a certificate, and that the data receives a persistent identifier for permanent referral to the data. This makes the data findable to others.
Criteria for selecting a repository include:
- Is a persistent identifier provided?
This is a permanent link that points to the data, making your data findable and citable.
- Is long-term preservation guaranteed?
Some repositories guarantee the legibility of the data, even if the hardware and software become obsolete.
- What are the costs per dataset or gigabyte?
Repositories differ in their cost model; some allow free deposits up to a certain amount of storage.
- What is the physical storage location of the data?
The location of your data determines which data protection law applies to it. Some repositories store data in the US and others in the EU.
- Does the repository allow for restricted access?
Some repositories offer both open and restricted access. Restricted access is often necessary when publishing pseudonymized personal data or when you wish to set special requirements for re-use for other reasons.
- Does the repository allow me to choose the re-use licence?
Most repositories let you choose a licence. Make sure to change the default licence given by the repository if you find it too permissive or too restrictive. When choosing restricted access, you will often write a custom licence, such as a data transfer agreement when sharing personal data.
- Is the repository certified?
Repositories with a Data Seal of Approval or CoreTrustSeal are recognised in the community as a trustworthy source of data.
Some options:
- DataverseNL is available to researchers at Utrecht University (UU) for publishing and preserving data. It is hosted by DANS but managed by the UU. RDM Support data consultants will review your dataset prior to publication. Contact info.rdm@uu.nl for any questions on DataverseNL.
- Yoda is a data storage solution that incorporates archiving (vaulting) and publishing. It is both hosted and managed by the UU. You can also ask RDM Support consultants to review your datasets within Yoda prior to publishing and archival. Contact info.rdm@uu.nl for any questions on Yoda.
- Do you wish to keep your data available permanently? The online archiving platform DANS is a good option for data in the social sciences and humanities. Currently there are DANS “data stations” for Social Sciences and Humanities; Archaeology; Life, Health and Medical Sciences; and Physical and Technical Sciences.
- Technical data such as geospatial data and engineering data can also be submitted to 4TU.ResearchData.
- Zenodo is a recommended generic public repository hosted by CERN.
- The Open Science Framework is a collaboration and storage platform that also enables publication of datasets.
- There are discipline-specific and even datatype-specific repositories where you can upload your data. Re3data.org helps you to find a suitable repository. You can search for certified repositories that provide a persistent identifier for a specific discipline.
When you have more specific needs for sharing and publishing your research data, you may want to consult RDM Support or check the overview of all data repositories available in the Registry of Research Data Repositories (Re3data.org).
Once you have selected a repository, you will likely be asked to fill in descriptive metadata pertaining to the dataset you wish to submit. Descriptive metadata encompasses generic information about the dataset.
Generally, you will be asked to fill in:
- a title for your dataset;
- a description of your project. This is often the abstract of your project;
- the authors' full names and ORCID iDs;
- the affiliations of the authors (e.g. Utrecht University);
- keywords related to your dataset;
- the subject or discipline that best describes the topic of your dataset;
- DOI of related publication(s), if any.
Filling in this information will often ensure that you meet the basic requirements of Findability. These fields often conform to what are called generic metadata standards (e.g. DataCite 4.0). Keep in mind that field-specific repositories often ask for discipline-specific metadata fields to be filled in. These may in turn conform to discipline-specific metadata standards.
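As a rough illustration, the fields listed above can be collected in a small record and checked for completeness before you start a deposit. This is a minimal sketch and not any repository's actual submission format: the field names, the `REQUIRED_FIELDS` tuple and the `missing_fields` helper are assumptions made for this example, and all values are placeholders.

```python
# Hypothetical field names for illustration; each repository defines its own form.
REQUIRED_FIELDS = ("title", "description", "authors", "affiliations", "keywords", "subject")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values, not a real dataset.
example_record = {
    "title": "Example survey dataset",
    "description": "Abstract of the project that produced the data.",
    "authors": [{"name": "Jane Example", "orcid": "0000-0000-0000-0000"}],
    "affiliations": ["Utrecht University"],
    "keywords": ["survey", "example"],
    "subject": "",  # left empty on purpose to show the check
    "related_dois": [],  # optional: DOI of related publication(s), if any
}

print(missing_fields(example_record))  # prints ['subject']
```

Related DOIs are treated as optional here because the list above asks for them only "if any".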
Prior to uploading your data you must first ensure that your data is stored and organized in an Interoperable and Re-usable manner.
You can maximise interoperability and re-usability by:
- Using data formats that are open, commonly used and/or supported by software that does not require a licence (e.g. a .csv file instead of an Excel .xls file).
- Using filename conventions and folder structures that are easy to read and navigate.
- Providing a README file detailing information about your dataset.
- Providing adequate documentation pertaining to your dataset; this includes codebooks describing variables, acronyms and other forms of data-level metadata.
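Some of these points can be checked automatically before upload. The sketch below is one possible approach, not an RDM Support tool: the `check_file_names` function is hypothetical, and the `OPEN_EXTENSIONS` set is an assumed, incomplete example of open formats that you would adapt to your own discipline.

```python
from pathlib import PurePosixPath

# Assumed examples of open formats; not an official or complete list.
OPEN_EXTENSIONS = {".csv", ".txt", ".json", ".md"}


def check_file_names(relative_paths):
    """Return warnings for a list of file paths given relative to the dataset folder."""
    paths = [PurePosixPath(p) for p in relative_paths]
    warnings = []

    # A README in the top-level folder documents the dataset as a whole.
    has_readme = any(len(p.parts) == 1 and p.stem.lower() == "readme" for p in paths)
    if not has_readme:
        warnings.append("No README file found in the top-level folder.")

    for path in paths:
        if path.stem.lower() == "readme":
            continue  # README files are documentation, not data
        if path.suffix.lower() not in OPEN_EXTENSIONS:
            warnings.append(f"{path}: format may not be open.")
        if " " in path.name:
            warnings.append(f"{path}: file name contains spaces.")

    return warnings


for warning in check_file_names(["README.txt", "data/raw data.xls", "data/results.csv"]):
    print(warning)
```

In this example only `data/raw data.xls` is flagged, once for its format and once for the space in its name. A check like this does not replace a codebook or a review of the dataset's contents.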
You may always contact RDM Support to help you check your dataset prior to publication. When using DataverseNL, an RDM Support data specialist will automatically look over your data prior to publication.
When using Yoda as your repository, you may or may not have a dedicated data manager who reviews your dataset prior to publication. If you do not have a pre-determined data manager, you can contact RDM Support to have one check your dataset prior to publication or archiving.
Publishing a dataset does not necessarily entail making the data publicly available. It means that you make the data findable and that you store the data in a repository where it may be retrieved. The retrieval of said data may be done openly (publicly) or under restricted access (available with permission). In cases where the data is too sensitive for sharing, the data may also not be accessible at all (Closed).
Open Data
Open access data means that anyone, anywhere, may find and download your data. What they can and cannot do with your data is then dictated solely by the licence. Open access is commonly used when the data is publicly funded and contains neither personal data nor confidential information.
Restricted Data
Data can also be published under restricted access. In this case the dataset is still findable, but in order to download the data you must first ask the author(s) of the data for permission. Permission is often requested and facilitated through the repository, but in some cases you must contact the researchers personally by email. Upon contact, the author(s) of the dataset will likely ask you to sign an agreement dictating what can and cannot be done with the data. This is a stronger way of protecting your data than using a simple licence.
It is recommended to use restricted access when publishing datasets with personal data that is not sufficiently de-identified or when publishing data with confidential information. Feel free to contact info.rdm@uu.nl or your privacy officer if you'd like some help determining if your dataset ought to be under restricted access.
Closed Data
When the dataset contains highly personal or confidential information that is not fit for sharing, the dataset should be published under closed access. This ensures that others can find information about the dataset via its metadata and can see that the raw data is not available, and for what reasons. In these cases authors may still be contacted to arrange special forms of access, such as chaperoned physical visits, sending the analysis code to the authors or obtaining a synthetic copy of the original dataset. If you want information about data privacy, check the Data Privacy Handbook.
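The three access levels described above can be summarised as a simple decision rule. The sketch below only encodes the recommendations in this text; the function name `suggest_access_level` and its parameters are hypothetical, and real cases should still be discussed with RDM Support or your privacy officer.

```python
def suggest_access_level(has_personal_data, sufficiently_deidentified,
                         has_confidential_info, unfit_for_sharing):
    """Suggest 'open', 'restricted' or 'closed' from four yes/no answers."""
    # Data that is not fit for sharing at all: publish metadata only.
    if unfit_for_sharing:
        return "closed"
    # Confidential information, or personal data that is not sufficiently
    # de-identified: share only with permission.
    if has_confidential_info or (has_personal_data and not sufficiently_deidentified):
        return "restricted"
    # No personal or confidential information remains.
    return "open"


print(suggest_access_level(
    has_personal_data=True,
    sufficiently_deidentified=False,
    has_confidential_info=False,
    unfit_for_sharing=False,
))  # prints restricted
```

Whether data counts as "sufficiently de-identified" is itself a judgement call that this sketch takes as an input rather than deciding.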
In order to share your data and make it reusable, you ought to give it a licence. A licence states what a user is allowed to do with your data and creates clarity for potential users.
If you deposit your data in a public data repository, you will be guided in choosing the appropriate licence for your data. A licence is not an option for all data; some of it may be too confidential or privacy-sensitive to be published publicly.
Creative Commons licences
Licences such as Creative Commons (CC) replace 'all rights reserved' copyright with 'some rights reserved'. These licences are best suited for Open Data datasets. There are seven standard CC-licences. CC-BY is the most commonly used licence, in which attribution is mandatory when using data. You can also choose restrictions like non-commercial, no derivatives, or share alike.
Check out the Creative Commons licence selector.
The licence you are allowed to apply may be determined or limited by the data repository of your choice. Some data repositories work with a CC0 licence whereby no rights are reserved. Instructions regarding use are then supplemented by codes of conduct, which may be adapted more easily.
Custom licences
When dealing with personal data, confidential data or data protected by intellectual property rights, Creative Commons licences may not be appropriate. Instead, your dataset will be better protected by a custom licence.
In a custom licence you determine precisely how your data may be used and what must not be done with it (e.g. do not attempt to identify individuals). Custom licences are not easy to write on your own unless you have previous experience. We provide templates and guidance on custom licences in our Data Privacy Handbook: Agreements. You may always contact RDM Support or your faculty privacy officer to help you with this matter.
Publishing in a data journal
You may also consider publishing your dataset in a peer-reviewed data journal. Data journals are publications whose primary purpose is to expose datasets. They enable you as an author to focus on the data itself, rather than producing an extensive analysis of the data as in the traditional journal model. Typically, a publication in a data journal consists of an abstract, an introduction, a data description with methods and materials, and a short conclusion on reuse opportunities.
Fundamentally, data journals seek to:
- provide an accessible and permanent route to the dataset;
- provide a detailed description of the methods and analysis used;
- explain vocabulary and standards used in the dataset;
- describe potential uses for the data;
- promote scientific accreditation and reuse.
Publishing in a data journal may be of interest to researchers and data producers for whom data is a primary research output. In some cases, the publication cycle may be quicker than that of traditional journals, and where there is a requirement to deposit data in an approved repository, long-term curation and access to the data is assured.
Publishing a data paper may be regarded as best practice in data management as it:
- includes an element of peer review of the dataset;
- maximises opportunities for reuse of the dataset;
- provides academic accreditation for data scientists as well as for front-line researchers.
There are general and disciplinary data journals. Examples of generic data journals:
Examples of disciplinary data journals:
Do you need support or assistance? Please contact RDM Support. We are here to help you.