How to make your data FAIR

The FAIR data principles are guiding principles on how to make data Findable, Accessible, Interoperable and Reusable, formulated by Force11. On this website, the principles are explained and translated into practical information for Utrecht University researchers.

Why?

Science, at its core, is a discipline that builds upon the discoveries of its antecedents. The amount of progress we can make as an academic community is therefore intrinsically connected to the amount of information that we make available to, and reusable by, others. As science entered the digital age, the amount of data produced began to reach astronomical sizes.

FAIR principles

Thankfully, the same digital movement brought along digital platforms where the data could be stored and relayed. But how can these platforms be used in an organised manner? The FAIR principles (Findable, Accessible, Interoperable, Reusable) are a useful framework for thinking about sharing data in a way that will enable maximum use and reuse.

Benefits for research(ers)

Making research data more FAIR will provide a range of benefits to researchers, research communities, research infrastructure facilities and research organisations alike, including:

  • Achieving maximum impact from research.
  • Increasing the visibility and citations of research.
  • Improving the reproducibility and reliability of research.
  • Attracting new partnerships with researchers, business, policy and broader communities.
  • Enabling new research questions to be answered.

When making data FAIR, metadata plays an important role. Why and how are explained below:

Metadata

If you have data, you have metadata. Metadata is essential to find, reuse and manage your data, and to understand the context of your data and files. With metadata you describe who the responsible researcher is; when, where and why the data were collected; how the research data should be cited; and so on. The content and format of metadata are often guided by a specific discipline and/or repository through the use of a metadata standard. Click on the following link for an example of metadata for visual data.
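As a rough illustration of the kind of information described above, the sketch below stores a few descriptive fields in a small Python record and derives a citation from them. The class name, field names and all values (including the DOI) are hypothetical placeholders, not a metadata standard; the citation layout follows the common "Creator (Year). Title. Publisher. Identifier" pattern.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical minimal record; field names are illustrative only."""
    title: str
    creator: str           # who: the responsible researcher
    publication_year: int  # when the dataset was published
    publisher: str         # repository or institution holding the data
    identifier: str        # persistent identifier, for example a DOI link
    description: str       # why and how the data were collected
    location: str          # where the data were collected

    def citation(self) -> str:
        """Build a citation: Creator (Year). Title. Publisher. Identifier."""
        return (f"{self.creator} ({self.publication_year}). {self.title}. "
                f"{self.publisher}. {self.identifier}")


# All values below are placeholders for illustration.
record = DatasetMetadata(
    title="Example survey data",
    creator="Doe, J.",
    publication_year=2020,
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
    description="Placeholder description of purpose and collection method.",
    location="Utrecht, the Netherlands",
)

print(record.citation())
# Machine-readable form of the same record
print(json.dumps(asdict(record), indent=2))
```

In practice the repository's own metadata form determines which fields are required; a record like this is mainly useful as a checklist while the research is still ongoing.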

How to make your research data FAIR?

There are different levels of making your data FAIR. You may not always be able to adhere to all of the principles, but applying even some of them will add to the findability, accessibility, interoperability and reusability of your research data.

Findable

The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services.

Make your data findable by ensuring:

  • Data are described with rich metadata.
  • (Meta)data are assigned a globally unique and persistent identifier (for example a DOI).
  • (Meta)data are registered or indexed in a searchable resource.

Practical

  • Most (disciplinary) repositories will assign a persistent identifier when archiving a dataset.
  • Use a trusted and well-suited repository to publish your data. Utrecht University can assist you in using Yoda and Dataverse.

A persistent identifier will automatically be assigned to the metadata of your dataset when using these repositories. The metadata fields in Yoda and Dataverse are based on the Dublin Core and DataCite metadata standards.

  • There are other repositories that might also be well-suited for your data depending on the discipline and the common practices. You can use the repository decision tool for some guidance. Or check the registry for research data repositories (by Re3data).
  • The basic and pro storage solutions at Utrecht University cannot be used for FAIR data archiving as they do not assign a persistent identifier.
  • Register at ORCID for your personal persistent author identifier and use this identifier with all your (data) publications.
  • Add rich metadata (describe the dataset’s context, quality, condition and characteristics). When publishing your data, think about the researchers who might want to use it and make your data findable for them. The more detailed the information about the context, content and characteristics of the data, the more findable the data will be.
  • Reference the persistent identifiers in your research output.
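Persistent identifiers are easy to mistype when referencing them in research output. The sketch below, offered as an illustration rather than an official tool, checks the final character of an ORCID iD, which is a checksum computed with the ISO 7064 MOD 11-2 algorithm, and turns a bare DOI into a link through the doi.org resolver. The function names are hypothetical and the DOI in the usage example is a placeholder.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has the 0000-0000-0000-0000 shape and a matching checksum."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link that resolves via doi.org."""
    return "https://doi.org/" + doi.strip()


print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD from the ORCID documentation
print(doi_to_url("10.1234/example"))          # placeholder DOI
```

A passing checksum only shows that the iD is well formed; it does not confirm that the iD belongs to the intended author.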

Accessible

It should be possible for humans and machines to gain access to your data, under specific conditions or restrictions where appropriate. FAIR does not necessarily mean that data need to be open! In cases where the data cannot be made openly accessible, it is still possible to make the metadata publicly available.

Make your data accessible by ensuring:

  • The repository you are using to share your data assigns persistent identifiers by which data can be retrieved.
  • The access procedure includes authentication and authorisation steps, if necessary.
  • Metadata are accessible, wherever possible, even if the data are not.

Practical

  • Yoda and Dataverse can both be used to publish and grant access to your data, either as publicly downloadable open data, on request, or with restricted access.
  • While the research is ongoing, Yoda allows you to give external parties access to your data. In multi-institutional or multi-national projects, using Yoda thus facilitates data sharing and collaboration.
  • If access to the data is restricted, make sure to provide sufficient contact information for other researchers who want to access the data (a personal email address that will remain valid for a long time, contact details of the lab manager, etc.).
  • The format in which your data are stored also plays a role in accessibility. Open, non-proprietary or common formats will increase accessibility.
  • Think about the software tools that are needed to access your data. If necessary, include documentation about the software (version, etc.). You might want to include the relevant software itself (e.g. as open source code).
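One way to act on the point about file formats is to scan a dataset for files that may need converting before publication. The sketch below is only an illustration: the two extension lists are short, hypothetical examples rather than an official list of preferred formats, and the function names are made up for this example.

```python
from pathlib import PurePath

# Illustrative, non-exhaustive examples; check your repository's own guidance.
OPEN_FORMATS = {".csv", ".txt", ".json", ".xml", ".png"}
PROPRIETARY_FORMATS = {".sav", ".psd", ".doc", ".xls"}


def classify_format(filename: str) -> str:
    """Label a file as 'open', 'proprietary' or 'unknown' based on its extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in OPEN_FORMATS:
        return "open"
    if suffix in PROPRIETARY_FORMATS:
        return "proprietary"
    return "unknown"


def files_to_review(filenames):
    """Return the files that are not in a known open format."""
    return [name for name in filenames if classify_format(name) != "open"]


# Hypothetical file names
dataset = ["results.csv", "survey.sav", "figure1.png", "notes"]
print(files_to_review(dataset))
```

Files flagged this way are candidates for export to an open format, or for extra documentation about the software needed to read them.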

Interoperable

To speed up discovery and uncover new insights, research data should be easily combined with other datasets, applications and workflows by humans as well as computer systems.

Make your data interoperable by using:

  • when possible, well-known and preferably open formats and software.
  • relevant standards for metadata.
  • community-agreed schemas, controlled vocabularies, keywords, thesauri or ontologies where possible.

Practical

  • Create a README file (.txt or .pdf) to help ensure that your data can be correctly interpreted and re-analysed by others. A README file should contain the following information:
    • for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication;
    • for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units;
    • any data processing steps, especially if not described in the publication, that may affect interpretation of results;
    • a description of what associated datasets are stored elsewhere, if applicable;
    • whom to contact with questions.
  • Attach the programming scripts you used to analyse or gather your data.
  • Adequately annotate programming scripts so that others can understand them.
  • Be consistent in your file names, data variables, scripts and script variables, and in any similar annotations.
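The README contents listed above can be collected in a structured way and written out as plain text. The following is a minimal sketch under that assumption; the section titles, the `Column` class, the `build_readme` function and all example values are hypothetical and can be adapted to the conventions of your discipline.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    definition: str
    unit: str          # measurement unit, or "n/a"
    missing_code: str  # how missing values are coded


def build_readme(files, columns, processing_steps, related_datasets, contact):
    """Assemble README text from file descriptions, column definitions,
    processing steps, related datasets and a contact person."""
    lines = ["FILES"]
    lines += [f"- {name}: {description}" for name, description in files.items()]
    lines += ["", "COLUMNS"]
    lines += [
        f"- {c.name}: {c.definition} (unit: {c.unit}; missing: {c.missing_code})"
        for c in columns
    ]
    lines += ["", "PROCESSING STEPS"]
    lines += [f"{i}. {step}" for i, step in enumerate(processing_steps, start=1)]
    lines += ["", "RELATED DATASETS"]
    lines += [f"- {d}" for d in related_datasets] or ["- none"]
    lines += ["", "CONTACT", contact]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration
readme_text = build_readme(
    files={"data.csv": "Survey responses, one row per participant"},
    columns=[Column("age", "Age of respondent", "years", "-99")],
    processing_steps=["Removed duplicate rows"],
    related_datasets=[],
    contact="data-steward@example.org",
)
print(readme_text)
```

Writing `readme_text` to a file named README.txt next to the data keeps the description and the dataset together.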

Reusable

Research data should be ready for future research and future processing, making it self-evident that findings can be replicated and that new research effectively builds on already acquired, previous results.

Make your data reusable by ensuring the data:

  • are well documented to support proper data interpretation.
  • have a clear and accessible data usage license so others know what kinds of reuse are permitted.
  • have provenance information to make clear how, why and by whom the data have been created and processed.
  • (and metadata) meet relevant domain standards.

Practical

  • Documentation should be provided on three levels:
    • Project-level documentation explains the aims of the study, the hypothesis behind it, the instruments and the methodology.
    • File-level documentation explains how all the files that make up a data set relate to one another.
    • Item-level documentation explains the names of the variables and the meanings of those variables.
  • Provide your data with a clear license to govern the terms of its reuse. Commonly used licenses, such as Creative Commons (CC) for data or MIT for software, can be linked to your data or software.
  • The guidelines under Horizon 2020 recommend CC0 or CC BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. When using Yoda and Dataverse, a data usage license is part of the metadata.
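Before publishing, it can help to check that the information needed for reuse is actually present. The sketch below is one possible way to do that; the required field names are hypothetical, and the license values are the SPDX identifiers for CC0 1.0 and CC BY 4.0 (the version numbers are an assumption, since the guidelines quoted above name only CC0 and CC BY).

```python
from typing import List

# SPDX identifiers; the specific versions are an assumption for this sketch.
RECOMMENDED_DATA_LICENSES = {"CC0-1.0", "CC-BY-4.0"}

# Hypothetical field names covering license, provenance and documentation.
REQUIRED_FIELDS = ("license", "creator", "collection_method", "documentation")


def reuse_problems(record: dict) -> List[str]:
    """List what is missing or unusual in a record from a reuse point of view."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not record.get(field)
    ]
    license_id = record.get("license")
    if license_id and license_id not in RECOMMENDED_DATA_LICENSES:
        problems.append(f"license {license_id} is not CC0 or CC BY")
    return problems


# Placeholder record for illustration
example = {
    "license": "CC-BY-4.0",
    "creator": "Doe, J.",
    "collection_method": "Online questionnaire",
    "documentation": "README.txt",
}
print(reuse_problems(example))
```

An empty list means that the basic reuse information is present; it says nothing about the quality of the documentation itself.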