Data description in practice

This guide gives specific examples and tips on how to document and describe your data with Excel.

CONTENT

  1. Introduction
  2. Building metadata sheets with Excel
  3. Tips

1. Introduction

Documentation (human-readable) and, more specifically, metadata (standardised, fixed fields that can take a value; computer-readable) both provide information about the data at hand. Describing your data is important: systematically described research data is the key to making your data findable, understandable and reusable. Overall data quality also improves with clear and detailed documentation and metadata.

2. Building metadata sheets with Excel

There are tools to help you add metadata to your research project. Excel is a simple way to create metadata schemes with controlled vocabulary drop-down lists. In practice, you can put metadata fields in columns, and have one row of values or descriptions per measurement.

It is not uncommon for a metadata sheet to hold thirty or more metadata fields to describe your data. As an added benefit, you can easily select specific measurements based on the information recorded in the metadata sheet. If applicable, include a field for the name of the file that actually holds the measurement data, and fields for other files that give detailed information (e.g. log files, scripts of analyses done on your samples, or the exact protocol used to generate the sample). The top row with the metadata fields can be write-protected, and values can be restricted to a controlled vocabulary in drop-down lists or to a controlled format such as a date format.
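
As a minimal sketch of the controlled vocabulary and date format checks described above, the following Python snippet assumes the metadata sheet has been exported from Excel to CSV. The field names (sample_id, measured_on, housing, data_file), the sample rows and the allowed housing values are hypothetical and serve only as illustration. It also shows how measurements can be selected on the basis of a metadata field.

```python
import csv
import io
from datetime import date

# Hypothetical controlled vocabulary for a "housing" field (illustrative values only).
ALLOWED_HOUSING = {"single", "group", "enriched"}

# Hypothetical CSV export of a metadata sheet: fields in columns, one row per measurement.
SHEET = """sample_id,measured_on,housing,data_file
S001,2023-03-01,group,S001_raw.csv
S002,2023-03-02,single,S002_raw.csv
S003,03/03/2023,outdoor,S003_raw.csv
"""


def load_rows(text):
    """Read the CSV text into a list of dicts keyed by metadata field."""
    return list(csv.DictReader(io.StringIO(text)))


def validate_row(row):
    """Return a list of problems found in one metadata row."""
    problems = []
    # Controlled format: dates must be written as YYYY-MM-DD.
    try:
        date.fromisoformat(row["measured_on"])
    except ValueError:
        problems.append(f"{row['sample_id']}: bad date {row['measured_on']!r}")
    # Controlled vocabulary: only values from the agreed list are accepted.
    if row["housing"] not in ALLOWED_HOUSING:
        problems.append(f"{row['sample_id']}: unknown housing {row['housing']!r}")
    return problems


rows = load_rows(SHEET)
problems = [p for row in rows for p in validate_row(row)]

# Select the data files of all group-housed samples via the metadata sheet.
group_housed = [row["data_file"] for row in rows if row["housing"] == "group"]

for problem in problems:
    print(problem)
print(group_housed)
```

Inside Excel itself, drop-down lists and date validation serve the same purpose. A script like this is mainly useful for checking sheets that were filled in without such restrictions.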

Consider having one generic metadata sheet on your overall study and one for describing the individual measurements.

Example metadata sheet in Excel. Rows hold the instances of measurements, interviews, samples, observations, etc.; columns hold the metadata fields.

3. Tips

Some tips when building metadata sheets:

 

  • Make metadata sheets simultaneously with data production 
    Make it a habit to transfer all information on your research data to metadata sheets as soon as you obtain it. Together with the (raw) data and the other files mentioned in the metadata sheet, the sheets should allow you to find and understand all information needed to reproduce your results.
     
  • Use a data dictionary or codebook
    To avoid confusion about how to interpret the values of metadata fields, always keep the exact definition and scope of each metadata field at hand. Especially if several people work on a project, or if similar experiments or measurements are done regularly in your research group, it is a good idea to develop metadata schemes for the collected data and a controlled vocabulary to fill in these schemes. What is the minimum information that should be recorded for each measurement? Which words should be used? These choices can be documented in a data dictionary (or codebook). For example, not everybody automatically understands that ‘length’ describes how tall a subject is, or how it was measured. If you standardise your metadata sheet, it can be reused and different experiments can easily be compared.
  • Use standards
    To make your data even more interoperable with other experiments, you can look for an appropriate existing metadata standard to describe your data. These standards are well documented, so you can refer to the definitions in the standard rather than having to describe the metadata fields yourself in a data dictionary (or codebook). Using a controlled vocabulary for the values also helps interoperability, e.g. a metadata field ‘chemical’ that accepts only values from the ‘International Chemical Identifier’ or another well-defined standard. The Digital Curation Centre (DCC) provides information on metadata standards and tools for implementing them. Another list of metadata standards is maintained by the Research Data Alliance (RDA). At the deposit stage, standardised metadata fields to describe the study as a whole are often provided by the data repository interface. Dublin Core and DDI are examples of generic metadata schemes.
     
  • Note down variables you will not use
    To add maximum value to your data, note down variables even if you do not plan to use them. For instance, recording the exact age of subjects might enable someone else to study that aspect of your data, even if you yourself group subjects irrespective of age.
     
  • Make use of the services of Research Data Management Support
    RDM Support can assist researchers and research groups in designing adequate data documentation and metadata sheets.
Drop-down menu for controlled terminology for housing conditions
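
The data dictionary (or codebook) tip above can be sketched in Python as a simple lookup from metadata field to definition and unit. This is only an illustration under stated assumptions: the field names, definitions, units and the helper undocumented_fields are hypothetical, and a real data dictionary would usually live in its own sheet or document.

```python
# Hypothetical data dictionary: one entry per metadata field (illustrative content only).
DATA_DICTIONARY = {
    "sample_id": {
        "definition": "Unique identifier of the sample",
        "unit": None,
    },
    "length": {
        "definition": "Body height of the subject, measured standing without shoes",
        "unit": "cm",
    },
    "age": {
        "definition": "Age of the subject at the time of measurement",
        "unit": "years",
    },
}


def undocumented_fields(header, dictionary):
    """Return the metadata fields in the sheet header that lack a dictionary entry."""
    return [field for field in header if field not in dictionary]


# Hypothetical top row of a metadata sheet.
header = ["sample_id", "length", "age", "housing"]

missing = undocumented_fields(header, DATA_DICTIONARY)
print(missing)
```

A check like this flags metadata fields that are in use but have never been defined, which is where differences in interpretation between project members tend to arise.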