Documenting your data in Yoda

It is important to document your data so that it remains understandable to yourself, your collaborators and other scientists who may want to work with your data.

You can document your data in multiple ways: 

  • A README.txt file in the root of your data package that provides context (what the project is about, who was involved, when it was conducted, what the data package contains, etc.). You can find a README template in the FAIR data cheatsheet.
  • Data-level documentation, such as a codebook, a lab notebook or a data overview file. 
  • Other project-related documentation, such as a study protocol, scripts, experiment files, analysis workflow, et cetera. 

You can read more about general documentation guidelines in the RDM Support guide Metadata and Documentation.

Documentation: use the metadata form

In Yoda, in addition to the documentation described above, you can add structured metadata to a research group folder and its subfolders. The Yoda metadata conforms to the DataCite metadata standard. It broadly describes the data package and how it can be reused and cited, and, once the package is published, makes it findable in data catalogues. Adding metadata is mandatory when you archive or publish a data package.

How to add metadata

1. Log into the Yoda web portal and navigate to the folder that you want to add metadata to.
2. Select ‘Metadata’.

Screenshot (Step 2: Select ‘Metadata’): the Research environment in the Yoda web portal, showing the folder research-training/test containing test-upload.txt, with the Metadata button highlighted.

3. Fill in the form as completely as possible. Mandatory fields are marked with an asterisk (*). Some fields can have multiple values, such as authors (creators), contributors, keywords, and related resources. You can click the Plus (+) icon next to those fields to add a new value.
4. When you are done filling in the form, click ‘Save’ and then ‘Close’ at the bottom of the form. A new file called ‘yoda-metadata.json’ will appear in the folder.
5. You can always continue filling in the form later if needed. Simply click the ‘Metadata’ button again.
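Because the form is saved as an ordinary JSON file, you can also inspect a copy of it outside the portal. The minimal Python sketch below assumes you have a local copy of yoda-metadata.json (for example, one you have downloaded) and that the file contains a single JSON object. It only lists the top-level field names and makes no assumptions about what those names are; the helper name list_metadata_fields is hypothetical.

```python
import json
from pathlib import Path


def list_metadata_fields(path: str) -> list[str]:
    """Return the sorted top-level keys of a local yoda-metadata.json copy.

    Assumes the file holds one JSON object. The key names are whatever
    Yoda wrote to the file; this sketch does not assume any of them.
    """
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Sorting a dict iterates over (and sorts) its keys.
    return sorted(data)
```

For example, calling list_metadata_fields("yoda-metadata.json") in the folder where you saved the copy returns the field names in alphabetical order, which is a quick way to see which parts of the form have been stored.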

Once the metadata form is filled in, you can submit the data package for archiving and, afterwards, for publication. If you need help filling in the metadata form, ask your data manager.

Reusing metadata

If the parent folder of your folder already contains a Yoda metadata file, the metadata form will include a ‘Clone from parent folder’ button. This copies the contents of the parent folder's metadata form into the metadata form of the subfolder you just opened, so you can reuse that metadata and adjust it where needed.

The Yoda metadata form explained

The table below explains the fields in the Yoda metadata form. Whether a field is mandatory when archiving a data package is indicated in the form itself: mandatory fields are marked with an asterisk (*).

Title
  Description: The title of your data package. When you publish your data package, the title will be harvested by other catalogues.
  Format: Maximum length: 255 characters.

Description
  Description: A concise description (abstract) of your data package, containing for example the subject, sample size and methodology.
  Format: Maximum length: 2700 characters.

Discipline
  Description: Choose the (sub)discipline of the study from the list. The list contains a combination of research disciplines and subdisciplines.
  Format: The list uses the OECD FOS 2007 standard.

Version
  Description: Version number or date of the dataset. Yoda does not automatically assign version numbers to data packages. If you create multiple versions, you can register the version number yourself.
  Format: Free format. For example: v1.0 (semantic version), v2024.09.20 (date version).

Language of the data
  Description: Choose the main language of the data in the dataset.
  Format: The list uses the ISO 639-1 standard.

Collection process: start and end date
  Description: Start and end date of data collection.
  Format: YYYY-MM-DD

Location(s) covered
  Description: The geographical entities (countries, regions, cities) covered by this data package.
  Format: English names preferred. We recommend using the preferred spelling from the Getty Thesaurus of Geographic Names. Add only one location per line: use the plus sign to add more values. Maximum length: 255 characters.

Period covered: start and end date
  Description: The start and end dates of the period covered by your dataset. These are not necessarily the same as the collection dates (for example, historical data may be collected in 2024 but cover 1900).
  Format: YYYY-MM-DD

Keywords
  Description: Keywords/tags that describe your data package and help others find it.
  Format: Free format. Add only one keyword per line: use the plus sign to add more values. Maximum length: 255 characters.

Related resource
  Description: Any resources (articles, data packages, software, etc.) related to your data package, their identifier/link, and how they are related to your data package.
  Format: It is possible to add multiple related resources: use the plus sign to add more values.

Related resource – Relation type
  Description: Choose how the related resource is related to the current data package.
  Format: The list uses the DataCite relationType vocabulary.

Related resource – Title
  Description: Title of the related resource.
  Format: There is no automatic check whether the title matches the persistent identifier. Maximum length: 255 characters.

Related resource – Persistent identifier
  Description: The identifier and identifier type of the related resource. For example: type: DOI, identifier: http://doi.org/10.24416/UU01-729A2Y

Retention period (years)
  Description: The minimum number of years that the data package should be preserved in the Vault.
  Format: Number (integer). Default: 10 years.

Retention information
  Description: Text field for remarks about the retention period. Use this field if you deviate from the default retention period.
  Format: Free format.

Embargo end date
  Description: You can set an embargo on the data package. The metadata is published right away, but the data only becomes available after the embargo ends. Specify the date on which the embargo should end.
  Format: YYYY-MM-DD

Data type
  Description: Choose the type of package.
  Format: Datapackage (default), Software, Method, Other document

Data classification
  Description: Choose how sensitive the data is in terms of confidentiality, integrity and availability.

Name of collection
  Description: If the data package is part of a larger (conceptual) collection, you can enter the collection name here.
  Format: The research group should ensure that all other data packages in the collection are archived with the same collection name. Maximum length: 255 characters.

Funding reference
  Description: The funding source(s) of your data package.
  Format: This field can have multiple values: use the plus sign to add more values.

Funding reference – Funder
  Description: Name of the organization funding the research. For example: Dutch Research Council.
  Format: Use names as specified in the Research Organization Registry (ROR). Maximum length: 255 characters.

Funding reference – Award number
  Description: The grant number issued by the funding organization.
  Format: Free format.

Creator
  Description: The author(s)/creator(s) of the data package.
  Format: This field can have multiple values: use the plus sign to add more creators.

Creator – Name
  Description: The personal/first name (Given Name) and surname/last name (Family Name) of the creator.
  Format: Maximum length: 255 characters.

Creator – Affiliation
  Description: Select the organizational or institutional affiliation of the creator. The affiliation identifier (ROR) will appear automatically.
  Format: This field can have multiple values: use the plus sign to add more affiliations for the creator.

Creator – Person identifier
  Description: The identifier and identifier type of the creator, such as AuthorID, ORCID, or ResearcherID.
  Format: Each creator can have multiple persistent identifiers: use the plus sign to add more person identifiers. Maximum length: 255 characters.

Contributor
  Description: The person(s) who contributed to this data package.
  Format: This field can have multiple values: use the plus sign to add more contributors.

Contributor – Name
  Description: The personal/first name (Given Name) and surname/last name (Family Name) of the contributor.
  Format: Maximum length: 255 characters.

Contributor – Contributor type
  Description: Choose how the contributor primarily contributed to the data package.
  Format: The list uses the DataCite contributorType vocabulary.

Contributor – Affiliation
  Description: Select the organizational or institutional affiliation of the contributor. The affiliation identifier (ROR) will appear automatically.
  Format: This field can have multiple values: use the plus sign to add more affiliations for the contributor.

Contributor – Person identifier
  Description: The identifier and identifier type of the contributor, such as AuthorID, ORCID, or ResearcherID.
  Format: Each contributor can have multiple persistent identifiers: use the plus sign to add more person identifiers. Maximum length: 255 characters.

Data package access
  Description: Choose the access level under which the data package should be made available once published.
  Format: Open – freely retrievable (publicly available); Restricted – available upon request (only available after specified conditions are met); Closed (not shared).

License
  Description: The terms that specify what others are allowed to do with the contents of the data package.
  Format: If Data package access is set to ‘Open – freely retrievable’, you can choose from a number of often-used licenses (recommended: Creative Commons Attribution 4.0). If the data package is restricted or closed, select Custom and add a License.txt file to your data package. Contact your data manager for help with this.
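Several of the format rules listed above (maximum lengths, YYYY-MM-DD dates, an integer retention period with a default of 10 years) are simple enough to check before you fill in the form. The Python sketch below assumes your metadata is held in a plain dictionary with hypothetical keys (title, description, collection_start, embargo_end, retention_years, etc.); these names are for illustration only and are not necessarily the keys Yoda uses in yoda-metadata.json. Two rules in it are assumptions rather than documented Yoda rules: an end date may not precede its start date, and the retention period must be at least one year. This is a sketch, not part of Yoda.

```python
import re
from datetime import date

# Hypothetical keys used only for this illustration; they are not
# necessarily the keys Yoda writes to yoda-metadata.json.
MAX_LENGTHS = {"title": 255, "description": 2700, "collection_name": 255}
DATE_PAIRS = [("collection_start", "collection_end"), ("period_start", "period_end")]
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_date(value: str) -> date:
    """Parse a YYYY-MM-DD string; raise ValueError if it is malformed."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"not in YYYY-MM-DD format: {value!r}")
    return date.fromisoformat(value)  # also rejects impossible dates, e.g. month 13


def check_metadata(md: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []

    # Maximum lengths, as listed in the field overview.
    for field, limit in MAX_LENGTHS.items():
        text = md.get(field, "")
        if len(text) > limit:
            problems.append(f"{field}: {len(text)} characters, maximum is {limit}")
    for keyword in md.get("keywords", []):
        if len(keyword) > 255:
            problems.append(f"keyword longer than 255 characters: {keyword[:20]!r}...")

    # Dates must be YYYY-MM-DD; assumed: an end date may not precede its start date.
    for start_key, end_key in DATE_PAIRS:
        start, end = md.get(start_key), md.get(end_key)
        try:
            start_d = parse_date(start) if start else None
            end_d = parse_date(end) if end else None
        except ValueError as exc:
            problems.append(f"{start_key}/{end_key}: {exc}")
            continue
        if start_d and end_d and end_d < start_d:
            problems.append(f"{end_key} is before {start_key}")

    embargo = md.get("embargo_end")
    if embargo:
        try:
            parse_date(embargo)
        except ValueError as exc:
            problems.append(f"embargo_end: {exc}")

    # Retention period: an integer, default 10; assumed to be at least 1.
    retention = md.get("retention_years", 10)
    if isinstance(retention, bool) or not isinstance(retention, int) or retention < 1:
        problems.append(f"retention_years must be a positive integer, got {retention!r}")

    return problems
```

For example, check_metadata({"title": "My data", "retention_years": 10}) returns an empty list, while a 256-character title or an embargo date written as 2024/01/01 each produce one entry in the returned list.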