Documenting your data in Yoda
It is important to document your data so that it remains understandable to you, your collaborators, and potentially other scientists who want to work with it.
You can document your data in multiple ways:
- A README.txt file in the root of your data package that provides context for your data package (what is this project about, who was involved, when was it conducted, what does the data package contain, etc.). You can find a template for a README file in the FAIR data cheatsheet.
- Data-level documentation, such as a codebook, a lab notebook or a data overview file.
- Other project-related documentation, such as a study protocol, scripts, experiment files, analysis workflow, et cetera.
You can read more about general documentation guidelines in the RDM Support guide Metadata and Documentation.
Documentation: use the metadata form
In Yoda, in addition to the documentation described above, you can add structured metadata to a research group and its subfolders. The Yoda metadata conforms to the DataCite metadata standard. It broadly describes the data package and how it can be reused and cited and, once the data package is published, makes it findable in data catalogues. When you archive and publish your data package, it is mandatory to add metadata.
How to add metadata
1. Log into the Yoda web portal and navigate to the folder that you want to add metadata to.
2. Select ‘Metadata’.
3. Fill in the form as completely as possible. Mandatory fields are marked with an asterisk (*). Some fields can have multiple values, such as authors (creators), contributors, keywords, and related resources. You can click the Plus (+) icon next to those fields to add a new value.
4. When you are done filling in the form, click ‘Save’ and then ‘Close’ at the bottom of the form. A new file called ‘yoda-metadata.json’ will appear in the folder.
5. You can always continue filling in the form later if needed. Simply click the ‘Metadata’ button again.
Once the metadata form is filled in, you can submit the data package for archiving, and afterwards for publication. If you need help filling in the metadata form, please ask your data manager.
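Before submitting a data package for archiving, it can be useful to check which mandatory fields are still empty. The following is a minimal sketch, not part of Yoda itself. It assumes the metadata has been loaded into a Python dictionary whose keys mirror the form labels; the keys actually used in ‘yoda-metadata.json’ may differ, and the function name missing_mandatory is hypothetical.

```python
# Top-level fields marked "Y" in the metadata table of this guide.
# ASSUMPTION: the key names mirror the form labels; the keys actually
# stored in yoda-metadata.json may be different.
MANDATORY_FIELDS = [
    "Title", "Description", "Discipline", "Language of the data",
    "Keywords", "Retention period (years)", "Data type",
    "Data classification", "Creator", "Data package access", "License",
]


def missing_mandatory(metadata: dict) -> list[str]:
    """Return the mandatory fields that are absent or left empty."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = metadata.get(field)
        # Treat a missing key, an empty string and an empty list as "not filled in".
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing


# Hypothetical, partially filled-in form.
draft = {"Title": "Example data package", "Keywords": ["survey"], "Description": ""}
print(missing_mandatory(draft))
```

An empty result means that every mandatory top-level field has a value. The sketch does not check the mandatory subfields of creators, contributors or related resources.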
Reusing metadata
If the ‘parent folder’ of your folder already contains a Yoda metadata file, the metadata form will include a ‘Clone from parent folder’ button. This button copies the contents of the metadata form from the parent folder into the metadata form you have just opened in the subfolder, so that you can reuse that metadata and adjust it where needed.
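The idea of cloning and then adjusting metadata can be illustrated with a short sketch. This is not how Yoda implements the ‘Clone from parent folder’ button; it only shows the principle of starting from a full copy and changing selected fields. The function name clone_from_parent and the example values are hypothetical.

```python
import copy


def clone_from_parent(parent_metadata: dict, overrides: dict) -> dict:
    """Copy the parent's metadata and replace selected fields.

    A deep copy is used so that later changes to the subfolder's
    metadata never alter the parent's metadata.
    """
    child = copy.deepcopy(parent_metadata)
    child.update(overrides)
    return child


# Hypothetical example values.
parent = {"Title": "Project X", "Keywords": ["ecology", "soil"]}
child = clone_from_parent(parent, {"Title": "Project X: field season 2"})
child["Keywords"].append("2024")  # adjust the cloned metadata

print(parent["Keywords"])  # the parent's keywords are unchanged
print(child)
```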
The Yoda metadata form explained
In this table, some fields in the Yoda metadata form are explained. The values in the ‘Mandatory’ column state whether the field is mandatory (Y: mandatory; N: not mandatory) when archiving a data package.
Field | Mandatory | Description | Format |
Title | Y | Title of your data package. When you publish your data package, the title will be harvested by other catalogues. | Maximum length: 255 characters. |
Description | Y | A concise description (abstract) of your data package, containing for example subject, sample size, methodology, etc. | Maximum length: 2700 characters. |
Discipline | Y | Choose the (sub)discipline of the study from the list. The list contains a combination of research disciplines and subdisciplines. | The list uses the OECD FOS 2007 standard. |
Version | N | Version number or date of the dataset. Yoda does not automatically assign version numbers to data packages. If you create multiple versions, you can register the version number yourself. | Free format. For example: v1.0 (semantic version), v2024.09.20 (date version). |
Language of the data | Y | Choose the main language of the data in the dataset. | The list uses the ISO 639-1 standard. |
Collection process: start and end date | N | Start and end date of data collection. | YYYY-MM-DD |
Location(s) covered | N | Indication of the geographical entities (countries, regions, cities) covered within this data package. | English naming convention preferred. We recommend using the preferred spelling from the Getty Thesaurus of Geographic Names. Add only one location per line: use the plus sign to add more values. Maximum length: 255 characters. |
Period covered: start and end date | N | Indication of the start and end dates of the period covered by your dataset. This is not necessarily the same as the collection dates (for example: historical data may be collected in 2024 but cover 1900). | YYYY-MM-DD |
Keywords | Y | Keywords/tags that describe your data package and may allow others to find it more easily. | Free format. Add only one keyword per line: use the plus sign to add more values. Maximum length: 255 characters. |
Related resource | N | Any resources (articles, data packages, software, etc.) related to your data package, their identifier/link and how they are related to your data package. | It is possible to add multiple related resources: use the plus sign to add more values. |
Related resource – Relation type | Y | Choose how the related resource is related to the current data package. | The list uses the DataCite relationType vocabulary. |
Related resource – Title | Y | Title of the related resource. | There is no automatic check whether the title matches the persistent identifier. Maximum length: 255 characters. |
Related resource – Persistent identifier | Y | The Identifier and Identifier type for the related resource. For example: type: DOI, identifier: http://doi.org/10.24416/UU01-729A2Y | |
Retention period (years) | Y | The minimum number of years that the data package should be preserved in the Vault. | Number (integer). Default: 10 years. |
Retention information | N | Text field for remarks about the retention period. Use this field if you deviate from the default retention period. | Free format |
Embargo end date | N | It is possible to set an embargo on the data package. The metadata will be published already, but the data will only become available after this period. Specify here the date on which the embargo should end. | YYYY-MM-DD |
Data type | Y | Choose what type of package it concerns. | Datapackage (default), Software, Method, Other document |
Data classification | Y | Choose how sensitive the data is in terms of confidentiality, integrity and availability. | |
Name of collection | N | If the data package is part of a larger (conceptual) collection, you can enter the collection name here. | The research group should ensure that all other data packages in the collection are archived with the same collection name. Maximum length: 255 characters. |
Funding reference | N | The funding source(s) of your data package. | This field can have multiple values: use the plus sign to add more values. |
Funding reference – Funder | N | Name of the organization funding the research. For example: Dutch Research Council. | Use names as specified in the Research Organization Registry (ROR). Maximum length: 255 characters. |
Funding reference – Award number | N | The grant number issued by the funding organization. | Free format. |
Creator | Y | The author(s)/creator(s) of the data package | This field can have multiple values: use the plus sign to add more creators. |
Creator – Name | Y | The personal/first name (Given Name) and surname/last name (Family Name) of the creator. | Maximum length: 255 characters. |
Creator – Affiliation | Y | Select the organizational or institutional affiliation of the creator. The Affiliation identifier (ROR) will automatically appear. | This field can have multiple values: use the plus sign to add more affiliations for the creator. |
Creator – Person identifier | N | The Identifier and Identifier type for the creator, such as AuthorID, ORCID, or ResearcherID. | Each creator can have multiple persistent identifiers: use the plus sign to add more person identifiers. Maximum length: 255 characters. |
Contributor | N | The person(s) who contributed to this data package. | This field can have multiple values: use the plus sign to add more contributors. |
Contributor – Name | Y | The personal/first name (Given Name) and surname/last name (Family Name) of the contributor. | Maximum length: 255 characters. |
Contributor – Contributor type | Y | Choose how the contributor primarily contributed to the data package. | The list uses the DataCite contributorType vocabulary. |
Contributor – Affiliation | Y | Select the organizational or institutional affiliation of the contributor. The Affiliation identifier (ROR) will automatically appear. | This field can have multiple values: use the plus sign to add more affiliations for the contributor. |
Contributor – Person identifier | N | The Identifier and Identifier type for the contributor, such as AuthorID, ORCID, or ResearcherID. | Each contributor can have multiple persistent identifiers: use the plus sign to add more person identifiers. Maximum length: 255 characters. |
Data package access | Y | Choose the access level under which the data package should be made available once published. | Open – freely retrievable (publicly available), Restricted – available upon request (only available under specified conditions), Closed (not shared). |
License | Y | The terms that specify what others are allowed to do with the contents of the data package. | If the Data package access is set to ‘Open – freely retrievable’, you can choose from a number of often-used licenses (recommended: Creative Commons Attribution 4.0). If the data package is restricted or closed, select Custom and add a License.txt file to your data package. Contact your data manager for help with this. |
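Some of the format rules in the table can be checked automatically. The sketch below is an illustration under stated assumptions, not a Yoda tool: it assumes the metadata is available as a Python dictionary whose keys mirror the form labels (the keys in ‘yoda-metadata.json’ may differ), and the function name check_formats is hypothetical. It checks the maximum lengths of the title, description and keywords, the YYYY-MM-DD format of the embargo end date, and that the retention period is an integer.

```python
import re
from datetime import date

# Maximum lengths taken from the table above.
MAX_LENGTHS = {"Title": 255, "Description": 2700}
KEYWORD_MAX_LENGTH = 255
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_date(value: str) -> bool:
    """True if value is written as YYYY-MM-DD and is a real calendar date."""
    if not isinstance(value, str) or not DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_formats(metadata: dict) -> list[str]:
    """Return human-readable descriptions of format problems."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        if len(metadata.get(field, "")) > limit:
            problems.append(f"{field}: longer than {limit} characters")
    for keyword in metadata.get("Keywords", []):
        if len(keyword) > KEYWORD_MAX_LENGTH:
            problems.append(f"Keywords: a keyword is longer than {KEYWORD_MAX_LENGTH} characters")
    embargo = metadata.get("Embargo end date")
    if embargo is not None and not is_valid_date(embargo):
        problems.append("Embargo end date: not a valid YYYY-MM-DD date")
    retention = metadata.get("Retention period (years)")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if retention is not None and (isinstance(retention, bool) or not isinstance(retention, int)):
        problems.append("Retention period (years): not an integer")
    return problems


# Hypothetical example with three problems.
example = {
    "Title": "x" * 256,
    "Embargo end date": "31-01-2025",
    "Retention period (years)": "ten",
}
for problem in check_formats(example):
    print(problem)
```

The same pattern could be extended to the other date fields and length limits in the table.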