Documenting your data

Metadata is 'data about data'. It is information about your data package necessary to:

make the dataset findable in data catalogues;
describe the contents of the dataset for a broad audience;
inform the audience whether the data can be reused and if so, under what conditions;
prescribe how the data should be cited and whom to acknowledge;
inform digital archivists and IT staff how long the data should be retained.

You have to add metadata before you can archive or publish your data.

The Yoda metadata form explained

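The Yoda metadata form consists of approximately 33 fields. The table below lists all metadata elements with a description of their function. The Man. (mandatory) column states whether a field must be filled in when archiving a data package: Y means the field is always mandatory, an empty cell means it is optional, and “If” followed by a field number (e.g. “If 10a”) means the field becomes mandatory once that other field has been filled in.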

No	Element	Man.	Description	Explanation
1	Title	Y	The title of your dataset.	When you publish your data package, the title will be harvested by other catalogues. Maximum length: 255 characters.
2	Description	Y	A description of your dataset.	Describe your dataset, e.g. the subject, the sample size, methodology, etc. It is best to keep this description concise. A more elaborate description can be added in a readme.txt and/or a codebook file. Maximum length: 2,700 characters.
3	Discipline		The (sub)discipline of the study.	The list contains a combination of research disciplines and subdisciplines, based on the OECD FOS 2007 classification. This field can have multiple values; use the plus sign to add more values.
4	Version	Y	The version number of the dataset.	Yoda does not automatically assign version numbers to data packages. If you create multiple versions, you can register the version number yourself, according to your own versioning scheme.
5	Language of the data		The (main) language of the data in the dataset.	This element helps others assess whether the dataset is usable for them. The standard used is ISO 639-1.
6a	Collection Process - Start Date		Indicate when you started collecting the data for this dataset.	Format: YYYY-MM-DD
6b	Collection Process - End Date		Indicate when you finished collecting the data for this dataset.	Format: YYYY-MM-DD
7	Location(s) covered		The geographical entities, such as countries, regions and cities, covered by this dataset.	English names are preferred. Where possible, use the preferred spelling from the Getty Thesaurus of Geographic Names. Enter one location per line. This field can have multiple values; use the plus sign to add more values. Maximum length: 255 characters.
8a	Period Covered - Start period		The start date of the period covered by your dataset.	Format: YYYY-MM-DD
8b	Period Covered - End Period		The end date of the period covered by your dataset.	Format: YYYY-MM-DD
9	Tag		Free text field for adding searchable keywords to your dataset.	You can choose the keywords freely. It is best to add only one keyword per line. This field can have multiple values; use the plus sign to add more values. Maximum length: 255 characters.
10a	Related Data package		The way in which the present data package is related to another data package.	In this section you can enter a related dataset and the nature of that relation. For instance, if the present data package contains the raw data on which another dataset is based, select IsSourceOf in this field and enter the details of the other data package in the fields below. This field can have multiple values; use the plus sign to add more values.
10b	Related Data package - Title	If 10a	Title of the data package related to the present data package.	Yoda does not check automatically whether the title and the persistent identifier match. Maximum length: 255 characters.
10d	Related Data package – Type	If 10e	The type of the persistent identifier of the related data package.	Example: “DOI”.
10e	Related Data package – Identifier	If 10d	The persistent identifier of the related data package.
11	Retention Period	Y	The minimum number of years the data will be kept in the archive. The default value is 10 years.	This field only accepts whole numbers. Policies for handling the expiry of a data package’s retention period have not yet been defined.
12	Retention Information		Remarks about the retention period.	Please provide a reason if you deviate from the default value of ten years. If you want the data to be retained for longer, the data manager may ask you to take extra care in choosing sustainable file formats.
13	Embargo end date		If the dataset has an embargo, the date on which the embargo ends.	This functionality is not yet fully implemented. Please contact the data manager if you intend to publish a data package with an embargo.
14	Data Classification	Y	The classification of the data.	The data classification according to the BIV (availability, integrity and confidentiality) guidelines (the linked document is in Dutch).
15	Name of Collection		If this data package is part of a larger (conceptual) collection of data packages, you can enter the collection name here.	The research group should ensure that all other data packages in the collection are archived with the same collection name. Maximum length: 255 characters.
16	Funder		The name(s) of the organization(s) funding the research.	Example: “NWO”. This field can have multiple values; use the plus sign to add more values. Maximum length: 255 characters.
17	Award Number		The grant number issued by the funding organization.
18	Creator of Data package	Y	The name of the person(s) who created (this version of) the dataset.	Preferred format: Family Name, First Name. This field can have multiple values; use the plus sign to add more values. Maximum length: 255 characters.
19	Affiliation	Y	The organizational or institutional affiliation of the creator.	Example: “Utrecht University”. The affiliation of the creator of a data package may be important when it is unclear who owns the data. In general, the organization with which the creator was affiliated is regarded as the owner. Each creator can have multiple affiliations; use the plus sign to add more values. Maximum length: 255 characters.
20a	Creator of Data package – Persistent Identifier: Type		Please indicate the type of persistent person identifier.	For example AuthorID, ORCID or ResearcherID. Multiple values are possible. If available, enter at least an ORCID.
20b	Creator of Data package – Persistent Identifier: Identifier		The persistent identifier.	If you are not sure whether someone has a persistent identifier, you can check with the three main providers: AuthorID, ORCID and ResearcherID. Each creator can have multiple persistent identifiers; use the plus sign to add more values. Maximum length: 255 characters.
21	Contributor(s) to Data Package		The name of the person(s) who contributed to this dataset.	Preferred format: Family Name, First Name. This field can have multiple values; use the plus sign to add more values. Maximum length: 255 characters.
22	Contributor Type		The type of contribution the registered person has made to this data package.	Examples: Project lead or Project member. This field can have multiple values; use the plus sign to add more values.
23	Affiliation		The organizational or institutional affiliation of the contributor.	Example: “Utrecht University”. Each contributor can have multiple affiliations; use the plus sign to add more values. Maximum length: 255 characters.
24a	Contributor of Data package – Persistent Identifier: Type		Please indicate the type of persistent person identifier.	Each contributor can have multiple persistent identifiers; use the plus sign to add more values. Maximum length: 255 characters.
24b	Contributor of Data package – Persistent Identifier: Identifier		The unique person identifier.	Each contributor can have multiple identifiers; use the plus sign to add more values. Maximum length: 255 characters.
25	License	Y	The license under which you offer the data package for use by third parties. The preferred value for open data is CC BY 3.0.	Every package needs to be archived with a license, even when you’re not planning to publish the data or have it reused in any form. We offer a number of possible licenses in a drop-down list. If you do not know which license to choose, contact the data manager. When a data package is published, the relevant license text will be copied into the data package. If you opt for a custom license, you need to store the custom license text in a file named License.txt in the root folder of the data package. Please contact the data manager first if you want to work with custom licenses.
26	Data Package Access	Y	Once archived, will your dataset be accessible to third parties?	Open Access means that the dataset is accessible to everyone. Restricted Access means that the dataset can only be obtained on request. Closed Access means that, in principle, the dataset cannot be shared.
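If you prepare metadata in a script before entering it in the form, the constraints in the table (mandatory fields, the Title and Description length limits, the YYYY-MM-DD date format, whole-number retention periods, and the conditional Related Data package fields) can be checked with a small validation routine. The Python sketch below is hypothetical: the dictionary keys are labels chosen for this example, not Yoda's internal metadata schema, the "Data Classification" value is a placeholder, and the code is not part of Yoda or a description of how Yoda checks the form.

```python
import re
from datetime import date

# Hypothetical sketch of the rules in the metadata table. All field names are
# illustrative labels for this example, not Yoda's internal metadata keys.
MANDATORY = ["Title", "Description", "Version", "Retention Period",
             "Data Classification", "Creators", "License",
             "Data Package Access"]
MAX_LENGTH = {"Title": 255, "Description": 2700}
DATE_FIELDS = ["Collection Start Date", "Collection End Date",
               "Period Start", "Period End"]
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_date(value):
    """True if value is a string in YYYY-MM-DD form that names a real date."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2020-02-30
    except ValueError:
        return False
    return True


def validate(record):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []

    # Fields marked Y in the Man. column must be present and non-empty.
    for field in MANDATORY:
        if record.get(field) in (None, "", []):
            problems.append(f"{field} is mandatory")

    # Length limits stated for Title and Description.
    for field, limit in MAX_LENGTH.items():
        if len(record.get(field) or "") > limit:
            problems.append(f"{field} exceeds {limit} characters")

    # Optional date fields, when given, must be YYYY-MM-DD strings.
    for field in DATE_FIELDS:
        value = record.get(field)
        if value is not None and not is_valid_date(value):
            problems.append(f"{field} must be a valid YYYY-MM-DD date")

    # Retention Period only accepts whole numbers (bool is excluded explicitly).
    retention = record.get("Retention Period")
    if retention is not None and (isinstance(retention, bool)
                                  or not isinstance(retention, int)):
        problems.append("Retention Period must be a whole number of years")

    # Every creator needs a name and at least one affiliation (fields 18, 19).
    for i, creator in enumerate(record.get("Creators") or [], start=1):
        if not creator.get("Name"):
            problems.append(f"Creator {i} has no name")
        if not creator.get("Affiliations"):
            problems.append(f"Creator {i} has no affiliation")

    # Conditional fields of each related data package (10a, 10b, 10d, 10e).
    for i, rel in enumerate(record.get("Related Data Packages") or [], start=1):
        if rel.get("Relation") and not rel.get("Title"):
            problems.append(f"Related data package {i} needs a title")
        if bool(rel.get("Identifier Type")) != bool(rel.get("Identifier")):
            problems.append(f"Related data package {i} needs both an "
                            "identifier type and an identifier")

    return problems


if __name__ == "__main__":
    record = {
        "Title": "Example survey data",
        "Description": "Responses to a hypothetical survey.",
        "Version": "1.0",
        "Retention Period": 10,
        "Data Classification": "placeholder",
        "Creators": [{"Name": "Doe, Jane",
                      "Affiliations": ["Utrecht University"]}],
        "License": "CC BY 3.0",
        "Data Package Access": "Open Access",
        "Collection Start Date": "2020-02-30",  # not a real calendar date
    }
    print(validate(record))
    # prints ['Collection Start Date must be a valid YYYY-MM-DD date']
```

Returning a list of messages rather than raising on the first error lets you see every problem in a record at once, much as you would fix several fields in the form before archiving.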
