Data management planning
To help you consider the possible costs involved in making your research findable, accessible, interoperable and reusable, have a look at the 'Guide: Cost for data management'. Some funders consider the costs for data management eligible for funding.
We recommend the use of DMPonline because of its functionalities. You can share your DMP with collaborators, set permissions and keep your DMP up-to-date during your research project. NWO and ZonMw approve the templates from Utrecht University and UMC Utrecht and recommend using these. However, in some cases a funder may require that their own template is used.
If (sensitive) research data are to be destroyed, for instance because they are no longer necessary for the goal of the conducted research, this has to be done in a manner that is verifiable and irrevocable (see UU policy Research Data). You are responsible for the correct destruction of documents and files. Paper documents can be disposed of in specially designated destruction containers or in a shredder (often available in printer rooms). Carriers with sensitive data can also be destroyed: the FSC has outsourced this activity to a certified contractor, which can destroy a range of materials and media, including paper, archive files, microfilm, CD-ROMs, DVDs, clothing with logos, tapes and hard disks. You can arrange this via the FSC Report Form. For specific files on your computer, deleting them is not enough, as only the index entry is removed. These files can be removed permanently by overwriting them several times with other data. Software also exists that erases information permanently. Keep in mind that long-term backups of the data may still exist on backed-up servers.
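The overwrite-then-delete approach for individual files can be sketched as follows. This is a minimal illustration, not an approved erasure tool: the function name `overwrite_and_delete` and the default of three passes are assumptions made for this example. In-place overwriting is also not reliable on SSDs, copy-on-write file systems or synchronised cloud folders, and it does not touch backups, so dedicated erasure software or physical destruction may still be needed.

```python
import os
import secrets


def overwrite_and_delete(path, passes=3, chunk_size=1024 * 1024):
    """Overwrite a file in place with random bytes several times, then delete it.

    Hypothetical helper for illustration only; the number of passes is an assumption.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = secrets.token_bytes(min(chunk_size, remaining))
                handle.write(block)
                remaining -= len(block)
            # Push the overwritten content to disk before the next pass.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)
```

Calling `overwrite_and_delete("interview_01.wav")` (a hypothetical file name) would replace the file's contents with random bytes three times and then remove the file.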
Your data are anonymous if they cannot reasonably be linked to persons without disproportionate effort. If a key is still available to re-identify individuals, the data are not anonymous. Anonymization can be done by removing direct identifiers and by removing or modifying indirect identifiers that together could be used to identify a person. For example, birth dates can be reduced to the year only (no month or day), and only the first four digits of a ZIP code are shared, provided the population in that area is sufficiently large. It is unfortunately hard to give a strict recipe for anonymization, as the approach differs greatly between audiovisual data, geo data, interviews, patient data, etc. CBS provides some techniques for statistical disclosure control. The European Commission has adopted an opinion on anonymization. Because the data need to be reasonably unidentifiable, it is good practice to at least provide a rationale for what you deleted, aggregated or randomized, and what you thought would be safe to leave in. Note that some data are in themselves personal data, such as DNA. Does de-identification render the data useless? Then negotiate in the informed consent whether keeping the data identifiable, together with additional measures to safeguard privacy, is acceptable.
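The two generalizations mentioned above (birth year only, four-digit postal area) could look like the sketch below. The field names, the list of direct identifiers and the population threshold of 10,000 are all assumptions chosen for illustration; what counts as "sufficiently large" has to be decided per dataset, and this does not replace a proper disclosure risk assessment.

```python
from datetime import date


def generalize_record(record, area_population, min_population=10_000):
    """Return a copy of a record with direct identifiers removed and
    indirect identifiers coarsened.

    Field names and the population threshold are hypothetical.
    """
    direct_identifiers = {"name", "email", "phone"}
    result = {k: v for k, v in record.items() if k not in direct_identifiers}

    # Keep only the year of birth, no month or day.
    birth = result.pop("birth_date", None)
    if birth is not None:
        result["birth_year"] = birth.year

    # Keep the first four digits of the postal code only if the area is large enough.
    postal = result.pop("postal_code", None)
    if postal is not None:
        prefix = postal.replace(" ", "")[:4]
        if area_population.get(prefix, 0) >= min_population:
            result["postal_area"] = prefix
    return result
```

For a record such as `{"name": "A", "birth_date": date(1985, 3, 14), "postal_code": "3584 CS", "score": 7}`, the name is dropped, the birth date becomes `birth_year`, and the postal area is kept only when `area_population` reports enough inhabitants for that prefix.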
Preserving and sharing data
The Dutch Code of Conduct for Academic Practice says that research data must be kept for (at least) 10 years. The University Policy Framework for Research Data adds that this 10-year period starts after you have published your paper based on the data you are preserving.
WGBO (article 4) states that medical records of patients stored in a file should be kept for 15 years or longer if necessary. For drug research, the (patient) data must be stored for 20 years.
The AVG states that personal data may not be kept longer than is necessary for the purposes for which they were collected or for which they are used. Personal data may, however, with permission and appropriate safeguards, be preserved for historical, statistical or scientific purposes.
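The minimum retention periods mentioned above can be turned into a simple date calculation, sketched below. The category names and the function `earliest_disposal_date` are assumptions made for this example, and the caller has to supply the correct start date (for research data, the publication date of the paper). The result is only the earliest date at which disposal could be considered; for personal data the AVG principle above still applies.

```python
from datetime import date

# Minimum retention periods in years, as stated in the text above.
RETENTION_YEARS = {
    "research": 10,        # from publication of the paper
    "medical_record": 15,  # WGBO
    "drug_research": 20,
}


def earliest_disposal_date(start, category="research"):
    """Return the first date on which the minimum retention period has passed."""
    target_year = start.year + RETENTION_YEARS[category]
    try:
        return start.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: round up to 1 March,
        # so the minimum period is never cut short.
        return start.replace(year=target_year, month=3, day=1)
```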
Data and documentation files can be stored together in a data package. For verification, all documentation and data (raw or possibly analysed) that enable research replication must be provided. For reuse, data should be stored as raw as possible (if usable in that form) along with documentation to help comprehend and reuse it. In both of these cases you should include:
- A variable list or code book explaining the variables in your data.
- If applicable, the computer code used to perform analyses, and/or an explanation of performed analyses ('methods').
- A file which describes the files in the data package and their relation.
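The file that describes the contents of a data package could be drafted with a small script such as the sketch below. The tab-separated layout, the SHA-256 checksum column and the function name `build_manifest` are assumptions made for this example, not a required format; the descriptions of each file and of how the files relate still have to be written by hand.

```python
import hashlib
from pathlib import Path


def build_manifest(package_dir, descriptions=None):
    """Return a tab-separated overview of all files in a data package.

    `descriptions` maps relative paths to a short, hand-written description.
    """
    descriptions = descriptions or {}
    root = Path(package_dir)
    lines = ["path\tbytes\tsha256\tdescription"]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        # The checksum lets others verify that a file was not altered or corrupted.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        size = path.stat().st_size
        lines.append(f"{rel}\t{size}\t{digest}\t{descriptions.get(rel, '')}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved outside the package directory first, for example as a README or manifest file, and then added to the package once the descriptions are complete.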
Your data should be available for verification purposes after research is finished. In principle, data should also be made available for reuse by others, unless there is a legal, ethical or commercial reason why this is not possible, or the costs of making the data available (open access) for reuse are not in proportion to the value of the data.
Making your data open, for instance via a public repository, certainly has advantages. It will generate more traffic to your data. Moreover, you will not have to handle requests for access during the preservation period (see FAQ 'For how long should I keep my data'). You will also comply with funder requirements.
When your dataset is completed, you should make it available as soon as possible, for instance at the same time as the paper is published. You can set access restrictions to privacy-sensitive data. See our guide on 'Handling personal data' to choose the appropriate access category.
Researchers may be allowed periods of limited privileged use of the data to allow them to publish the results, or to file a patent, for example. Check your individual funder’s policy for details.
Preferably, you deposit your data in a discipline-specific or data-type-specific repository. That is where your peers will most easily find your data.
If such a repository doesn't exist for your field or if you prefer a general repository, have a look at the relevant section of our guide on publishing and sharing data or go straight to the decision aid we developed. Based on your requirements, you will be advised on a top 3 of appropriate general repositories for your research data.
To link data to your publication, you create references in both directions. The repository of your choice gives your dataset a permanent identifier, which you place in the data availability statement of your publication. In the repository, there usually is a field where you can register the title and persistent link of your publication. Also register your dataset and publication in PURE, the content registration system of Utrecht University.
In some cases, it makes sense to limit access to research data.
- Because you are bound by contract not to disclose the data to just anyone (IP rights, patents pending, collaborations).
- If you have privacy-sensitive data (a public repository may not be safe enough, in any case).
- If you have anonymized your data, but they are still at individual record level and the theoretical risk of re-identification is not zero.
- If you have published one paper but have specific research planned on the same data, and you want to enable verification of results, but not yet reuse.
In most other cases, however, it will pay off to make your data available without restrictions. There will be more traffic to your data, which increases the chance of requests for cooperation and of citations when your data are reused, and thereby your impact. And be aware that if you make your data available to other people, you yourself get access to other people’s smart insights.
Whether your data can be processed by external tools in a safe enough way depends on how these tools handle your personal or sensitive research data. For sensitive personal data (for a definition, see our index page), more or stricter security measures are needed. Below is a checklist to broadly assess whether ‘normal’ or even sensitive personal data can be processed by a tool. If you want to know whether a specific tool is appropriate for your specific research data, contact the information security specialists at RDM support firstname.lastname@example.org.
- Will the personal data leave the European Economic Area borders? Acceptable: no.
- Will the personal data be transmitted by the company to third parties? Acceptable: no.
- Who will have access? Preferably: no one (automated). Acceptable: qualified people who are bound to strict rules such as set up in a non-disclosure agreement.
- Are security measures in place to ensure only qualified people will have access, e.g. encryption, pseudonymization, access limitation, firewall? Are these audited? Is there a certificate such as ISAE 3402 (type 2) (general), ISO 27018 (privacy) or ISO 27001 (security)? Acceptable: yes.
- Is it clear what the retention policy is for your data (e.g. backups)? If the only copy of your data is (temporarily) at the company, it is important that your data are not lost. Acceptable: yes.
- What can/will the company use the personal data for? Acceptable: for no other purpose than providing you the service.
- How long will they keep the personal data? Acceptable: for no longer than is necessary for providing you the service.
- Does the company state the data will become their property? Not acceptable.
As an example, AmberScript was selected as a safe tool to transcribe Dutch audio material containing personal data, based on the information available on the issues above. Check for yourself whether this still applies at the time you want to use this tool!
If these things are not specified, you can try to arrange them in a Data Processor Agreement (see our Guide ‘Legal agreements and documents’).
For some tools, a Utrecht University local solution is available or can be requested, which can make the tool suitable after all. See our page on 'Tools for data analysis and modelling', section 'I. Tools for interactive computing'.
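The checklist in this answer can be recorded per tool in a structured way, so that unanswered or unacceptable items stand out. The sketch below is only an aid for a first, broad assessment; the item names and the function `assess_tool` are assumptions made for this example, and the outcome does not replace advice from the information security specialists.

```python
# Acceptable answer for each checklist item (True = yes, False = no).
CHECKS = {
    "leaves_eea": False,
    "shared_with_third_parties": False,
    "access_limited_to_bound_staff": True,
    "audited_security_measures": True,
    "clear_retention_policy": True,
    "used_only_to_provide_service": True,
    "kept_no_longer_than_needed": True,
    "company_claims_ownership": False,
}


def assess_tool(answers):
    """Return the checklist items that are unanswered or not acceptable."""
    problems = []
    for item, acceptable in CHECKS.items():
        if item not in answers:
            # Unspecified items may be arranged in a Data Processor Agreement.
            problems.append(f"{item}: not specified")
        elif answers[item] != acceptable:
            problems.append(f"{item}: not acceptable")
    return problems
```

An empty list means that every item was answered acceptably; any other result lists what still has to be clarified or arranged before the tool is used.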
Officially, Utrecht University, as your employer, is considered the rights holder to the research data you create. You, as a researcher, have the primary responsibility for taking care of the data.
However, questions on data exploitation and reuse rights may be even more important than those of ownership. Who can use the data? Who can publish it? Who can provide it to third parties? In its policy Utrecht University states that research data should be shared with others both inside and outside of the university, if possible. There may be other parties with a claim to the data.
We strongly recommend that you deal with the issues around data exploitation at an early stage in your research project. In your data management plan, write down all agreements between yourself, your supervisor and other interested parties and negotiate terms for processing, dissemination and reuse. See the guide 'Legal instruments and agreements' for an overview of possibilities.
When the time comes to share your (non-personal) data, the most practical solution is to put a Creative Commons license on the data you want to share. That way you make clear which usage conditions apply, without people having to ask permission.
Check out our guide on 'Handling personal data' and follow the steps.
The most important changes that you will have to deal with are:
- The definition of sensitive data is broader and now also includes, for instance, genetic, mental, cultural and economic data;
- Obtaining consent for processing data must be clear and must seek an affirmative response;
- Parental consent is required for processing personal data of children under the age of 16;
- Users may request a copy of personal data in a portable format;
- You must adhere to the seven principles of the GDPR;
- You will have to demonstrate that you have taken adequate technical and organisational measures to protect personal data and the systems which hold these data;
- A Data Protection Impact Assessment (DPIA) will be required for projects where privacy risks are high.
Asking consent by digital tick box is allowed for studies that are non-invasive and do not concern highly privacy-sensitive data. Furthermore, the research should in that case be entirely digital, such as an online survey. If participants are physically present, a signature should be asked for instead to confirm consent. In any case, the consent should be given actively. Of course, a tick box provides weaker evidence than an (electronic) signature when you need to verify that someone gave consent. If you want to know whether digital consent is appropriate in your specific research, contact RDM Support. Also see the FAQ of the FETC (on intranet).