Seven ways to check personal data before, during and after your research.
Handling personal data
The Algemene Verordening Gegevensbescherming (AVG), the Dutch implementation of the General Data Protection Regulation (GDPR), came into force on May 25, 2018. The AVG requires you as a researcher to provide clarity and transparency to data subjects about how and why their personal data is processed.
If you collect research data that enables you to identify a person, then this is classified as personal data. Personal data can include a variety of information, such as names, addresses, phone numbers, occupations and IP addresses. Certain personal data is considered particularly sensitive and thus requires specific protection, because it reveals information that may create significant risks for the fundamental rights and freedoms of the individual. Examples of sensitive personal data include data on a person's race, ethnic origin, political opinion, physical or mental health, criminal record, sexual orientation, religious or other beliefs and economic status.
Direct or indirect identification
An identifiable natural person is someone who can be identified, either directly or indirectly, by reference to an identifier such as a name, an identification number, location data, an online identifier, an occupation or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
During 2018 the Data Protection Officer at Utrecht University will closely follow how the AVG is translated for scientific purposes. Following the checklist below will ensure that you are mostly prepared and adhere to the principles which the General Data Protection Regulation (GDPR) prescribes.
Article 35 of the GDPR introduces the concept of a Data Protection Impact Assessment (DPIA). In Dutch, it is called Gegevens Beschermings Effect Beoordeling (GBEB). This is mandatory if data processing is likely to pose a high privacy risk for the data subjects.
The process and goal of a DPIA (or GBEB)
During a DPIA you fill in a form that helps you identify privacy issues and the measures needed to fix possible privacy problems at an early stage. For example: when storing personal data on laptop computers, appropriate technical and organisational security measures (effective full disk encryption, robust key management, appropriate access control, secured backups, etc.) can be required in addition to existing policies (notice, consent, right of access, right to object, etc.). The DPIA should be seen as a tool to help you with decision-making concerning data processing. It should be continuously reviewed and regularly reassessed.
- When should I perform a DPIA?
The DPIA should be carried out prior to data processing. This is consistent with data protection by design. Check if you are obliged to do so with the DPIA checklist.
- When isn't a DPIA required?
Note that when the nature, scope, context and purposes of the intended data processing are very similar to processing for which a DPIA has already been carried out, the results of the previous DPIA may be used.
- How should I perform a DPIA?
You can choose a tool yourself as long as it contains at least the following:
- A systematic description of the intended data processing and the purposes thereof. Do you rely on a legitimate interest as the basis for processing? Include this in the description;
- An assessment of the necessity and the proportionality of the processing. That means: is the processing of personal data necessary in this way to achieve your goal? And isn't a possible violation of the privacy of those involved (the people whose data you process) disproportionate to this purpose?
- An assessment of the privacy risks for those involved;
- The intended measures to (1) address the risks (such as safeguards and safety measures) and to (2) demonstrate that you comply with the GDPR.
- Which tool to perform a DPIA does Utrecht University recommend?
You can use the following tools:
- Privacy Impact Assessment by NOREA (Dutch, very elaborate, English translation available here. This isn't a formal translation, but a 'quick and dirty' service to help you out on this subject);
- Privacy Impact Assessment for Utrecht University (Dutch, requires logging in with your Solis-id, based on an instrument by SURF);
- Additional tools for ethical reflection
- You can also just start with the Privacy Checklist which Utrecht University has issued (Dutch).
- DEDA for Research (Data Ethics Decision Aid), a tool for a broader assessment of the ethical aspects of your research. This is an online survey in which you are asked a series of open questions to raise awareness of certain issues and to help document the decision-making process. This tool was developed by Utrecht Data School.
- When should I consult the supervisory authority? (Autoriteit persoonsgegevens)
Only in cases where the identified risks cannot be sufficiently addressed after the DPIA (i.e. the residual risk remains high). In such cases, contact firstname.lastname@example.org first.
To make sure you translate the risk assessment into appropriate measures, write them down in your data management plan (DMP). Be sure to assign responsibilities (record who is authorised to do what), so that you adhere to the AVG/GDPR principle of accountability.
Making a DMP before you start collecting personal data will also help you practice 'privacy by design', which is an important principle in the AVG/GDPR. The AVG/GDPR states that you should minimise data size and only collect data which are relevant, limited to what is necessary and only for specified, explicit and legitimate purposes.
See the guide on 'Data management planning' for more information on developing your DMP.
Research data (personal or not) must be carefully secured against loss, theft and tampering. As part of Utrecht University's Information Security Policy, you are asked to classify your data. Classifying data is a practical means by which to apply neither too little nor too much protection. Based on a set of questions, you determine the value of your data as well as the security risks these data are exposed to. This allows you to reach a conclusion about the impact a data breach of your data could have. More information on data classification can be found on the intranet.
You can go through the classification process yourself. If you need help, contact the data classification contact person from your faculty. This is generally the Local Information Security Manager (LISM), but you can also get help from the university's Corporate Information Security Officer (CISO) via email@example.com.
A data classification procedure involves three security aspects of the data:
- Availability
Concerns whether authorised users have access to the data at the right times.
- Integrity
Refers to whether the data are correct and complete and whether only authorised users can make changes to the data.
- Confidentiality
Relates to whether the data are accessible only to authorised users.
You can then consult a matrix to read about the corresponding measures you should take in order to properly protect the data. This could entail data encryption, two-factor access control, the need for an additional back-up, auditing or detection of unauthorised changes.
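One of the measures mentioned above, detection of unauthorised changes, can be illustrated with file checksums. The following is a minimal sketch, not a university-endorsed tool: it assumes your research data live in a single folder, and the function names (`build_manifest`, `changed_files`) are hypothetical. It records a SHA-256 checksum per file and later reports which files were modified, added or removed.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        str(p.relative_to(folder)): sha256_of_file(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files modified, added or removed since the manifest was built."""
    current = build_manifest(folder)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

A check like this only helps if the manifest itself is stored somewhere that people who can edit the data cannot also edit, for example in a separate, access-controlled location.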
According to the AVG/GDPR-principles of lawfulness, fairness and transparency, personal data cannot be distributed without informed consent. Informed consent of participants in a project is needed to arrange sharing, preservation and long-term use of their personal data. Your participants should be unambiguously informed of what will be done with the data and give consent. Subsequently, data processing should be done accordingly.
See 'Informed Consent for data sharing' for information on gaining informed consent for sharing of research data beyond the purposes for which your data is collected.
Identifiable personal data is data that leads to the identity of a person without a disproportionately large effort. The best way to protect your participants' privacy may be to not collect certain identifiable information at all. The second best way to protect data subjects is anonymisation or pseudonymisation, which allows data to be shared without disclosing your participants' personal information.
When dealing with identifiable data, consider the following:
- Anonymise data
Take note that a person's identity can be disclosed not only by direct identifiers (name, address, telephone number) but also by indirect identifiers (age, place of birth, occupation, family composition, salary) that, linked with other information, can lead to a person's identification. Anonymisation, to the point that the person is no longer identifiable, is one way to avoid having to take strict security measures when sharing your data.
- Replace the unique identifier of a person with a pseudonym
This measure can provide the means to still be able to link records between sets with information from the same person while protecting their privacy at the same time.
- Separate identifiable information from other information
Storing identifiable information apart from other information, and storing both separately from the key that links them, is another possible security measure you can take.
- Encrypt data
If it is not feasible to de-identify the data, encrypting data is also a way to prevent personal data from being disclosed (see step 6).
Only if the access can unambiguously be restricted to authorised persons (see step 7), can data be stored without such measures. Yoda, for instance, is a safe storage environment where this is possible.
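To illustrate how pseudonyms and a separately stored key fit together, here is a minimal sketch. It assumes records are simple dictionaries with an identifier field; the field name `participant_id`, the `P-` prefix and the function name are hypothetical. Each identifier is replaced by a random pseudonym, so records from the same person can still be linked, while the table that maps pseudonyms back to identities is returned separately.

```python
import secrets


def pseudonymise(records, id_field="participant_id"):
    """Replace the identifier in each record with a random pseudonym.

    Returns (pseudonymised_records, key_table). The key table maps each
    pseudonym back to the original identifier and is meant to be stored
    apart from the research data, under stricter access control.
    """
    pseudonym_for = {}  # original identifier -> pseudonym
    used = set()        # pseudonyms already handed out
    output = []
    for record in records:
        original = record[id_field]
        if original not in pseudonym_for:
            pseudonym = "P-" + secrets.token_hex(8)
            while pseudonym in used:  # extremely unlikely, but keep them unique
                pseudonym = "P-" + secrets.token_hex(8)
            used.add(pseudonym)
            pseudonym_for[original] = pseudonym
        cleaned = dict(record)  # copy, so the input is left untouched
        cleaned[id_field] = pseudonym_for[original]
        output.append(cleaned)
    key_table = {p: original for original, p in pseudonym_for.items()}
    return output, key_table
```

Because the pseudonyms are random rather than derived from the identifier, they reveal nothing without the key table. Note that pseudonymised data still counts as personal data as long as the key exists.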
For an elaborate visualisation of what is considered identifiable data, check out the information sheet at the Future of Privacy Forum which offers a useful visual guide to practical data de-identification.
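The risk posed by indirect identifiers can be made concrete with a simple count. The sketch below is not a method prescribed by this guide; it uses a common heuristic, sometimes called k-anonymity, and assumes records are dictionaries with hypothetical `age` and `occupation` fields. It finds the smallest group of people who share the same combination of indirect identifiers: a group of size 1 means someone is unique in the dataset on those fields.

```python
from collections import Counter


def smallest_group_size(records, indirect_identifiers):
    """Size of the smallest group sharing one combination of indirect identifiers."""
    groups = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


def age_band(age, width=10):
    """Generalise an exact age to a band, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


records = [
    {"age": 34, "occupation": "teacher"},
    {"age": 37, "occupation": "teacher"},
    {"age": 52, "occupation": "nurse"},
    {"age": 58, "occupation": "nurse"},
]

# With exact ages every participant is unique on (age, occupation).
print(smallest_group_size(records, ["age", "occupation"]))      # → 1

# After generalising age to ten-year bands, each combination covers two people.
generalised = [{**r, "age": age_band(r["age"])} for r in records]
print(smallest_group_size(generalised, ["age", "occupation"]))  # → 2
```

A larger smallest group lowers the risk of re-identification, but it does not by itself prove that data are anonymous: other fields, or linkage with outside sources, can still single people out.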
According to the AVG/GDPR you should ensure data integrity and confidentiality and ensure that data are accurate and where necessary kept up to date. Every reasonable step should be taken to ensure that personal data that are inaccurate are erased or rectified without delay. Also, data which aren't used should be removed, unless these data are needed to be able to verify or reproduce research.
You can protect the information in your data files by:
Controlling access to restricted materials with encryption
By encrypting your data, your files become unreadable to anyone who does not have the correct encryption key. You may encrypt an individual file, but also (part of) a hard disk or USB stick. At Utrecht University, BoxCryptor is available for encryption.
Also, you shouldn't send personal or confidential data via email or through File Transfer Protocol (FTP); transmit it as encrypted data instead (e.g. via SURFfilesender).
You can also arrange access conditions in a consortium agreement and, if necessary, through non-disclosure agreements with participants and data handlers via data transfer or processor agreements (see the guide on 'Legal instruments and agreements').
Computer system security
The computer you use to consult, process and store your data, must be secured:
- Use a firewall to protect your computer from unauthorised network access;
- Install anti-virus software;
- Install updates for your operating system and software;
- Only use secured wireless networks;
- Use passwords and do not share them with anyone. Use passwords not only on your university computer, but also on your laptop or home computer. If necessary, secure individual files with a password;
- Do not provide others with your login credentials;
- Only allow access to the data to registered people and withdraw access when they leave.
Physical data security
With a number of simple measures you can ensure the physical security of your research data:
- Lock your computer when leaving it, even for just a moment (Windows key + L);
- Lock your door if you are not in your room;
- Keep an eye on your laptop;
- Do not leave unsecured copies of your data lying around;
- Transport your USB stick or external hard disk in such a way that you cannot lose it;
- Keep non-digital material which should not be seen by others in a locked cupboard or drawer.
Destroying data in a consistent and reliable manner when needed
Personal data should be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed. Note that deleting files from a hard disk only removes the references to them, not the files themselves. Overwrite the files to scramble their contents or use secure erasing software. For USB sticks and CDs/DVDs, physical destruction is the most reliable way to erase data.
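Overwriting a file before deleting it can be sketched as follows. This is a minimal illustration under stated assumptions, not a replacement for dedicated secure erasing software: it assumes a traditional hard disk and a file system that writes in place, and the function name is hypothetical. On SSDs, USB sticks, copy-on-write file systems and cloud-synced folders, the old contents may survive elsewhere even after an overwrite.

```python
import os
from pathlib import Path


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite a file's contents with random bytes, then delete the file."""
    size = path.stat().st_size
    with path.open("r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = min(remaining, 1 << 20)  # write at most 1 MiB at a time
                handle.write(os.urandom(block))
                remaining -= block
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the bytes to disk
    path.unlink()
```

Copies of the file in backups, temporary folders or version histories are not touched by this and need to be handled separately.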
When your project is finished and you decide to publish and share your data in a data repository, be aware that personal data can only be put there with appropriate consent and after considering ethical issues. If neither poses a problem, you can still protect personal data as an extra precaution by limiting access to the data. A Creative Commons license is less appropriate, as you do not want data under restriction to be spread further without explicit permission. A user agreement can settle such obligations.
Many data repositories offer the following access categories:
- Open access
Data that can be accessed by any user whether they are registered or not. Data in this category shouldn't contain personal information (unless consent is given, and data is not very sensitive).
- Restricted access
Access is limited and can only be granted upon request. This access category is for the most sensitive data that may contain disclosive information. A Creative Commons license is less appropriate, as you do not want data under restriction to be spread further without explicit permission. A user agreement can settle such obligations.
If data containing information on persons has (possibly) leaked, you have to report this as soon as possible to the university security officer at firstname.lastname@example.org, as it may be considered a data leak or breach. Not reporting a (possible) data leak can lead to a very high fine.
YOUth is a large-scale, long-term cohort study which follows children from before birth until the age of 16. The participants are asked to sign an Informed Consent form. YOUth scientists are not allowed to share YOUth data with other scientists or journals themselves, either publicly or privately. All data requests are evaluated against eligibility criteria by a data management committee. If technically possible, the data is given out with a unique pseudonymisation code, to prevent coupling with data from other requests. A specific Data Transfer Agreement (DTA) needs to be signed by the requesting party, which states the limitations on purpose, storage, and access.
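How a pseudonymisation code that is unique per data request prevents coupling can be sketched as follows. This does not describe YOUth's actual procedure; it is one possible approach, assuming a keyed hash (HMAC-SHA256) with a fresh secret key per request. The function names and the identifier format `child-001` are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_request_key() -> bytes:
    """Generate a fresh secret key for one data request."""
    return secrets.token_bytes(32)


def request_pseudonym(participant_id: str, request_key: bytes) -> str:
    """Derive a pseudonym that is stable within one request only."""
    mac = hmac.new(request_key, participant_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


key_request_a = new_request_key()
key_request_b = new_request_key()

# Within one request the same participant always gets the same code,
# so records from different measurement waves can still be linked.
same = request_pseudonym("child-001", key_request_a) == request_pseudonym("child-001", key_request_a)

# Across requests the codes differ, so two recipients cannot couple
# their datasets by comparing pseudonyms.
different = request_pseudonym("child-001", key_request_a) != request_pseudonym("child-001", key_request_b)
```

The keys would stay with the party handing out the data; a recipient who only sees the pseudonyms cannot reverse them or match them against another request.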