When working with personal data, various considerations with regard to data protection, privacy regulations and ethical and scientifically responsible behaviour should play a role in the data management phase. This page provides an overview of the conditions researchers should be aware of for various tasks, like levels of sensitivity and general regulations for gathering, processing and storing data. Links to the full source material and extra reading can be found at the bottom of the page.
Dutch law describes personal data as “any information concerning an identified or identifiable natural person, which can lead to the identification of a natural person without unreasonable effort”. Note that a combination of different variables in a dataset can also make subjects identifiable. For example, age, place of birth or newspaper subscription is unlikely to identify a subject on its own, but when these pieces of information are combined, identification becomes much more likely. As a general rule, any data about a natural person should therefore be regarded as personal data.
Directly identifiable data is (a combination of) information that can lead directly to a subject, such as address, phone number, e-mail address, citizen’s service number, IP address, bank account number, etc.
Indirectly identifiable data is information that does not allow for a direct identification of subjects, but nevertheless permits researchers or third parties to identify subjects without unreasonable effort. For example, a person of a certain age, who has an uncommon profession and lives in a small village can still be easily identified even when all directly identifiable data is deleted.
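One way to get a rough impression of indirect identifiability is to count how many records share each combination of indirectly identifying variables. The sketch below is only an illustration: the function name `rare_combinations`, the field names and the example records are all hypothetical, and a combination that occurs only once is merely a warning sign, not a legal test.

```python
from collections import Counter


def rare_combinations(records, quasi_identifiers, threshold=1):
    """Return value combinations shared by at most `threshold` records."""
    counts = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n <= threshold}


# Hypothetical records that contain no directly identifiable data.
records = [
    {"age": 34, "profession": "teacher", "village": "A"},
    {"age": 34, "profession": "teacher", "village": "A"},
    {"age": 61, "profession": "clockmaker", "village": "B"},
]

risky = rare_combinations(records, ["age", "profession", "village"])
print(risky)  # {(61, 'clockmaker', 'B'): 1}
```

In this made-up dataset the 61-year-old clockmaker in village B is the only person with that combination of values, so this record deserves extra attention before the data is shared.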
Sensitive information, that is a person’s political or religious affiliation, sexual orientation, medical and criminal records and union membership, should be treated with additional care.
Before you share your data with anyone else, make sure that the dataset is completely anonymized. This means your dataset should not contain any privacy-sensitive information. Variables such as names, addresses, contact information, citizen’s service, social security and tax numbers (BSN/Sofi), and medical record numbers must be removed. Dates directly related to the individual (birth, death, admission, discharge) must be re-encoded to years, and postal codes must be re-encoded to four digits. Finally, it is recommended to re-encode specific occupations to classifications of occupations, using the Standard Classification of Occupations (SBC, in Dutch) by Statistics Netherlands or the International Standard Classification of Occupations (ISCO) by the International Labour Organization.
Datasets that are completely anonymized are not covered by the Personal Data Protection Act (WBP) and can therefore be shared and made openly available without any restrictions. However, keep in mind the earlier example given under indirectly identifiable data, and also delete any indirectly identifiable data for specific persons if you think this is necessary.
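The removal and re-encoding steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete anonymization procedure: the field names in `DIRECT_IDENTIFIERS` and `DATE_FIELDS`, the function `anonymize` and the example record are hypothetical, and occupations are left untouched because mapping them to SBC or ISCO codes requires the official classification tables.

```python
from datetime import date

# Hypothetical field names; adapt them to the variables in your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "email", "phone", "bsn",
                      "medical_record_number"}
DATE_FIELDS = {"birth_date", "death_date", "admission_date", "discharge_date"}


def anonymize(record):
    """Return a copy of `record` without direct identifiers and with coarser details."""
    result = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop directly identifiable variables entirely
        if key in DATE_FIELDS and isinstance(value, date):
            # keep only the year and rename the field accordingly
            result[key.replace("_date", "_year")] = value.year
        elif key == "postal_code":
            # Dutch postal codes look like "1234 AB"; keep the four digits only
            result[key] = str(value).strip()[:4]
        else:
            result[key] = value
    return result


# Made-up example record.
record = {
    "name": "Example Person",
    "bsn": "000000000",
    "birth_date": date(1980, 5, 17),
    "postal_code": "7522 NB",
    "score": 42,
}

anonymized = anonymize(record)
print(anonymized)  # {'birth_year': 1980, 'postal_code': '7522', 'score': 42}
```

Returning a new dictionary rather than modifying the input keeps the original record intact, so the anonymized copy can be checked against it before the original is archived or destroyed.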
These levels of sensitivity can be directly translated into a risk assessment – how large are the consequences for respondents of their data being leaked, stolen, or otherwise becoming public?
SURFnet and IGS datalab use the guidelines set out in the table below for risk assessment. Which class a particular dataset belongs to is a somewhat subjective decision that depends on the particular combination of individual pieces of information and the context in which they occur.
Type of information: e.g. professional e-mail address. Measures: no specific measures besides those mandated by the WBP.
Type of information: limited amount of personal data regarding the connection between respondent(s) and organization(s), e.g. student enrollment. Measures: standard information protection measures are adequate, e.g. password secured access.
2 (increased risk). Type of information: special personal data, e.g. economic status, dyslexia statement. Measures: increased information protection measures.
3 (high risk). Type of information: special personal data, e.g. psychological evaluation, medical records. Measures: highest possible level of security measures, e.g. encryption + off-line storage.
Respondents have the right to ask the responsible party (in practice such a request would likely be made to you, the researcher) whether or not personal information regarding them is used. Such a request needs to be sufficiently precise, i.e. the respondent needs to make clear in which project they believe their data is being used. The data in question also needs to be directly identifiable.
If personal data regarding the respondent is indeed on file, the respondent has the right to view said data, and request the data be changed, supplemented or deleted if the data is factually inaccurate, irrelevant to or inadequate for the research, or otherwise violates any legal requirements.
Contact information, that is, directly identifiable information, should be separated from other information. This separation should be as thorough as possible. Ideally, both datasets should be stored in different physical locations, and have different access protocols. Both datasets can be linked with an otherwise meaningless administrative identification number. Contact information should be destroyed when it is no longer necessary.
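The separation described above can be sketched as follows. This is only an illustration under stated assumptions: the field names in `CONTACT_FIELDS`, the key `link_id` and the function `split_contact_info` are hypothetical, and storing the two resulting tables in different locations with different access protocols remains an organizational step that code alone does not solve.

```python
import secrets

# Hypothetical names of the directly identifiable (contact) fields.
CONTACT_FIELDS = {"name", "email", "phone", "address"}


def split_contact_info(records):
    """Split records into a contact table and a research table.

    Both tables share only a random, otherwise meaningless identifier.
    """
    contacts, research = [], []
    used_ids = set()
    for record in records:
        link_id = secrets.token_hex(8)
        while link_id in used_ids:  # regenerate in the unlikely case of a clash
            link_id = secrets.token_hex(8)
        used_ids.add(link_id)

        contact_row = {"link_id": link_id}
        research_row = {"link_id": link_id}
        for key, value in record.items():
            target = contact_row if key in CONTACT_FIELDS else research_row
            target[key] = value
        contacts.append(contact_row)
        research.append(research_row)
    return contacts, research


# Made-up example records.
records = [
    {"name": "Respondent One", "email": "one@example.org", "answer": 4},
    {"name": "Respondent Two", "email": "two@example.org", "answer": 2},
]

contacts, research = split_contact_info(records)
```

A random identifier is used instead of a sequence number or a hash of the contact details, so that the identifier itself reveals nothing about the respondent. Once the contact table is destroyed, the research table can no longer be linked back to individuals through this identifier.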
Any time contact information is gathered, it should be kept separate from the main research data. Further, such contact information should be deleted as soon as it is no longer reasonably required. The potential consequences of high-risk data becoming public can be (partially) mitigated if such data is either anonymized, or at least separated from contact information.