Managing your research data effectively is crucial to the success of your project. This does not just apply to the immediate context of your thesis or publications. Managing your data is a practice that will benefit you throughout your research career. Research data management means caring for your data so that you have a documented overview of the research data to be collected, generated and/or used. It also covers how these data are protected and how they will be made accessible or available to others, publicly or individually, during and after your research project.
Good research data management promotes:
Access, re-use, impact and recognition
- Facilitating future research by allowing others to build on or add to your research data;
- Increased citations of research data and of publications based on that data;
- Increasing your research efficiency by saving time and resources;
- Preventing duplication of effort by enabling others to use your data.
Quality and security
- Ensuring the integrity and reproducibility of your research;
- Ensuring that research data and records are accurate, complete, authentic and reliable;
- Enhancing data security and minimizing the risk of data loss.
- Meeting legal obligations, restrictions and codes of conduct;
- Meeting the University of Twente Research data management policy requirements;
- Meeting funding body grant requirements.
The University of Twente has a policy on research data management (RDM). This is an overall data policy on how to handle research data. It serves as a framework for data policies in the faculties, institutes, departments and research groups. Data policies give regulations and guidelines regarding data management plans as well as the storage, security, documentation, sharing and archiving of research data.
When setting up your research you should check all RDM policies that are relevant to you. Below you will find the data policy of the UT and of some of the faculties.
Research funders NWO, ZonMw and the EU have a data management policy which affects grant submission. They all ask you to write a data management plan within a certain number of months after the start of your project. NWO and the EU also want you to answer specific questions about the way you are going to manage the research data as part of the submission process (data management section).
NWO implemented a data management policy in all funding instruments with effect from 1 October 2016. The core of this policy is that:
- Calls for proposals will include a data management section in which the researcher should answer a number of short questions.
- No later than 4 months after the project has been awarded, the researcher must submit a data management plan. You can use the template in the UT DMP-tool (also linked to the UT RDM course) which is approved by NWO.
The NWO data policy aims at stimulating researchers to work according to the so-called FAIR data principles, which means that data must be findable, accessible, interoperable and reusable.
More information can be found on the NWO data management page.
ZonMw’s data management policy in the General Terms and Conditions Governing Grants of ZonMw, applicable as from 1 July 2013, can be summarized as follows:
- When writing your grant proposal it is highly recommended to already think about data management during and after your research.
- After a research proposal is granted, the grant recipient must draw up a data management plan. You can use the template in the UT DMP-tool (also linked to the UT RDM course) which is approved by ZonMw.
- ZonMw and the grant recipient will share ownership of the produced data sets.
- The data must be available for the benefit of further scientific and/or academic research.
The ZonMw data policy aims at stimulating researchers to work according to the so-called FAIR data principles, which means that data must be findable, accessible, interoperable and reusable.
More information can be found on the ZonMw data management page.
The EU is running a flexible pilot under Horizon 2020 called the Open Research Data Pilot (ORD). The ORD pilot applies primarily to the data needed to validate the results presented in scientific publications. The pilot follows the principle "as open as possible, as closed as necessary" and focuses on encouraging sound data management as an essential part of best practices in research.
- Your application should address a couple of data management issues (see EU Horizon 2020 data management guidelines, Proposal submission & evaluation)
- Once a project has had its funding approved and has started, you must submit a first version of a data management plan within the first 6 months of the project. You can use the template in the UT DMP-tool (also linked to the UT RDM course).
More information can be found on the EU data management page.
Having a data management plan (DMP) is essential for your research project. A good DMP lets you work more efficiently and improves the integrity and impact of your research.
A data management plan describes:
- what data you will collect and how, as well as which software will be used for collecting, processing and analysis;
- how you will save and share the data during the research project;
- how you will make the data sustainably available and, if possible, publish them afterwards (watch this video about sharing data);
- how the data will be documented;
- how data will be shared and transferred securely;
- what legal issues are relevant, such as copyright, the right to reuse the data and the treatment of sensitive data.
The information you provide in the DMP has to comply with the UT Research data management policy and, if available, the data policy of your faculty and research group, as well as legal, contractual and funder requirements.
To write your DMP, please use the UT DMP-tool. The template in this tool is also accepted by NWO, ZonMw and the EU.
For guidance when writing a DMP, you can follow the research data management course.
To deepen your knowledge of and insight into research data management, an online course is available. The course also serves as guidance for writing your data management plan and consists of the following modules:
- data collection
- data documentation
- data storage
- data security
- data selection and preservation
- data availability for reuse
PhD students should register for the course as a participant in the mandatory Twente Graduate School (TGS) bootcamp RDM.
UT staff can enroll directly in the online course.
Storing and sharing of data refers to the dynamic phase of the project. As soon as your research data sets are stable and static you should archive the data for long-term preservation.
All collected research data, including related materials such as protocols, models or questionnaires, must be stored in facilities offered by the UT (LISA), which are ISO 27001- and NEN 7510-certified. See UT Research Data Management policy.
Use the local drive of your laptop or computer only for work copies of your data files, as data on these media may be lost if the device malfunctions or is lost or stolen. The local drive must, if possible, be encrypted to prevent a data breach (see the special UT Data Breach webpage).
Use this tool to find the best solution for storing, sharing, transferring or collaborating on research data, during the research.
Data security is needed especially when research data are considered confidential, for instance when they contain personal or sensitive information.
You can find more information about security measures in research on the UT cyber safety webpage.
Data breach in research refers to the loss or theft of, or unauthorized access to, personal or confidential data. More specifically, it is linked to personal data breach in the framework of the GDPR. In case of a personal data breach you must report this within 72 hours (see the special UT Data Breach webpage).
You should pay attention to preventing data breaches, regardless of the confidentiality of the data, as a breach may have a negative impact on the research itself, on the privacy and reputation of the persons or organizations involved, and on the safety of individuals and society.
When you use devices for work copies of data, it is wise to encrypt the device, folder or file with sensitive data to prevent data leaks in the event of loss or theft. When you encrypt a single file, there is a high probability of errors, or of an application leaving (parts of) the file unencrypted on your hard disk. The best way is to encrypt the entire hard disk or USB stick.
You can find more (practical) information on the Encryption-webpage.
Pseudonymization and anonymization
When working with personal data (data on identified or identifiable natural living persons) you need to comply with the General Data Protection Regulation (GDPR), in Dutch: the Algemene Verordening Gegevensbescherming (AVG). This means that you need to pseudonymize the data when you are processing personal data during the project. As soon as the purpose of the data collection has been fulfilled, usually by the end of the project, you must in most cases anonymize the data.
In short, pseudonymization is a method to substitute identifiable data with a reversible, consistent value. This value is usually kept in a key file, in which the pseudonymized data is linked to the personal data. Be aware that the key file must be stored in a secure and persistent location, such as an encrypted storage device placed in a safe or the Project and Organization drive of your research group with controlled access.
The purpose of pseudonymization is to protect the privacy of research participants from the onset, during the collection of data. For more information see this report from the National Coordination Point Research Data Management (LCRDM).
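The substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a UT-endorsed procedure: the function names `pseudonymize` and `write_key_file`, the `P-` prefix and the `participant` column are all hypothetical. Each identifier is replaced by a random token that stays the same for every occurrence, and the mapping is written out separately as the key file.

```python
import csv
import io
import secrets


def _new_token(taken):
    """Return a random pseudonym that is not already in use."""
    while True:
        token = "P-" + secrets.token_hex(4)
        if token not in taken:
            return token


def pseudonymize(rows, id_field, key=None):
    """Replace the value in `id_field` with a consistent pseudonym.

    Returns (pseudonymized_rows, key). `key` maps each original
    identifier to its pseudonym, so the substitution is reversible
    for anyone who holds the key. The input rows are not modified.
    """
    key = {} if key is None else key
    taken = set(key.values())
    result = []
    for row in rows:
        original = row[id_field]
        if original not in key:
            key[original] = _new_token(taken)
            taken.add(key[original])
        result.append({**row, id_field: key[original]})
    return result, key


def write_key_file(key, handle):
    """Write the key as CSV; store this file apart from the data."""
    writer = csv.writer(handle)
    writer.writerow(["original", "pseudonym"])
    for original, pseudonym in key.items():
        writer.writerow([original, pseudonym])


if __name__ == "__main__":
    # Hypothetical example data.
    rows = [
        {"participant": "alice@example.org", "score": "7"},
        {"participant": "bob@example.org", "score": "5"},
        {"participant": "alice@example.org", "score": "9"},
    ]
    pseudo_rows, key = pseudonymize(rows, "participant")
    buffer = io.StringIO()  # stands in for a file on secure storage
    write_key_file(key, buffer)
    print(pseudo_rows)
```

Random tokens are used here rather than a hash of the identifier, because a hash of a guessable value (such as an e-mail address) can be recomputed by anyone. The key file is what makes the substitution reversible, which is why it needs the secure, access-controlled location described above.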
Data preservation can be seen as long-term data storage: in the first place it prevents physical data loss or destruction. Proper preservation goes beyond storage, however, and also requires specific technical measures for long-term accessibility.
Preservation of research data contributes to the quality and impact of your scientific work because it enables verification and possible reuse, for instance for further analysis or follow-up, new research or as a contribution to a data resource for the scientific community. Preservation of data is also needed in case of a data publication: a journal article about a specific data set.
Watch this video in which scientists explain the importance of keeping their research data durable and accessible.
What to preserve
The table below presents what data you need to preserve depending on the purpose of preservation.
What to preserve
Datasets underlying research results in publications, plus analysis tools (scripts, etc.)
All raw datasets relevant for further or other research, together with necessary scripts, models, software etc. and documentation
Datasets which are refined for publication, together with additional documentation
In the UT policy it is stated that research data, especially underlying publications, should be Findable, Accessible, Interoperable, and Re-usable (FAIR-principles). Check also relevant policies in your faculty (BMS / ET / ITC).
Selected data and related materials should be deposited in a trusted repository, preferably during the research but no later than 1 month after it has finished.
The preferred trusted data repositories are:
- 4TU.ResearchData for depositing technical and natural sciences data
- DANS for depositing research data from the humanities, health sciences, social and behavioural sciences, oral history and spatial sciences.
Before you archive your research data, think about metadata and documentation, preserving personal data and organizing the data files.
Metadata and documentation*
Adding metadata and documentation when preserving your data is important for several reasons.
For findability and reuse add:
- Descriptive metadata, such as author, contributor, title, abstract, keywords, measurement type, project ID, geomapping, time period, subject area.
- Descriptive documentation, such as software scripts, instrument settings, methodology, experimental protocol, codebook, laboratory notebook.
For handling data add:
- Administrative metadata, such as data format, date, size, access rights, preservation period, persistent identifier (see below: Publishing data and enhancing your publications), license for use.
- Administrative documentation, such as user agreements, provenance (description of the origin of the data).
For understanding data add:
- Structural metadata, such as related content, related projects, version.
- Structural documentation, such as database scheme, relations between files, table of content.
Data documentation is best added in a data dictionary or readme file, which should accompany the datasets. Further information and good practices can be found here.
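As an illustration of how the three groups of metadata listed above could end up in a readme file, here is a minimal Python sketch. The function name `render_readme`, the layout of the output and all example values are assumptions made for this example; repositories such as 4TU.ResearchData and DANS have their own metadata forms and requirements, which take precedence.

```python
def render_readme(metadata):
    """Render grouped metadata as a plain-text readme.

    `metadata` maps a group name (e.g. "descriptive") to a dict of
    field names and values. Lists are joined with commas.
    """
    lines = []
    for section, fields in metadata.items():
        lines.append(section.upper())
        for name, value in fields.items():
            if isinstance(value, (list, tuple)):
                value = ", ".join(str(item) for item in value)
            lines.append(f"  {name}: {value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical values; the field names follow the lists above.
    metadata = {
        "descriptive": {
            "title": "Example sensor measurements",
            "author": "A. Researcher",
            "keywords": ["sensors", "temperature"],
            "time period": "2020-2021",
        },
        "administrative": {
            "data format": "CSV",
            "access rights": "open",
            "license for use": "CC BY 4.0",
        },
        "structural": {
            "version": "1.0",
            "related projects": "none",
        },
    }
    print(render_readme(metadata))
```

Keeping the metadata in a structured form first and rendering the readme from it makes it easier to reuse the same information when filling in a repository's deposit form.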
Be aware that, in principle, personal data (containing details that directly identify an individual or can be used to infer their identity, either in isolation or through linking it to another data set) cannot be made openly accessible. Whether the data can be preserved depends on the following:
- Does your institution's ethics approval allow the data to be retained for further research?
- Does the consent agreement allow data to be reused for the purpose that you are now envisaging?
- Did the data subjects give their informed consent to its archiving?
- If so, is it feasible to adhere to any conditions of their consent e.g. any commitment to anonymize the data?
- Can the data be securely stored and actively managed to recognized information security standards (e.g. ISO27001)? The UT network storage is ISO 27001 and NEN 7510 certified.
With regard to organizing the data files to be preserved, think about the following:
- Be sure that the data and related materials, such as software, models, scripts, etc. are properly selected and complete.
- Only include data in a data file; do not include figures or analyses.
- Consider aggregating data into fewer, larger files, rather than many small ones. It is more difficult and time consuming to manage many small files and easier to maintain consistency across data sets with fewer, larger files. It is also more convenient for other users to select a subset from a larger data file than it is to combine and process several smaller files. Very large files, however, may exceed the capacity of some software packages. Some examples of ways to aggregate files include by data type, location, time period, measurement platform, investigator, method, or instrument.
- It is sometimes desirable to aggregate or compress individual files to a single file using a compression utility, although the advisability of this practice varies depending on the intended destination repository.
- For long-term preservation the preferred file format should be non-proprietary (open) and persistent, such as PDF, Plain text, TIFF, FLAC, CSV or XML (see also extended list of formats from DANS or from 4TU.ResearchData).
- Rights on data: In most cases the University of Twente is right holder of data from research carried out by UT staff. The RDM policy UT (section 3, paragraph ‘Data archiving’) should be taken into account. In other cases, check IP right holder(s), copyright, patent and/or database rights. For more information and support, contact Novel-T.
- Costs: Preparing for the preservation of research data may need financial resources to cover costs for selecting, converting, preserving and making the data available. You can find more information here.
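Before depositing, it can help to check which files in a selection are not yet in one of the preferred formats named in the list above. The Python sketch below is a rough aid under stated assumptions: the mapping from format names to file extensions is our own, the function name `nonpreferred_files` is hypothetical, and the extended lists from DANS and 4TU.ResearchData remain the authoritative source.

```python
from pathlib import PurePath

# Assumed extensions for the formats named above
# (PDF, plain text, TIFF, FLAC, CSV, XML).
PREFERRED_SUFFIXES = {".pdf", ".txt", ".tif", ".tiff", ".flac", ".csv", ".xml"}


def nonpreferred_files(paths):
    """Return the paths whose extension is not in PREFERRED_SUFFIXES.

    The comparison ignores case. Files without an extension are
    reported too, since their format cannot be judged from the name.
    """
    return [
        str(path)
        for path in paths
        if PurePath(path).suffix.lower() not in PREFERRED_SUFFIXES
    ]


if __name__ == "__main__":
    # Hypothetical file selection.
    selection = ["data/results.CSV", "notes.docx", "scan.tiff", "README"]
    print(nonpreferred_files(selection))
```

A file extension says nothing about whether the content is valid, so this check only flags candidates for conversion; it does not replace inspecting the files themselves.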
* Based on University of Utrecht Research Data Management Support, Storing and preserving data (https://www.uu.nl/en/research/research-data-management/guides/storing-and-preserving-data#Preserve)
To publish your data, simply deposit them in a trusted repository. One of the crucial services of a trusted repository, such as 4TU.ResearchData or DANS, is the issuing of a persistent identifier, which guarantees sustainable access to your data. Watch this video about Persistent identifiers and data citation explained (Research Data Netherlands).
Once you have published your data, you can enhance your publication(s). This process is two-fold: You need to let your dataset refer to your article(s), and vice versa.
Both 4TU.ResearchData and DANS offer the possibility to include a reference to your published article(s). This reference will be part of the metadata describing your data. If permitted by the copyright holder, DANS will archive your publication(s) along with the accompanying dataset.
For upcoming articles, please make sure that your data reference is included in the reference list of your article. We also recommend mentioning this reference in your cover letter, so reviewers can verify your research. For published articles, please contact the publisher and request a link to your data to be displayed online, along with the description of your article.
In order to offer an overview of final data sets available at the UT, you are expected to register your data set(s) in Pure research information. Make sure to do this before the end of your research project.
Costs for data management made during a research project can be inserted into a proposal’s budget. These may be costs related to temporary storage, to the anonymization or the transcription of data, or to the curation of data before sustainable archiving.
Use this guide for estimating research data management costs.
This toolbox is developed for support staff looking for tools to help them work with RDM related matters. It contains all sorts of resources that can be helpful when supporting researchers from their group or faculty. Feel free to use the documents, presentations, visuals and other tools listed below for your own use. They are all licensed under a CC-BY-NC-SA license, unless stated otherwise.
Data policies UT
Interview 4TU.ResearchData 10th anniversary with Qian Zhang and Arnd Hartmanns
On September 29th one of our data stewards, Qian Zhang (BMS & ITC), interviewed Arnd Hartmanns. Arnd is an assistant professor in the Formal Methods and Tools group at the EEMCS faculty. They talk about their experiences with, and the advantages of, data sharing. This interview was part of the 4TU.ResearchData celebratory day around their 10th anniversary and the launch of their renewed repository. Check out the interview in this video (the interview part starts at 51:40).
Visuals RDM support organization
In addition to the tools for support staff, some internal and external courses are available at our university.
RDM Canvas course for scientific and support staff
Are you looking for a good start in Research Data Management (RDM)? Enroll in the RDM course that is available for UT staff via Canvas! The course can be used as a guide while writing your own data management plan. It covers every aspect of good RDM and will guide you through all the steps you have to take in the process of RDM.
Are you a PhD student who is required to follow this course as part of the TGS bootcamp? Please register for this course here.
Other UT staff can enroll directly via this link.
MOOC on RDM from TU Delft
At the start of their careers as data stewards at the UT, Simone, Judith and Qian followed the MOOC "Open Science: sharing your research with the world". They found it a very helpful and interesting way to get acquainted with the topics of sharing data and RDM. This course is open and available for everyone via this link.