Know how to select and preserve your research data for the long term, and where to publish and register them.
- Preserving research data
Data preservation can be seen as long-term data storage: first and foremost, it prevents physical loss or destruction of data. Proper preservation, however, goes beyond storage and also requires specific technical measures to keep the data accessible in the long term.
Preservation of research data contributes to the quality and impact of your scientific work because it enables verification and reuse, for instance for further analysis or follow-up studies, for new research, or as a contribution to a data resource for the scientific community. Preservation is also required for a data publication, that is, a journal article describing a specific data set.
Watch this video in which scientists explain why it is important to keep their research data durable and accessible.
What to preserve
The table below shows which data you need to preserve, depending on the purpose of preservation.
Purpose | What to preserve
Verification | Datasets underlying the research results in publications, plus analysis tools (scripts, etc.)
Reuse | All raw datasets relevant for further or other research, together with the necessary scripts, models, software, etc., and documentation
Data publication | Datasets refined for publication, together with additional documentation
The UT policy states that research data, especially data underlying publications, should be Findable, Accessible, Interoperable and Reusable (the FAIR principles). Also check the relevant policies of your faculty (BMS / ET / ITC).
Selected data and related materials can be deposited in Areda, the UT data archive, preferably during the research and no later than one month after it has finished. Areda is still a beta version and will be tested during the first quarter of 2021. If you are interested in testing the system, please contact the data steward of your faculty.
In addition to archiving in Areda, you can deposit your data sets in a trusted repository.
The preferred trusted data repositories are:
- 4TU.ResearchData for depositing technical and natural sciences data
- DANS for depositing research data from the humanities, health sciences, social and behavioural sciences, oral history and spatial sciences.
- Preparing data for preservation
Before you archive your research data, think about metadata and documentation, preserving personal data and organizing the data files.
Metadata and documentation*
Adding metadata and documentation when preserving your data is important for several reasons.
For findability and reuse add:
- Descriptive metadata, such as author, contributor, title, abstract, keywords, measurement type, project ID, geomapping, time period, subject area.
- Descriptive documentation, such as software scripts, instrument settings, methodology, experimental protocol, codebook, laboratory notebook.
For handling data add:
- Administrative metadata, such as data format, date, size, access rights, preservation period, persistent identifier (see below: Publishing data and enhancing your publications), license for use.
- Administrative documentation, such as user agreements, provenance (description of the origin of the data).
For understanding data add:
- Structural metadata, such as related content, related projects, version.
- Structural documentation, such as database scheme, relations between files, table of content.
Data documentation is best added in a data dictionary or README file that accompanies the data sets. Further information and good practices can be found here.
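The three kinds of metadata listed above map naturally onto the sections of a plain-text README. The sketch below is a minimal illustration, assuming you keep the metadata in a Python dictionary; the field names, example values and the `render_readme` and `write_readme` functions are hypothetical, so adapt them to the metadata schema of your repository.

```python
from pathlib import Path
from typing import Dict

# Hypothetical metadata for an example data set; the keys mirror the
# descriptive, administrative and structural categories described above.
METADATA: Dict[str, Dict[str, str]] = {
    "Descriptive": {
        "Title": "Example measurement data set",
        "Author": "J. Doe",
        "Keywords": "example, measurements",
        "Time period": "2020-01 to 2020-12",
    },
    "Administrative": {
        "Data format": "CSV",
        "Access rights": "Open access",
        "License": "CC BY 4.0",
    },
    "Structural": {
        "Version": "1.0",
        "Related files": "data/measurements.csv, scripts/analysis.py",
    },
}


def render_readme(metadata: Dict[str, Dict[str, str]]) -> str:
    """Return a plain-text README with one section per metadata category."""
    lines = []
    for category, fields in metadata.items():
        heading = f"{category} metadata"
        lines.append(heading)
        lines.append("-" * len(heading))  # underline the section heading
        for key, value in fields.items():
            lines.append(f"{key}: {value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_readme(directory: Path, metadata: Dict[str, Dict[str, str]]) -> Path:
    """Write README.txt into the data set directory and return its path."""
    path = directory / "README.txt"
    path.write_text(render_readme(metadata), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(render_readme(METADATA))
```

A plain-text file is used here because plain text is itself one of the open, persistent formats recommended for long-term preservation.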
Be aware that, in principle, personal data (containing details that directly identify an individual or can be used to infer their identity, either in isolation or through linking it to another data set) cannot be made openly accessible. Whether the data can be preserved depends on the following:
- Does your institution’s ethics approval allow the data to be retained for further research?
- Does the consent agreement allow data to be reused for the purpose that you are now envisaging?
- Did the data subjects give their informed consent to its archiving?
- If so, is it feasible to adhere to any conditions of their consent e.g. any commitment to anonymize the data?
- Can the data be stored securely and actively managed in accordance with recognized information security standards (e.g. ISO 27001)? The UT network storage is ISO 27001 and NEN 7510 certified.
With regard to organizing the data files to be preserved, think about the following:
- Be sure that the data and related materials, such as software, models, scripts, etc. are properly selected and complete.
- Only include data in a data file; do not include figures or analyses.
- Consider aggregating data into fewer, larger files rather than many small ones. Many small files are more difficult and time-consuming to manage, and consistency across data sets is easier to maintain with fewer, larger files. It is also more convenient for other users to select a subset from one larger data file than to combine and process several smaller files. Very large files, however, may exceed the capacity of some software packages. Files can be aggregated, for example, by data type, location, time period, measurement platform, investigator, method or instrument.
- It is sometimes desirable to bundle or compress individual files into a single file using a compression utility, although whether this is advisable depends on the intended destination repository.
- For long-term preservation, the preferred file formats are non-proprietary (open) and persistent, such as PDF, plain text, TIFF, FLAC, CSV or XML (see also the extended lists of preferred formats from DANS and 4TU.ResearchData).
- Rights to data: in most cases, the University of Twente is the rights holder of data from research carried out by UT staff. Take the UT RDM policy (section 3, paragraph ‘Data archiving’) into account. In other cases, check who holds the IP rights and whether copyright, patent and/or database rights apply.
For more information and support, contact Novel-T.
- Costs: preparing research data for preservation may require financial resources to cover the costs of selecting, converting, preserving and making the data available. You can find more information here.
- FAIR Data Fund: 4TU.ResearchData offers researchers from TU Delft, TU Eindhoven and the University of Twente a budget (up to €3,500) to cover the costs of making their data Findable, Accessible, Interoperable and Reusable (FAIR principles). You can find more information here.
* Based on University of Utrecht Research Data Management Support, Storing and preserving data (https://www.uu.nl/en/research/research-data-management/guides/storing-and-preserving-data#Preserve)
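Some of the file-organization steps above (checking for open formats, bundling files into one archive) can be partly automated. The sketch below is a hypothetical helper, not a tool provided by the UT, DANS or 4TU.ResearchData. It assumes Python 3.9 or later and a small set of preferred extensions taken from the formats named above; the authoritative lists are those published by the repositories. It flags files in other formats, writes a SHA-256 checksum manifest (in the two-space `checksum  path` layout that `sha256sum -c` can verify) so that you and others can check that the bundle is complete and unchanged, and packs the directory into a single compressed tar archive.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import List

# Hypothetical subset of open, persistent formats named in the text above.
# Check the DANS or 4TU.ResearchData format lists for the authoritative version.
PREFERRED_EXTENSIONS = {".pdf", ".txt", ".tif", ".tiff", ".flac", ".csv", ".xml"}


def non_preferred_files(root: Path) -> List[Path]:
    """Return files under root whose extension is not in the preferred set."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in PREFERRED_EXTENSIONS
    )


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest_name: str = "MANIFEST-sha256.txt") -> Path:
    """Write one 'checksum  relative/path' line per file under root."""
    manifest = root / manifest_name
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file() and p != manifest  # skip an older manifest, if any
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def bundle(root: Path, archive: Path) -> Path:
    """Pack the whole directory, including the manifest, into a .tar.gz file."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(root, arcname=root.name)
    return archive


if __name__ == "__main__":
    data_dir = Path("my_dataset")  # hypothetical directory name
    for path in non_preferred_files(data_dir):
        print(f"Consider converting to an open format: {path}")
    write_manifest(data_dir)
    bundle(data_dir, Path("my_dataset.tar.gz"))
```

Whether to upload the archive or the individual files depends on the repository, so check its deposit guidelines before bundling.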
- Publishing data and enhancing your publications
To publish your data, simply deposit them in a trusted repository. One of the crucial services of a trusted repository, such as 4TU.ResearchData or DANS, is issuing a persistent identifier (such as a DOI), which guarantees sustainable access to your data. Watch the video "Persistent identifiers and data citation explained" (Research Data Netherlands).
Once you have published your data, you can enhance your publication(s). This process is two-fold: your data set should refer to your article(s), and vice versa.
Both 4TU.ResearchData and DANS offer the possibility to include a reference to your published article(s). This reference will be part of the metadata describing your data. If permitted by the copyright holder, DANS will archive your publication(s) along with the accompanying dataset.
For upcoming articles, please make sure that the reference to your data is included in the reference list of your article. We also recommend mentioning this reference in your cover letter, so that reviewers can verify your research. For published articles, please contact the publisher and ask them to display a link to your data online alongside the description of your article.
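A data reference in a reference list typically combines the creators, year, title, publisher (the repository) and the persistent identifier. The sketch below assembles such a string in roughly the element order recommended by DataCite; the `DatasetRecord` class, the `format_citation` function and the placeholder DOI are hypothetical, and your journal's reference style takes precedence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Minimal, hypothetical description of a published data set."""
    creators: List[str]
    year: int
    title: str
    publisher: str  # the repository, e.g. 4TU.ResearchData or DANS
    doi: str        # bare DOI, without the https://doi.org/ resolver prefix
    version: Optional[str] = None


def format_citation(record: DatasetRecord) -> str:
    """Build 'Creators (Year). Title. [Version X.] Publisher. DOI URL'."""
    parts = [f"{'; '.join(record.creators)} ({record.year}). {record.title}."]
    if record.version:
        parts.append(f"Version {record.version}.")
    parts.append(f"{record.publisher}.")
    parts.append(f"https://doi.org/{record.doi}")
    return " ".join(parts)


if __name__ == "__main__":
    example = DatasetRecord(
        creators=["Doe, J.", "Smith, A."],
        year=2021,
        title="Example measurement data",
        publisher="4TU.ResearchData",
        doi="10.0000/placeholder",  # placeholder, not a real data set
        version="1",
    )
    # Prints: Doe, J.; Smith, A. (2021). Example measurement data. Version 1.
    #         4TU.ResearchData. https://doi.org/10.0000/placeholder
    print(format_citation(example))
```

Linking to the `https://doi.org/` form of the identifier, rather than to a repository landing page, keeps the reference valid even if the repository reorganizes its website.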
- Registering your data in Pure
To provide an overview of the final data sets available at the UT, you are expected to register your data set(s) in Pure, the UT research information system. Make sure to do this before the end of your research project.