Research data can be archived safely and securely for the long term using Areda, a UT facility. Areda is integrated with the existing registration of datasets in the UT Research Information System (Pure).
- What is Areda
Areda is the University of Twente archive for the long-term storage of static data collected, generated or used in UT research projects. But archiving is more than just storing data: metadata must be added so that datasets are findable, and proper documentation is needed for interpretation and verification, as well as for interoperability and reuse of the data. For this reason, Areda is linked to the UT research information system (Pure), where metadata are added, while documentation can be included in a README file.
All files are durably stored on ISO 27001 and NEN 7510 certified servers at the University of Twente. The back-up facility is hosted by SURF, whose data centers are located in Utrecht and Amsterdam, the Netherlands. By default, data are preserved and kept available for a period of 10 years. Other preservation periods will become possible in the near future.
Areda offers each research group its own ‘bucket’, where (zipped) files can be uploaded and shared among the group members in accordance with the group’s data policy and guidelines.
- How to archive datasets in Areda
Archiving datasets in Areda starts with uploading a ZIP file to the intake bucket of your research group. This ZIP file should contain the selected data files, encrypted if needed, together with documentation in a README file (see also Making data FAIR). In addition, add metadata about the dataset, such as title and creator, and a copy of the README file to the UT research information system (Pure). There you can also link the dataset to one or more of your publications.
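The packaging step can be sketched in Python with the standard `zipfile` module. This is a minimal illustration under stated assumptions, not an Areda tool: the helper name `build_dataset_zip` and the flat layout (all files at the root of the archive) are hypothetical, and encryption is not covered, because `zipfile` cannot write encrypted archives.

```python
import zipfile
from pathlib import Path


def build_dataset_zip(data_files, readme_path, zip_path):
    """Pack selected data files plus a README into one ZIP archive.

    Hypothetical helper for illustration, not an Areda tool. Every file is
    stored under its base name at the root of the archive, and nothing is
    encrypted: zipfile cannot write encrypted archives.
    Returns the sorted list of names written to the archive.
    """
    readme_path = Path(readme_path)
    if not readme_path.is_file():
        raise FileNotFoundError(f"README not found: {readme_path}")

    members = [Path(p) for p in data_files] + [readme_path]
    names = [p.name for p in members]
    # Two files with the same base name would produce duplicate ZIP entries.
    if len(set(names)) != len(names):
        raise ValueError("duplicate file names in the selection")

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            zf.write(path, arcname=path.name)
    return sorted(names)
```

For example, `build_dataset_zip(["SoilMoisture_20210315_RawSensorData.csv"], "README.txt", "dataset.zip")` would write `dataset.zip` containing the CSV file and `README.txt`, ready for upload to the intake bucket.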
Here is a visual of the archiving intake process:
A guide is available to help you with the intake process.
The faculty data steward reviews the metadata and documentation. After this review, the ZIP file is transferred to the bucket of the research group.
Apart from archiving in Areda, it is recommended that you publish research data and share it with others outside the UT. Sharing or publishing a dataset means uploading it to a trusted data repository, preferably 4TU.ResearchData or DANS.
- Access and sharing
Datasets are archived in the research group bucket, and all members of a research group have access to their own group's bucket. Within the bucket, access to individual files can only be restricted by encrypting them.
You can also share a dataset with people outside the group: in Areda you can generate a unique, temporary link to the dataset.
Be aware that access and internal sharing should be in accordance with the data policy of the research group or a higher organizational entity, and with agreements made with the third parties involved.
- Personal data
The General Data Protection Regulation (GDPR) requires that personal data (any information relating to an identified or identifiable natural person) are not kept longer than necessary for the purposes for which they are processed. If longer preservation is needed, anonymization is preferred.
In any case, make sure that you have registered the processing of personal data in compliance with the GDPR. If your GDPR registration states that personal data need to be preserved, you should preserve and protect the data as described in that registration.
For the time being, you are strongly advised not to store personal or other confidential data in Areda. A sound encryption method will be available as of September 2021.
- Data files organization
When organizing the data files to be preserved, consider the following:
- Make sure that the data and related materials, such as software, models and scripts, are properly selected and complete.
- Include only data in a data file; put figures and analyses derived from these data in separate files.
- To make reuse easier, consider aggregating data into fewer, larger files rather than many small ones. Many small files are harder and more time-consuming to manage, and consistency across datasets is easier to maintain with fewer, larger files. It is also more convenient for other users to select a subset from a larger data file than to combine and process several smaller files. On the other hand, very large files may exceed the capacity of some software packages. Files can be aggregated, for example, by data type, location, time period, measurement platform, investigator, method or instrument.
- Use a consistent file name structure, such as ProjectName_YYYYMMDD_ContentDescription.ext. Do not use personal data in path or file names.
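A file name structure like this can be generated and checked automatically. The sketch below is hypothetical: `make_file_name`, `is_valid_file_name` and the allowed character set (letters, digits and hyphens inside each part, with the underscore reserved as separator) are assumptions, not Areda requirements. It cannot detect personal data in a name; that remains a manual check.

```python
import re
from datetime import date, datetime

# Assumed character set: letters, digits and hyphens inside each part,
# because the underscore is reserved as the separator between parts.
_NAME_RE = re.compile(
    r"^(?P<project>[A-Za-z0-9-]+)_(?P<date>\d{8})_"
    r"(?P<content>[A-Za-z0-9-]+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def is_valid_file_name(name: str) -> bool:
    """Check ProjectName_YYYYMMDD_ContentDescription.ext and the date part."""
    match = _NAME_RE.match(name)
    if match is None:
        return False
    try:
        # Rejects impossible dates such as 20210230.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return False
    return True


def make_file_name(project: str, day: date, content: str, ext: str) -> str:
    """Build a name of the form ProjectName_YYYYMMDD_ContentDescription.ext."""
    name = f"{project}_{day:%Y%m%d}_{content}.{ext.lstrip('.')}"
    if not is_valid_file_name(name):
        raise ValueError(f"invalid component in {name!r}")
    return name
```

For example, `make_file_name("SoilMoisture", date(2021, 3, 15), "RawSensorData", "csv")` returns `"SoilMoisture_20210315_RawSensorData.csv"`, while a project name containing an underscore raises `ValueError`.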
Organize files and folders so that the dataset can be uploaded in ZIP files of no more than approximately 100 GB each. ZIP files belonging to the same dataset can be put in one subdirectory, which Areda calls a ‘path’.
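Deciding which files go into which ZIP file can be treated as a bin-packing problem. The sketch below uses first-fit decreasing on the uncompressed file sizes, which only approximates the final ZIP size: compression usually makes archives smaller, but already-compressed data plus ZIP headers can make them slightly larger. The helper names `plan_zip_batches` and `sizes_from_directory` and the decimal 100 GB limit are assumptions for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path

# "approx. 100 GB"; decimal gigabytes are an assumption.
LIMIT_BYTES = 100 * 10**9


@dataclass
class Batch:
    files: list = field(default_factory=list)
    size: int = 0


def sizes_from_directory(root) -> dict[str, int]:
    """Map each file below root (relative path) to its size in bytes."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def plan_zip_batches(file_sizes: dict[str, int], limit: int = LIMIT_BYTES) -> list[list[str]]:
    """Group files into batches whose total uncompressed size stays within limit.

    First-fit decreasing: take the largest files first and put each one
    into the first existing batch that still has room for it.
    """
    batches: list[Batch] = []
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > limit:
            raise ValueError(f"{name} alone exceeds the limit; split it first")
        for batch in batches:
            if batch.size + size <= limit:
                batch.files.append(name)
                batch.size += size
                break
        else:
            # No existing batch has room: start a new one.
            batches.append(Batch([name], size))
    return [b.files for b in batches]
```

Each resulting batch can then be zipped separately and all ZIP files of the dataset placed in the same path. If keeping related files together matters more than tight packing, group the files first (for example by measurement campaign) and plan the batches per group.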