Preserving, or archiving, data means that a copy, including description and documentation, is durably stored at the UT, preferably in the data archive Areda.
Beyond long-term storage, you can add metadata to make the data findable, and documentation to support interpretation, verification, interoperability and reuse of the data. For adding metadata and documentation (a README file), Areda is linked to the UT Research Information System (Pure).
Important: Areda is currently only accessible with the support of a data steward, as part of an evaluation of the user experience. Based on input received from users, we are working to improve the system in a few key aspects. If you have any questions about this notice or Areda, please contact your faculty’s data steward.
- What is Areda
Areda is the University of Twente archive for the long-term storage of static data collected, generated or used in UT research projects.
Areda offers research groups:
- Certified storage for long-term preservation (at least 10 years) of static datasets.
- A ‘bucket’ where data archive files (zip or tar) can be uploaded and shared among the group members.
- Technically suitable storage for all kinds of static data, including special categories of personal data.
- The possibility to add metadata and documentation making use of the UT research information system (Pure).
Please keep in mind that:
- Areda should not be used for datasets that might still change (dynamic data) or that need to be used regularly.
- For the moment, the maximum upload size is 1 TB.
- Access to datasets is at research group level: all group members have access to the datasets in the group’s bucket (folder). Groups and members are based on the HR system.
- When uploading personal or other confidential data, access has to be managed by means of encryption. Areda provides encryption instructions. Encryption keys need to be securely managed by at least one permanent staff member of the research group besides the principal investigator, preferably the chair of the research group.
- Areda should not be used for digital informed consent forms or pseudonymization keys. These must be stored encrypted and separate from the data in Areda, for instance on the p-drive.
- Although a folder structure can be created in the group’s bucket (by creating a path), it is advisable to use the bucket as a single archive (database) of zip files, each containing a structured set of data files from a research project.
- To search for datasets or create overviews of them, use the UT Research Information Portal.
- Preparation for archiving datasets
Archiving datasets requires some preparation:
- selecting and organizing the data files,
- writing data documentation,
- creating an archive file (zip or tar), and
- encryption in case of personal or other confidential data.
For more information, please consult the guide Archiving datasets in Areda.
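As an illustration of the "creating an archive file" step, the Python sketch below zips a project folder into a single archive and checks that the result stays under the 1 TB upload limit mentioned earlier. It is not taken from the Areda guide: the function name `build_archive`, the requirement that the documentation file is called `README.txt`, and reading 1 TB as 10^12 bytes are assumptions made for this example. Encryption, where required, is a separate step and is not shown.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: the 1 TB upload limit is interpreted as 10**12 bytes.
MAX_UPLOAD_BYTES = 10**12


def build_archive(project_dir: Path, archive_path: Path) -> Path:
    """Zip every file under project_dir, keeping the folder structure.

    archive_path should be located outside project_dir, so the archive
    does not end up including itself.
    """
    # Hypothetical convention: the documentation file is named README.txt.
    if not (project_dir / "README.txt").is_file():
        raise FileNotFoundError("add a README.txt before archiving")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the project folder.
                zf.write(path, arcname=path.relative_to(project_dir))
    if archive_path.stat().st_size > MAX_UPLOAD_BYTES:
        raise ValueError("archive exceeds the 1 TB upload limit")
    return archive_path


if __name__ == "__main__":
    # Small self-contained demo with throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project_x"
        (project / "data").mkdir(parents=True)
        (project / "README.txt").write_text("Dataset description goes here.\n")
        (project / "data" / "measurements.csv").write_text("t,value\n0,1.5\n")
        archive = build_archive(project, Path(tmp) / "project_x.zip")
        with zipfile.ZipFile(archive) as zf:
            print(sorted(zf.namelist()))
```

Keeping the README inside the archive means the documentation travels with the data, also when the archive file is later shared through a temporary link.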
- Archiving datasets in Areda
Archiving datasets in Areda means uploading the archive file you created during the preparation to the intake bucket of your research group. Next, add metadata about the dataset (title, creator, etc.) and a copy of the README file in the UT research information system (Pure). There you can also link the dataset to one or more of your publications.
Here is a visual of the archiving intake process:
For help with the intake process, please consult the guide Archiving datasets in Areda. Metadata will be reviewed by the data steward of the faculty. After this, the archive file will be transferred to the bucket of the research group.
Apart from archiving in Areda, it is recommended to publish research data and share it with others outside the UT. Sharing or publishing datasets means that you upload them to a trusted data repository, preferably 4TU.ResearchData or DANS.
- Access and sharing
Datasets are archived in the research group bucket. All research group members have access to the bucket of their own group. Access to files can only be restricted by means of encryption.
You can share a dataset with people outside the group. To do so, generate a unique, temporary link to the dataset in Areda.
Be aware that access and internal sharing should be in accordance with the data policy of the research group or a higher organizational entity, and with agreements with any third parties involved.
It is advisable to add terms of use in the documentation of the dataset in the README file (guidance / template).
- Personal data
The General Data Protection Regulation (GDPR) requires that personal data (any information relating to an identified or identifiable natural person) are not kept longer than necessary to achieve the purposes for which they are processed. If longer preservation is needed, anonymization is preferred. More information about handling personal data in research can be found here.
In any case, make sure that you have registered the processing of personal data in compliance with the GDPR. If you indicated in your GDPR registration that personal data need to be preserved, act in accordance with what you stated there about preservation and protection of the data.
When archiving datasets with personal data, please be aware that file encryption is required. Instructions for file encryption are provided during the Areda archiving process.
- Questions and Answers
- Why should I archive datasets in Areda?
Areda is a UT facility designed specifically for archiving static datasets. In Areda, research groups can easily manage the datasets that need to be archived, not only for verification but also for internal reuse. Areda offers cheap and reliable object storage for long-term, persistent and immutable archiving.
- Can I start archiving datasets in Areda right away?
You can start archiving datasets right away. Guidance is available on the Areda portal.
- Does Areda issue a persistent identifier with the archived dataset?
No, because Areda is not intended for data publishing. When you also publish the dataset in a data repository (4TU.ResearchData, DANS, etc.), a persistent identifier, such as a DOI, will be issued. Read more about persistent identifiers in Making data FAIR.
- As a UT bachelor or master student, can I archive datasets in Areda?
UT bachelor and master students cannot archive datasets in Areda by themselves. If you need to archive datasets, ask your supervisor to add them to his or her group’s bucket in Areda. As you are in principle the dataset rights holder, check policies and guidelines of the research group, especially regarding use rights and licenses.
- How can I encrypt data files in Areda?
Areda provides information about encryption and instructions for the tools to use when encrypting data files with personal or other confidential content. You can download and install the recommended tools yourself. Please be aware that encryption requires some extra preparation time.
- Which metadata and documentation should I add when archiving datasets in Areda?
General metadata can be added in the UT Research Information System. Choose ‘Dataset’ and fill in the information asked, such as Title, Description, Date of data production, Contributors, Publisher, DOI, Access information, Temporal coverage, and Geo location.
Other metadata and documentation can be included in the README file, which should accompany the dataset and be added to the description in the UT Research Information System (guidance / template).
Metadata and documentation elements

| | Metadata | Documentation |
|---|---|---|
| Descriptive | author, contributor, title, abstract, keywords, measurement type, project ID, geomapping, time period, subject area | software scripts, instrument settings, methodology, experimental protocol, codebook, laboratory notebook |
| Administrative | data format, date, size, access rights, preservation period, persistent identifier, license for use | user agreements, provenance (description of the origin of the data), terms of use |
| Structural | related content, related projects, version | database scheme, relations between files, table of content |
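To make the elements above concrete, the following Python sketch generates an empty README skeleton with one heading per category. It is only an illustration and not the official UT template (see the guidance / template link for that): the selection of fields, the `README_FIELDS` name and the plain-text layout are assumptions made for this example.

```python
# Hypothetical selection of fields, drawn from the elements listed above.
README_FIELDS = {
    "Descriptive": ["Title", "Author(s)", "Abstract", "Keywords",
                    "Time period", "Methodology"],
    "Administrative": ["Data format", "Date", "Access rights",
                       "Preservation period", "License", "Terms of use",
                       "Provenance"],
    "Structural": ["Version", "Related projects",
                   "Relations between files", "Table of contents"],
}


def readme_skeleton(dataset_title: str) -> str:
    """Return a plain-text README with empty 'Field: ' lines to fill in."""
    lines = [f"README for dataset: {dataset_title}", ""]
    for section, fields in README_FIELDS.items():
        lines.append(section.upper())
        lines.extend(f"{field}: " for field in fields)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    print(readme_skeleton("Example dataset"))
```

The generated text can be saved as the README file inside the archive and pasted into the dataset description in the UT Research Information System.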
- Is there a maximum volume of datasets I can archive in Areda?
No maximum total volume of archived datasets has been set yet.
- How much time will the archiving process take?
Apart from the time needed to prepare the dataset, write the README file and add metadata, the processing time is largely determined by the size of the zipped data file to be uploaded and the capacity of your internet connection. For example, uploading a 50 GB zipped data file can take an hour or more.
Warning: Make sure your PC or laptop does not go into standby, sleep or hibernate mode during the upload.
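The one-hour figure for a 50 GB file can be checked with a rough calculation. The Python sketch below assumes an ideal, constant upload speed without any protocol overhead; the function name and the example speeds are illustrative and are not measurements of the UT network.

```python
def upload_time_hours(size_gb: float, speed_mbit_s: float) -> float:
    """Return the ideal transfer time in hours, ignoring overhead.

    size_gb is in decimal gigabytes (10**9 bytes),
    speed_mbit_s in megabits per second (10**6 bits per second).
    """
    if speed_mbit_s <= 0:
        raise ValueError("speed must be positive")
    size_bits = size_gb * 1e9 * 8
    seconds = size_bits / (speed_mbit_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Example upload speeds (assumptions, not UT network figures).
    for speed in (50, 100, 500):
        hours = upload_time_hours(50, speed)
        print(f"50 GB at {speed} Mbit/s: about {hours:.1f} h")
```

At a sustained 100 Mbit/s, a 50 GB file needs a little over an hour, which is in line with the estimate above. Real uploads are usually slower than this ideal figure.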
- Can I replace an archived dataset with a new version?
No. Once a dataset is archived, it will remain unchanged. It is a so-called immutable object preserved for at least 10 years. You can archive a newer version separately and indicate the relation with the older version in filename and documentation.
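One way to indicate the relation between versions in the filename is a numeric version suffix. The Python sketch below assumes a hypothetical naming convention of the form `<name>_v<number>.zip` (or `.tar`); Areda does not prescribe this pattern, and the function name is made up for this example.

```python
import re

# Hypothetical convention: <name>_v<number>.zip or <name>_v<number>.tar
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)\.(?P<ext>zip|tar)$")


def next_version_name(filename: str) -> str:
    """Return the filename for the next version, keeping zero padding."""
    match = VERSION_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow <name>_v<number>.zip/.tar")
    width = len(match["num"])          # preserve padding such as 'v01'
    number = int(match["num"]) + 1
    return f"{match['stem']}_v{number:0{width}d}.{match['ext']}"


if __name__ == "__main__":
    print(next_version_name("soil_survey_v01.zip"))  # soil_survey_v02.zip
```

The older archive stays in the bucket unchanged; the README of the new version can then name the file it supersedes.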
- What are the costs charged for archiving datasets in Areda?
No costs are charged. Archiving is paid for from the central budget.
- What is the storage location of the datasets when archived in Areda?
Data are stored on ISO 27001 and NEN 7510 certified servers at the University of Twente. The back-up facility is hosted by Surf; the data centers are located in Utrecht and Amsterdam, the Netherlands.
- Who can access the datasets archived in Areda?
By default, all members of a research group have access to the datasets in the bucket of the group. Access to data files can be restricted by means of encryption.
- What is the preservation period of the datasets when archived in Areda?
By default, datasets will be preserved in Areda for 10 years. In the near future it will be possible to indicate other preservation periods.
- What happens with the archived datasets in Areda after the preservation period has expired?
Shortly before the preservation period expires, the research group receives a message asking it to decide whether the dataset must be deleted. Extending the preservation period may not be free of charge.