IGS University of Twente

Data Management

 

Data Management Plan (info)

Name of student/researcher(s)

 

Name of supervisors/(co-) promotors

 

Name of group/project

 

Description of your research

Briefly summarise the type of your research to help others understand the purposes for which the data are being collected or created.

Funding body(ies)

 

Grant number

A grant number provides unique identification for the grant.

Partner organisations

 

Project duration

Start: MM-DD-YYYY End: MM-DD-YYYY

Date written

MM-DD-YYYY

Date of last update

 

Version

A new version of the DMP should be created whenever important changes to the project occur due to inclusion of new data sets, changes in consortium policies or external factors.

Name of researcher(s), the institution/group and their roles/responsibilities for data management

Naming anyone and their specific roles and responsibilities for data management (data collection, data analysis, data storage, etc. but data ownership as well) is especially important for collaborative projects that involve many researchers, institutions, and/or groups.

1. Data Collection (info)

Describe the data you will collect.

Checklist:

o How will data be collected?

o Will you also use pre-existing data? From where?

o What type of data will be collected? (measurements, observations, questionnaires, models, etc.)

o In what file formats?

o Which tools or software are needed to create, process and/or visualize the data?

o Do the data have a specific character in terms of reproducibility, confidentiality (e.g. privacy), etc? What does this mean for the management of the data?

o What is the estimated total size of the data, and what is the expected growth rate? What is the estimated number of files and the maximum file size?

o How do you handle version control to maintain all changes that are made to the data?

If desired you can use the following table for describing your data:

Type of data | Format | Software | Data size/growth | Specific character
             |        |          |                  |
             |        |          |                  |
             |        |          |                  |
             |        |          |                  |

2. Data Storage and Back-up (info)

Ensure that during your research all research data are stored securely and backed up or copied regularly.

Checklist:

o How will the raw data, processed data, models/codes, informed consents, etc. be stored and backed up during the research?

o Which storage medium will you use for your storage and backup strategy (e.g. network storage; personal storage media such as CDs, DVDs, USB drives or portable hard drives; cloud storage)?

o Are backups made so that you can restore the data in the event of data loss? What are the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) of the storage medium?

o Are the data backed up at different locations?

If desired you can use the following table:

Data              | Storage medium and location | Backup location, frequency, RPO and RTO
Raw data          |                             |
Processed data    |                             |
Models/code       |                             |
Informed consents |                             |
Other?            |                             |

3. Data Documentation (info)

Document your data to help future users to understand and reuse it.

Checklist:

o What standards will be used for documentation and metadata? If there is not a standard already available for your data, outline how and what metadata will be created.

o How will your data be documented during your research and for long-term storage?

o What directory and file naming convention will be used to enable the titling of your folders, documents and records in a consistent and logical way?

o What project and/or data identifiers will be assigned? (e.g. DOI or Digital Object Identifier)

4. Data Access (info)

 

Describe how authorized access to the data is managed both during and after the research.

Checklist:

o How will you manage copyright and Intellectual Property Rights issues? E.g. Who owns the data? How will the data be licensed for reuse?

o Are there any limitations on the access of your data?

o What are the access criteria for the data (open/restricted access, embargo period, etc.)?

o Who controls data access (e.g. Principal Investigator (PI), student, lab, university, funder)?

 

5. Data Sharing and Reuse (info)

 

Describe how your data can be shared and reused.

Checklist:

o What expectations are there for the reuse of your data?

o Can you identify the audience for reuse? Who will use it now? Who will use it later?

o What agreements or requirements are there about data sharing (e.g. funder data sharing policy, commercial partner agreement)?

o If you allow others to reuse your data, how will the data be shared? In case the dataset cannot be shared, the reasons for this should be mentioned (e.g. ethical, rules of personal data, intellectual property, commercial, privacy-related, security-related).

o When will you publish your data and where? Will they be linked to one or more scientific publications?

 

6. Data Preservation and Archiving (info)

Describe which data will be preserved after the research and where and how these will be archived.

Checklist:

o Which criteria will you use to decide which data have to be archived for preservation and long-term availability? Which data have to be destroyed?

o How long should it be preserved (e.g., 3-5 years, 10-20 years, permanently)?

o What file formats will be used for long-term preservation and availability?

o Which data repository is appropriate for archiving your data (3TU.Datacentrum, DANS, subject-based data repository)?

o What are the estimated total costs for archiving the data in the selected repository?

Guidance

Data Management Plan

This template has been designed specifically for researchers of the three technical universities: TU Delft, University of Twente and Eindhoven University of Technology. The template may be appropriate for Principal Investigators collaborating in a research project or for research students working on a PhD or Master's project. The Data Management Plan consists of 6 sections. Each section is accompanied by a checklist of the most important questions to be answered in your Data Management Plan. More detailed information on each section will show up when you ‘open’ the info button.

1. Data Collection

An accurate description of the data to be collected is important as a basis for good data management.

There are four main types of research data:

· Observational data: captured in real time, typically cannot be reproduced exactly

· Experimental data: from labs and equipment, can often be reproduced but may be expensive to do so

· Simulation data: from models, can typically be reproduced if the input data is known

· Derived or compiled data: after data mining or statistical analysis has been done, can be reproduced if analysis is documented

Data types can include text, numbers, images, 3D models, software, audio files, video files, reports, surveys, etc. 

Provide information on the existence (or not) of similar data and the possibilities for integration and reuse.

File formats

In planning a research project, it is important that you consider which file formats you will use to store your data. In some cases, this will be dictated by the software you are using or the conventions of your discipline, but in other cases you may have to make a choice between several options. These are likely to be some of the key factors in your decision-making:

· what software and formats you or colleagues have used in past projects,

· any discipline-specific norms (and any peer support that comes with them),

· what software is compatible with hardware you already have,

· whether you have funding for new software for the job,

· how you plan to analyse, sort, or store your data.

But you should also consider:

· what formats will be easiest to share with colleagues for future projects,

· what formats are at risk of obsolescence, because of new versions or their dependence on particular software and/or hardware,

· what formats it will be possible to open and read in the future,

· what formats will be easiest to annotate with metadata so that you and others can interpret them days, months, or years in the future.

In some cases you may be best off using one format for data collection and analysis and converting your data to a standard format for archiving once your project is complete. After conversions, data should be checked for errors or changes that may be caused by the export process.
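
The check after conversion can be sketched in a few lines of Python. This is only an illustration under assumptions: the file names, the sample records and the choice of JSON as working format and CSV as archive format are all hypothetical, and the comparison treats every value as text, which may be too strict or too loose for your own data.

```python
import csv
import json
import tempfile
from pathlib import Path


def convert_json_to_csv(json_path: Path, csv_path: Path) -> None:
    """Convert a JSON list of flat records into a CSV file."""
    records = json.loads(json_path.read_text(encoding="utf-8"))
    fieldnames = list(records[0].keys())
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def conversion_is_lossless(json_path: Path, csv_path: Path) -> bool:
    """Compare the records row by row, as text, after conversion."""
    original = json.loads(json_path.read_text(encoding="utf-8"))
    with csv_path.open(newline="", encoding="utf-8") as handle:
        converted = list(csv.DictReader(handle))
    as_text = [{key: str(value) for key, value in row.items()} for row in original]
    return as_text == converted


def demo() -> bool:
    """Convert a small hypothetical data set and check the result."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "measurements.json"
        target = Path(tmp) / "measurements.csv"
        source.write_text(json.dumps([
            {"sample": "A1", "temperature_c": 21.5},
            {"sample": "A2", "temperature_c": 19.0},
        ]), encoding="utf-8")
        convert_json_to_csv(source, target)
        return conversion_is_lossless(source, target)
```

A check of this kind only catches differences in the values themselves; changes in precision, encoding or missing-value codes may need a comparison tailored to your discipline.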


Version control

Because digital research data can so easily be copied, over-written or changed, researchers need to take steps to protect its authenticity. Research time is wasted and valuable data put at risk if researchers work with outdated versions of files.

Version control can prevent this. Control is particularly important if data is being used by multiple members of a research team, or if research files are shared across different locations.

A regime to synchronize different copies or versions of files will improve research efficiency and help guarantee the authenticity of the data. Good practice generally involves keeping a single master file, in which all changes are recorded. Version control mechanisms should be established and documented before any data is collected or generated.
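
One possible way to record changes to a master file is a simple change log kept next to it. The Python sketch below is an assumption-laden illustration, not a prescribed mechanism: the file names, the log format (one JSON entry per line) and the author labels are hypothetical, and a dedicated version control system may suit your project better.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_change(master: Path, log: Path, author: str, note: str) -> dict:
    """Append one entry describing the current state of the master file."""
    entry = {
        "file": master.name,
        "sha256": sha256_of(master),
        "author": author,
        "note": note,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log: Path) -> list:
    """Load all change-log entries, oldest first."""
    with log.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def demo() -> list:
    """Record two successive states of a hypothetical master file."""
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "survey_master.csv"
        log = Path(tmp) / "CHANGELOG.jsonl"
        master.write_text("id,answer\n1,yes\n", encoding="utf-8")
        record_change(master, log, "researcher_a", "initial import")
        master.write_text("id,answer\n1,yes\n2,no\n", encoding="utf-8")
        record_change(master, log, "researcher_b", "added respondent 2")
        return read_log(log)
```

Because each entry stores a checksum, anyone holding a copy of the file can compare its checksum with the latest log entry to see whether they are working with the current version.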

 

2. Data Storage and Back-up

It is the responsibility of the researcher to ensure that their research data and related information (scripts, software, survey templates, documentation, informed consents, etc.) are stored securely and backed up regularly for the life of the project. It is good practice to store only what you need to keep and to keep at least three copies of crucial data. It is recommended that data are stored on the university’s networked fileservers and that copies are kept on remote storage and/or portable storage.

Generally there are four options for data storage:

Networked drives: University fileserver – As these are secure and backed up regularly, they are ideal for master copies of your research data.

Local drives: PCs and Laptops – Data can be lost because local drives can fail, or the computer may be lost or stolen. These are convenient for short-term storage and data processing but should not be relied upon for storing master copies, unless backed up regularly.

Remote or Cloud storage – commonly used services, such as Dropbox and Google Drive, will not be appropriate for sensitive data, and their service level agreements should be studied before using them to store your research data.

External portable storage devices – External hard drives, USB drives, DVDs and CDs. These are very convenient, being cheap and portable, but not recommended for long-term storage as their longevity is uncertain and they can be easily damaged.

For more information about the networked and local drives: https://www.utwente.nl/icts/en/diensten/catalogus/dataopslag_mw/

You may choose to only back up certain data, or to back up files you use every day more regularly than others. The basic rule of thumb is: the more important the data and the more often they change, the more regularly they need to be backed up.

If your files take up a large amount of space and backing up all of them (or backing them up sufficiently frequently) would be difficult or expensive, you may want to focus on backing up specific key information, programs, algorithms, or documentation that you would need in order to re-create the data in case of data loss.
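
The backup advice in this section can be illustrated with a short Python sketch that copies a master file to several locations and verifies each copy. The directory names standing in for the network drive, remote storage and portable storage are hypothetical; in practice, backups of the university fileservers are normally handled by the IT service rather than by a script of your own.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def back_up(source: Path, backup_dirs: list) -> list:
    """Copy a file to every backup location and verify each copy."""
    verified = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)  # copy2 also keeps timestamps
        # shallow=False compares file contents, not just size and mtime
        if not filecmp.cmp(source, copy, shallow=False):
            raise IOError(f"Backup verification failed for {copy}")
        verified.append(copy)
    return verified


def demo() -> int:
    """Back up a hypothetical master file and count the total copies."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        master = root / "network_drive" / "raw_data.csv"
        master.parent.mkdir()
        master.write_text("t,value\n0,1.0\n", encoding="utf-8")
        copies = back_up(master, [root / "remote", root / "portable"])
        # master copy plus two backups gives the three copies recommended above
        return 1 + len(copies)
```

Verifying the copy straight after writing it is a cheap safeguard: a backup that cannot be read back offers no protection when the master copy is lost.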

3. Data Documentation

Describe the types of documentation that will accompany the data to help secondary users to understand and reuse it. This should at least include basic details that will help people to find the data, including who created or contributed to the data, its title, date of creation and under what conditions it can be accessed.

Documentation may also include details on the methodology used, analytical and procedural information, definitions of variables, vocabularies, units of measurement, any assumptions made, and the format and file type of the data. Consider how you will capture this information and where it will be recorded. Wherever possible you should identify and use existing community standards. See: http://www.dcc.ac.uk/resources/metadata-standards
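
Where no community standard applies, the basic details listed above could be captured in a small metadata record stored alongside the data. The Python sketch below is an illustration only: the field names and all values are hypothetical placeholders, not a recognised metadata schema.

```python
import json

# Basic details that help people find the data (hypothetical field names).
REQUIRED_FIELDS = ("title", "creators", "date_created", "access_conditions")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "title": "Example soil moisture measurements",  # placeholder value
    "creators": ["A. Researcher"],  # placeholder value
    "date_created": "2016-03-01",  # placeholder value
    "access_conditions": "open access",
    "methodology": "Hourly sensor readings at 10 cm depth",
    "variables": {
        "moisture": {"unit": "m3/m3", "definition": "volumetric water content"},
    },
}

# Serialise the record so it can be saved next to the data files.
metadata_json = json.dumps(record, indent=2)
```

Calling `missing_fields(record)` before depositing the data gives a quick reminder of any basic detail that has not yet been filled in.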

File naming

Organising your files and folders effectively and efficiently can save you time and make collaboration easier by ensuring you are working on the correct version of the data. A good file name makes it easy to identify, locate and retrieve your data. There is no one recommended way to name your files and folders, but you should name your files consistently. If you work as part of a research group, you should decide on a file and folder naming system with your colleagues. See for practical information: http://guides.lib.purdue.edu/c.php?g=353013&p=2378293
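
Consistency is easier to maintain when file names are generated rather than typed by hand. The Python sketch below assumes one hypothetical convention (project, description, date and a two-digit version number, separated by underscores); it is an example of a convention a group might agree on, not a recommended standard.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a name like 'project_description_YYYYMMDD_v01.ext'."""
    def clean(text: str) -> str:
        # lower-case, and replace anything except letters and digits with '-'
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return "{}_{}_{}_v{:02d}.{}".format(
        clean(project), clean(description),
        created.strftime("%Y%m%d"), version, extension.lstrip("."),
    )
```

For example, `build_filename("Soil Lab", "raw sensor data", date(2016, 3, 1), 2, ".csv")` returns `soil-lab_raw-sensor-data_20160301_v02.csv`. Putting the date in year-month-day order means that an alphabetical listing is also chronological.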

Identifiers

An identifier is a reference number or name for a data object and forms a key part of your documentation and metadata. To be useful over the long-term, identifiers need to be unique (globally unique if possible) and persistent (the identifier should not change over time).

The emerging identifier standard for publicly available datasets is the Digital Object Identifier (DOI). Although DOIs have traditionally been used for journal articles, they can now be assigned to datasets. 3TU.Datacentrum (for more information, see section 6: Data Preservation and Archiving) will automatically assign a DOI to a dataset that you make available.
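
When you record a DOI in your documentation, it can help to store it in a form that resolves directly. The Python sketch below checks that a string looks like a DOI and turns it into a link through the doi.org resolver. The pattern is an assumption that covers the common modern form of DOIs, not the full DOI syntax, and the DOI used in the example is a made-up placeholder.

```python
import re

# Pattern covering the common modern DOI form "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI, or raise ValueError."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a recognisable DOI: {doi!r}")
    return "https://doi.org/" + cleaned
```

For example, `doi_to_url("doi:10.1234/example-dataset")` returns `https://doi.org/10.1234/example-dataset`.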

4. Data Access

During the research project you will want to keep your research data safe and secure. You will want to determine who has access to your data and what they are authorised to do with it. Data security is needed to prevent unauthorised access or disclosure and changes to or destruction of data. The principal investigators are responsible for ensuring data security. The level of security required depends upon the nature of the data – personal or sensitive data need higher levels of security.

It is possible that you will need remote access to your data, if you are working from more than one location, or not at the university. A number of individuals may require access to the data, possibly with different privileges to read, write, update or delete. This may be accomplished by keeping a copy of the data on the university shared network file store, where it is password protected. The use of cloud storage to share data depends upon the level of security needed.

It is possible that your project may need to arrange for access to third party data that may have specific limitations in how they can be distributed (based on IP or the agreement by which your project obtained the data). When your research project has received data under confidentiality or other restrictions you will have to identify and explain these restrictions in your data management plan.

Ownership of research data must be clarified prior to, or at the beginning of a project. Future storage and reuse are directly affected by the intellectual property rights of research data.

Ownership of the data and copyrighted datasets will depend on whether the project was created as part of sponsored research; the employment status of the creator; whether third-party data has been utilised during the conduct of research; and, in the case of an ‘encoded work’, whether substantial university resources were used in the creation of the encoded work.

The conditions under which the data may be made available by the data repository to other researchers are determined by you as the Principal Investigator depositing the data. When depositing to 3TU.Datacentrum you can choose from two general access conditions:

Open access: there are no additional restrictions on access to the data or publication of results.

Embargo period: you can request that an embargo period be imposed on your data, whereby no access to the data would be permitted until after the date you specify.

5. Data Sharing and Reuse

Your research is valuable and important, and so is the data that it is based on. By publishing your data, you make it available to the scholarly community, who can study and build upon your work. Your work will become more visible and typically be cited more frequently.

At the end of your research project, your funder may require you to share your research data, by publishing it with no access restrictions (open access). Some journal publishers also require the data supporting the research article to be published.

When disseminating your data, you need to think about who would be interested in your research findings, and how to reach this audience (by newsletter, community website, press release, attending seminars or conferences, etc.).

You may also need to think about how you want others to reuse your data. If you want your data to be as widely used as possible, the Creative Commons Attribution Only licence (CC-BY) would be most useful. This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation.

Some data repositories have licences that depositors must grant as a condition of deposit. When depositing to 3TU.Datacentrum you must sign a licence agreement to establish the terms and conditions of use of your data collection. This is a legal document which sets out your rights and responsibilities as depositor and ours as the data distributor.

Depositors can elect to apply an embargo to the research data so that public access is deferred for a specific period (typically no more than two years). Embargo may be appropriate when the researcher needs to maintain the data in a managed repository environment, like 3TU.Datacentrum, while deferring any access to the data pending further data collection, analysis, publication of results, etc. If data are generated using specifically developed software, it may be necessary to provide a copy of the software, noting operating requirements, with the data.

 

6. Data Preservation and Archiving

You need to decide, together with the project stakeholders, which data have to be preserved and be available after the end of the research. The same holds for related information like scripts, software, survey templates, documentation, informed consents, etc. The decision will be based on the standards of good scientific practice, legal and contractual regulations, funder requirements, but also on the type of data created, the value for reuse, and whether further work or publications will be based on it.

Data selected for long-term preservation will normally be submitted to a funder-established data centre, a disciplinary data repository or an institutional data repository. In the Netherlands you can choose 3TU.Datacentrum for technical-scientific research data and DANS for data from research in the social sciences and humanities.

3TU.Datacentrum stores the data in a permanent and sustainable manner, according to the guidelines of the international Data Seal of Approval. Being a Trusted Digital Repository, 3TU.Datacentrum is demonstrating to researchers that it is taking appropriate measures to ensure the long-term availability and quality of data it holds.

If you need any help with depositing your data please contact 3TU.Datacentrum (datacentrum@3tu.nl) for assistance.

At DANS (Data Archiving and Networked Services) research data are sustainably stored (it also has the Data Seal of Approval) and shared via EASY, the DANS online archiving system. For help and more information you can contact info@dans.knaw.nl

Other data repositories can be found at Databib.

Most funders regard costs for archiving the data or preparing it for archiving as allowable as long as they are justified and incurred within the life of the project. For funding in Horizon 2020, we strongly recommend including these costs in Annex II of the application form.