Access to CBS microdata

Data Management Plan (info)

Name of student/researcher(s):

Name of supervisors/(co-) promotors:

Name of group/project:

Description of your research:
Briefly summarise the type of your research to help others understand the purposes for which the data are being collected or created.

Funding body(ies):

Grant number:
A grant number provides unique identification for the grant.

Partner organisations:

Project duration: Start: MM-DD-YYYY End: MM-DD-YYYY

Date written: MM-DD-YYYY

Date last update:

Version:
A new version of the DMP should be created whenever important changes to the project occur due to inclusion of new data sets, changes in consortium policies or external factors.

Name of researcher(s), the institution/group and their roles/responsibilities for data management:
Name everyone involved and their specific roles and responsibilities for data management (data collection, data analysis, data storage, etc. as well as data ownership).

Laws, policies, contracts and agreements to comply with:
When applicable, name the laws, data policy documents and contracts or agreements with which this DMP must comply.

General information (info)

1. Data Collection (info)

Describe the data you will collect.

Checklist:

  • How will data be collected?
  • Will you also use pre-existing data? From where?
  • What type of data will be collected? (measurements, observations, questionnaires, models, etc.)
  • In what file formats?
  • Which tools or software are needed to create, process and/or visualize the data?
  • Do the data have a specific character in terms of reproducibility, confidentiality (e.g. privacy, see next question), etc.? What does this mean for the management of the data?
  • Do you collect personal data as defined by the Dutch Personal Data Protection Act? According to the Code of Conduct for the use of personal data in academic research, is it necessary to report this to the Autoriteit Persoonsgegevens (formerly the College Bescherming Persoonsgegevens)? (See guidance below.)
  • What is the estimated total size of the data, and what growth rate? What is the estimated number of files and the maximum file size?
  • How do you handle version control to keep track of all changes that are made to the data?

If desired you can use the following table for describing your data:


Type of data | Format | Software | Data size/growth | Specific character
             |        |          |                  |
             |        |          |                  |
             |        |          |                  |

2. Data Storage and Back-up (info)

Ensure that during your research all research data are stored securely and backed up or copied regularly.

Checklist:

  • How will the raw data, processed data, models/codes, informed consents, etc. be stored and backed up during the research?
  • Which storage medium will you use for your storage and backup strategy: network storage, personal storage media (CDs, DVDs, USB drives, portable hard drives) or cloud storage?
  • Are backups made so that you can restore the data in the event of data loss? What is the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) of the storage medium?
  • Are the data backed up at different locations?

If desired you can use the following table:

Data              | Storage medium and location | Backup location, frequency, RPO and RTO
Raw data          |                             |
Processed data    |                             |
Models/code       |                             |
Informed consents |                             |
Other?            |                             |

3. Data Documentation (info)

Document your data to help future users to understand and reuse it.

Checklist:

  • What standards will be used for documentation and metadata? If there is not a standard already available for your data, outline how and what metadata will be created.
  • How will your data be documented during your research and for long-term storage?
  • What directory and file naming convention will be used to enable the titling of your folders, documents and records in a consistent and logical way?
  • What project and/or data identifiers will be assigned? (e.g. DOI or Digital Object Identifier)

4. Data Access (info)

Describe how authorized access to the data is managed both during and after the research.

Checklist:

  • How will you manage copyright and Intellectual Property Rights issues? E.g. Who owns the data? How will the data be licensed for reuse?
  • Are there any limitations on the access of your data?
  • What are the access criteria for the data (open/restricted access, embargo period, etc.)?
  • Who controls data access (e.g. the Principal Investigator (PI), student, lab, university, funder)?

5. Data Sharing and Reuse (info)

Describe how your data can be shared and reused.

Checklist:

  • What expectations are there for reuse of your data?
  • Can you identify the audience for reuse? Who will use it now? Who will use it later?
  • What agreements or requirements are there about data sharing (e.g. funder data sharing policy, commercial partner agreement)?
  • If you allow others to reuse your data, how will the data be shared? In case the dataset cannot be shared, the reasons for this should be mentioned (e.g. ethical, rules of personal data, intellectual property, commercial, privacy-related, security-related).
  • When will you publish your data and where? Will they be linked to one or more scientific publications?

6. Data Preservation and Archiving (info)

Describe which data will be preserved after the research and where and how these will be archived.

Checklist:

  • Which criteria will you use to decide which data have to be archived for preservation and long-term availability? Which data have to be destroyed?
  • How long should it be preserved (e.g., 3-5 years, 10-20 years, permanently)?
  • What file formats will be used for long-term preservation and availability?
  • Which data repository is appropriate for archiving your data (4TU.ResearchData, DANS, subject-based data repository)?
  • What are the estimated total costs for archiving the data in the selected repository?

Guidance

Data Management Plan

This template should be regarded as a checklist of the most important questions to address when formulating a data management plan. Be aware of specific issues that may be relevant to your research but are not covered, or only briefly touched upon, in this template.

The template may be appropriate for Principal Investigators collaborating in a research project or for research students working on a PhD or Master's project.

The Data Management Plan consists of a page for general information and 6 sections. Detailed information on each section will show up when you ‘open’ the info button.

General information

Apart from general information about the research, researcher, partners, funding agency, etc. pay attention to the question about roles and responsibilities. This can refer to the different parts of the DMP, like data collection or data access, but also to checking the quality of data and data documentation. Also think about the roles and responsibilities of monitoring the DMP implementation and the DMP compliance with laws, policies, etc. Ask whether there is information about this in your group, institute or faculty.

Make a distinction between who should perform a certain action and who checks the results of that action. Moreover, keep in mind the distinction between data management actions during and after the project.

Have a closer look at the ownership of the data or, in most cases more important, the exploitation rights. There might be policies, contracts or agreements about this. If not, discuss this issue with the persons involved.

Mention all official laws, policies, contracts and/or agreements which are relevant to the research and say something about the data to be collected and/or used in the research project and how these should be handled.

1. Data Collection

An accurate description of the data to be collected is important as a basis for good data management.

Data types

There are four main types of research data:

  • Observational data: captured in real time, typically cannot be reproduced exactly
  • Experimental data: from labs and equipment, can often be reproduced but may be expensive to do so
  • Simulation data: from models, can typically be reproduced if the input data is known
  • Derived or compiled data: after data mining or statistical analysis has been done, can be reproduced if analysis is documented

Data types can include text, numbers, images, 3D models, software, audio files, video files, reports, surveys, etc.

Also take into account related material (used for or during collection, processing and/or analysis) which should be managed together with the data like scripts, questionnaire forms, informed consents, etc.

Provide information on the existence (or not) of similar data and the possibilities for integration and reuse.

Personal data

Personal data are all “data on identified or identifiable natural living persons” (Dutch Personal Data Protection Act). If you collect such data, you may need to report this to the Autoriteit Persoonsgegevens (formerly the College Bescherming Persoonsgegevens).

There are different protection regimes, depending on the type of personal data. A stricter regime applies to special personal data, such as sexual orientation, religion, criminal record and political affiliation. Even stricter protection is required for patient data and BSN (citizen service number) data. For more information, see the Code of Conduct for the use of personal data in academic research.

Before you start collecting personal data, informed consent is almost always needed for storing, processing or publishing them.

File formats

In planning a research project, it is important that you consider which file formats you will use to store your data. In some cases, this will be dictated by the software you are using or the conventions of your discipline, but in other cases you may have to make a choice between several options. These are likely to be some of the key factors in your decision-making:

  • what software and formats you or colleagues have used in past projects,
  • any discipline-specific norms (and any peer support that comes with them),
  • what software is compatible with hardware you already have,
  • whether you have funding for new software for the job,
  • how you plan to analyse, sort, or store your data.

But you should also consider:

  • what formats will be easiest to share with colleagues for future projects,
  • what formats are at risk of obsolescence, because of new versions or their dependence on particular software and/or hardware,
  • what formats it will be possible to open and read in the future,
  • what formats will be easiest to annotate with metadata so that you and others can interpret them days, months, or years in the future.

In some cases you may be best off using one format for data collection and analysis and converting your data to a standard format for archiving once your project is complete. After conversions, data should be checked for errors or changes that may be caused by the export process.
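Such a check after conversion can be sketched in Python. The example below is only an illustration, assuming the data are tabular and that both the original and the converted file can be read as delimited text; the function names and file names are hypothetical and not part of any prescribed workflow.

```python
import csv


def read_rows(path, delimiter=","):
    """Read a delimited text file into a list of rows of stripped strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        return [[cell.strip() for cell in row] for row in reader]


def compare_tables(original_rows, converted_rows):
    """Return a list of human-readable differences between two tables."""
    differences = []
    if len(original_rows) != len(converted_rows):
        differences.append(
            f"row count differs: {len(original_rows)} vs {len(converted_rows)}"
        )
    # Compare the rows that both tables have in common, position by position.
    pairs = zip(original_rows, converted_rows)
    for index, (before, after) in enumerate(pairs, start=1):
        if before != after:
            differences.append(f"row {index} differs: {before} vs {after}")
    return differences


# Hypothetical usage: compare a semicolon-separated export with its CSV copy.
# problems = compare_tables(read_rows("export.txt", ";"), read_rows("archive.csv"))
```

An empty result means that no differences were found at the level of cell text; it does not guarantee that numeric precision, encodings or metadata survived the conversion.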

Version control

Because digital research data can so easily be copied, over-written or changed, researchers need to take steps to protect its authenticity. Research time is wasted and valuable data put at risk if researchers work with outdated versions of files.

Version control can prevent this. Control is particularly important if data is being used by multiple members of a research team, or if research files are shared across different locations.

A regime to synchronize different copies or versions of files will improve research efficiency and help guarantee the authenticity of the data. Good practice generally involves keeping a single master file, in which all changes are recorded. Version control mechanisms should be established and documented before any data are collected or generated.

Read more about version control at: http://www2.le.ac.uk/services/research-data/organise-data/version-control
There are open software tools for version control. See: http://www.unmc.edu/vcr/rito/services/version-control-handout.pdf
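The idea of a single master file with a record of all changes can be sketched in Python as follows. This is a minimal illustration, not a replacement for a dedicated version control tool: it assumes one master file and a JSON change log kept next to it, and the function names and log layout are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(master_file, log_file, note):
    """Append an entry to the change log if the master file has changed."""
    log_path = Path(log_file)
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    checksum = file_checksum(master_file)
    if entries and entries[-1]["sha256"] == checksum:
        return None  # nothing changed since the last recorded version
    entry = {
        "version": len(entries) + 1,
        "sha256": checksum,
        "recorded": datetime.now(timezone.utc).isoformat(),
        "note": note,
    }
    entries.append(entry)
    log_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entry
```

Calling record_version after each editing session gives a dated list of versions, and the checksum makes it possible to tell later whether a copy of the file matches a recorded version.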

2. Data Storage and Back-up

It is the responsibility of the researcher to ensure that their research data and related information (scripts, software, survey templates, documentation, informed consents, etc.) are stored securely and backed up regularly for the life of the project. It is good practice to store only what you need to keep and to keep at least three copies of crucial data. It is recommended that data are stored on the university’s networked fileservers, with copies kept on remote storage and/or portable storage.

Generally there are four options for data storage:

  • Networked drives: University fileserver – As these are secure and backed up regularly, they are ideal for master copies of your research data.
  • Local drives: PCs and Laptops – Data can be lost because local drives can fail, or the computer may be lost or stolen. These are convenient for short-term storage and data processing but should not be relied upon for storing master copies, unless backed-up regularly.
  • Remote or Cloud storage – commonly used services, such as Dropbox and Google Drive, will not be appropriate for sensitive data, and their service level agreements should be studied before using them to store your research data.
  • External portable storage devices – External hard drives, USB drives, DVDs and CDs. These are very convenient, being cheap and portable, but not recommended for long-term storage as their longevity is uncertain and they can be easily damaged.

You will find an overview at www.utwente.nl/rdm.
For more information about the networked and local drives, see:
https://www.utwente.nl/en/service-abc/!/product/p883320/backups-for-employees under ‘Storage, backup and restore of data, for email, M:, P:, U:, institution systems and research data’.

You may choose to only back up certain data, or to back up files you use every day more regularly than others. The basic rule of thumb is: The more important the data and the more often they change, the more regularly they need to be backed up.

If your files take up a large amount of space and backing up all of them (or backing them up sufficiently frequently) would be difficult or expensive, you may want to focus on backing up specific key information, programs, algorithms, or documentation that you would need in order to re-create the data in case of data loss.

For more information about backups, RPO and RTO, see:
https://www.utwente.nl/en/service-abc/!/product/p883320/backups-for-employees
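The relation between backup frequency and the RPO can be made concrete with a small Python sketch. It assumes that the RPO is read as the maximum acceptable age of the most recent backup; the data set names and the RPO values in the usage comment are hypothetical, and the RTO (the time needed to restore) is not covered here.

```python
from datetime import datetime, timedelta


def backup_is_overdue(last_backup, now, rpo):
    """Return True if the latest backup is older than the RPO allows."""
    return now - last_backup > rpo


def overdue_datasets(last_backups, rpos, now):
    """Return the sorted names of data sets whose latest backup exceeds their RPO."""
    return sorted(
        name
        for name, last_backup in last_backups.items()
        if backup_is_overdue(last_backup, now, rpos[name])
    )


# Hypothetical usage: frequently changing raw data get a shorter RPO than models.
# rpos = {"raw data": timedelta(hours=24), "models": timedelta(days=7)}
# last = {"raw data": datetime(2024, 1, 8, 12, 0), "models": datetime(2024, 1, 5, 12, 0)}
# overdue_datasets(last, rpos, datetime(2024, 1, 10, 12, 0))
```

Choosing a shorter RPO for data that are important and change often follows the rule of thumb given above.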

3. Data Documentation

Describe the types of documentation that will accompany the data to help secondary users to understand and reuse it. This should at least include basic details that will help people to find the data, including who created or contributed to the data, its title, date of creation and under what conditions it can be accessed.

Documentation may also include details on the methodology used, analytical and procedural information, definitions of variables, vocabularies, units of measurement, any assumptions made, and the format and file type of the data. Consider how you will capture this information and where it will be recorded. Wherever possible you should identify and use existing community standards. See: http://www.dcc.ac.uk/resources/metadata-standards

File naming

Organising your files and folders effectively and efficiently can save you time and make collaboration easier by ensuring you are working on the correct version of the data. A good file name makes it easy to identify, locate and retrieve your data. There is no one recommended way to name your files and folders, but you should name your files consistently. If you work as part of a research group, you should decide on a file and folder naming system with your colleagues. For practical information, see: http://guides.lib.purdue.edu/c.php?g=353013&p=2378293
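One way to keep names consistent is to generate them rather than type them. The Python sketch below assumes a purely hypothetical convention of the form project_description_YYYYMMDD_vNN.ext; the convention and the function names are illustrations only, and your group may well agree on something different.

```python
import re
from datetime import date


def clean(text):
    """Lower-case the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project, description, created, version, extension):
    """Build a name of the form project_description_YYYYMMDD_vNN.ext."""
    suffix = extension.lstrip(".").lower()
    return (
        f"{clean(project)}_{clean(description)}"
        f"_{created:%Y%m%d}_v{version:02d}.{suffix}"
    )


# build_file_name("Soil Study", "raw measurements", date(2017, 3, 9), 2, ".CSV")
# returns "soil-study_raw-measurements_20170309_v02.csv"
```

Writing the date as year, month, day and padding the version number means that an alphabetical sort of the folder is also a chronological one.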

Identifiers

An identifier is a reference number or name for a data object and forms a key part of your documentation and metadata. To be useful over the long-term, identifiers need to be unique (globally unique if possible) and persistent (the identifier should not change over time).

The emerging identifier standard for publicly available datasets is the Digital Object Identifier (DOI). Although DOIs have traditionally been used for journal articles, they can now be assigned to datasets. 4TU.ResearchData (for more information, see section 6: Data Preservation and Archiving) will automatically assign a DOI to a dataset that you make available.
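When you record a DOI in your documentation, it can help to check its form and store it as a resolvable link. The Python sketch below is a loose illustration: the pattern only checks the general shape of a DOI (a prefix starting with "10." followed by a slash and a suffix) and is not a full validation, and the DOI in the usage comment is a made-up example.

```python
import re

# Loose shape check only: "10.", a registrant code of 4 to 9 digits, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return the https://doi.org/ link for a DOI, or raise ValueError."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


# doi_to_url("10.1234/example.5678") returns "https://doi.org/10.1234/example.5678"
```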

4. Data Access

Ownership of research data must be clarified prior to, or at the beginning of, a project. Future storage and reuse are directly affected by the intellectual property rights of research data.

Ownership of the data and copyrighted datasets will depend on whether the project was created as part of sponsored research; the employment status of the creator; whether third-party data has been utilised during the conduct of research, and, in case of an ‘encoded work’ whether substantial university resources were used in the creation of the encoded work.

During the research project you will want to keep your research data safe and secure. You will want to determine who has access to your data and what they are authorised to do with it. Data security is needed to prevent unauthorised access or disclosure and changes to or destruction of data. The principal investigators are responsible for ensuring data security. The level of security required depends upon the nature of the data – personal or sensitive data need higher levels of security (see above: data collection).

It is possible that you will need remote access to your data, if you are working from more than one location, or not at the university. A number of individuals may require access to the data, possibly with different privileges to read, write, update or delete. This may be accomplished by keeping a copy of the data on the university shared network file store, where it is password protected. The use of cloud storage to share data depends upon the level of security needed.

It is possible that your project may need to arrange for access to third party data that may have specific limitations in how they can be distributed (based on IP or the agreement by which your project obtained the data). When your research project has received data under confidentiality or other restrictions you will have to identify and explain these restrictions in your data management plan.

The conditions under which the data may be made available by the data repository to other researchers are determined by you as the Principal Investigator depositing the data. When depositing to 4TU.ResearchData you can choose from two general access conditions:

  • Open access: there are no additional restrictions on access to the data or publication of results.
  • Embargo period: you can request that an embargo period be imposed on your data, whereby no access to the data would be permitted until after the date you specify.

5. Data Sharing and Reuse

Your research is valuable and important, and so is the data that it is based on. By publishing your data, you make it available to the scholarly community, who can study and build upon your work. Your work will become more visible and typically be cited more frequently.

At the end of your research project, your funder may require you to share your research data, by publishing it with no access restrictions (open access). Some journal publishers also require the data supporting the research article to be published.

When disseminating your data, you need to think about who would be interested in your research findings and how to reach this audience (by newsletter, community website, press release, attending seminars or conferences, etc.).

You may also need to think about how you want others to reuse your data. If you want your data to be as widely used as possible, the Creative Commons Attribution licence (CC BY) would be most useful. This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation.

Some data repositories have licences that depositors must grant as a condition of deposit. When depositing to 4TU.ResearchData you must sign a licence agreement to establish the terms and conditions of use of your data collection. This is a legal document which sets out your rights and responsibilities as depositor and ours as the data distributor.

Depositors can elect to apply an embargo to the research data so that public access is deferred for a specific period (typically no more than two years). Embargo may be appropriate when the researcher needs to maintain the data in a managed repository environment, like 4TU.ResearchData, while deferring any access to the data pending further data collection, analysis, publication of results, etc. If data are generated using specifically developed software, it may be necessary to provide a copy of the software, noting operating requirements, with the data.

6. Data Preservation and Archiving

You need to decide, together with the project stakeholders, which data have to be preserved and be available after the end of the research. The same holds for related information like scripts, software, survey templates, documentation, informed consents, etc. The decision will be based on the standards of good scientific practice, legal and contractual regulations, funder requirements, but also on the type of data created, the value for reuse, and whether further work or publications will be based on it.

Data selected for long-term preservation will normally be submitted to a funder-established data centre, a disciplinary data repository or an institutional data repository. In the Netherlands you can choose 4TU.ResearchData for technical and scientific research data and DANS for data from research in the social sciences and humanities.

4TU.ResearchData stores the data in a permanent and sustainable manner, according to the guidelines of the international Data Seal of Approval. Being a Trusted Digital Repository, 4TU.ResearchData is demonstrating to researchers that it is taking appropriate measures to ensure the long-term availability and quality of data it holds. If you need any help with depositing your data please contact 4TU.ResearchData (researchdata@4tu.nl) for assistance.

At DANS (Data Archiving and Networked Services) research data are sustainably stored (it also has the Data Seal of Approval) and shared via EASY, the DANS online archiving system. For help and more information you can contact info@dans.knaw.nl

Other data repositories can be found at Databib.

Most funders regard costs for archiving the data or preparing it for archive as allowable as long as they are justified and incurred within the life of the project. For funding in Horizon 2020, we strongly recommend including these costs in Annex II of the application form.