IGS University of Twente

Data Storage

Storing your research data is important for several reasons. First of all, according to the Netherlands Code of Conduct for Scientific Practice (VSNU, part III), researchers are obliged to store their raw research data for at least ten years (no maximum period) for validation purposes. Secondly, journals or funders may require you to give open access to your research data or at least share your data with other researchers upon request (see Data Sharing).

Where to store

There are several options for storing your research data. You can use the IGS-DVN to store your (raw) research data for the minimum ten-year period. After those ten years, the data will be deleted or moved to a long-term data archive (such as DANS-EASY) in consultation with the researcher.

Reasons to move the data to a data archive for long-term preservation could be (derived from DANS, 2011):

- Obligations: funder or journal requirements to make your research data openly available for re-use or validation purposes
- Value: potential value of the research data, regarding quality, originality, size, scale, collection costs, innovation
- Uniqueness: data consists of unique, non-repeatable observations
- Importance for general historic research (heritage)

Important: datasets underlying one or more (scientific) publications should be archived directly at DANS-EASY to ensure permanent preservation. In this case, you should at least store the dataset(s) used for your final analyses and results, including syntaxes that can be used to replicate your results.

What to store

- Raw data file: the raw data file contains the originally collected, unprocessed data.
- Derived dataset: the derived dataset is the dataset underlying certain results or publications. You can derive different datasets from your raw data for different purposes.
- Syntaxes: a syntax file contains the code, algorithms or commands used to create your derived dataset from your original, raw dataset. It also contains (stepwise) information about the transformations and analyses performed on the raw dataset.
- Metadata file: a metadata file is a separate file attached to your dataset, which contains information about your dataset for future use (by yourself or others). For example, a metadata file should contain information on the following subjects: creator, access conditions, context, collection methods, time references, structure and organization of data files, variable names, labels and descriptions of variables and values, codes for missing values, file formats, and hard- and software used to process and analyse the data.
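As an illustration of the metadata subjects listed above, the following is a minimal sketch of a metadata file written as JSON with Python's standard library. The field names, the file name survey_raw.csv, the variables and all values are hypothetical placeholders, and JSON is only one possible choice of format; it is not a format prescribed by IGS or DANS.

```python
import json
from pathlib import Path

# All names and values below are hypothetical placeholders for illustration.
metadata = {
    "creator": "A. Researcher",
    "access_conditions": "available on request",
    "context": "hypothetical survey study",
    "collection_methods": "online questionnaire",
    "time_references": {"collected_from": "2015-01-01", "collected_to": "2015-03-31"},
    "files": [{"name": "survey_raw.csv", "format": "CSV", "role": "raw data"}],
    "variables": {
        "age": {"label": "Age of respondent in years", "missing_codes": [-99]},
        "gender": {
            "label": "Gender of respondent",
            "values": {"1": "female", "2": "male"},
            "missing_codes": [-99],
        },
    },
    "software": "Python 3 (standard library only)",
}


def write_metadata(metadata: dict, path: Path) -> Path:
    """Write the metadata dictionary as a human-readable JSON file."""
    text = json.dumps(metadata, indent=2, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")
    return path
```

Calling write_metadata(metadata, Path("survey_metadata.json")) would place the metadata file next to the dataset, so that both can be stored together.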

As common sense dictates, storing and sharing (sensitive) data should be handled with care (see Guidelines Personal Information). The level of precaution that should be taken depends on the sensitivity of the data, and can range from ‘simple’ precaution to storage on a secured, isolated and off-line computer or encrypted USB sticks in the IGS data vault.

Preferred file formats

To ensure long-term preservation that is independent of certain specific software, you are encouraged to save your files in commonly used and easily re-usable file formats with open documentation. Please find a list of different preferred and acceptable file formats for different types of data here.

Reproducibility

In general, any scientific work should be reproducible. This applies to the social sciences as much as it does to the natural sciences. In practice, this means that the whole process of how you handle data should be documented. Gathering, cleaning, coding, transforming and scaling as well as analyses performed should all be documented. It is good practice to perform the above tasks using syntax, and to store the syntax along with the data.

Note that, even though it may be tempting to perform a ‘quick fix’ in the SPSS data view, such a change is not recorded anywhere and may be lost or overlooked, making the research harder to reproduce.
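The idea of handling data through syntax rather than by hand can be sketched as follows. This is a minimal illustration in Python rather than SPSS syntax, and it rests on assumptions: the column names (id, age, item1, item2), the missing-value code -99 and the cleaning rules are all hypothetical. The point is that the raw file is only read, and every cleaning and transformation step is written down in a script that can be stored with the data.

```python
import csv
from pathlib import Path

MISSING = "-99"  # hypothetical code for missing values


def derive_dataset(raw_path: Path, derived_path: Path) -> int:
    """Create the derived dataset from the raw file; the raw file is only read."""
    rows_written = 0
    with raw_path.open(newline="", encoding="utf-8") as src, \
         derived_path.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["id", "age", "score_mean"])
        writer.writeheader()
        for row in reader:
            # Step 1 (cleaning): drop respondents with a missing age.
            if row["age"] == MISSING:
                continue
            # Step 2 (transforming): average the item scores, skipping missing items.
            items = [float(row[k]) for k in ("item1", "item2") if row[k] != MISSING]
            if not items:
                continue
            writer.writerow(
                {"id": row["id"], "age": row["age"], "score_mean": sum(items) / len(items)}
            )
            rows_written += 1
    return rows_written
```

Running derive_dataset(Path("survey_raw.csv"), Path("survey_derived.csv")) again on the same raw file recreates the same derived dataset, which is what a manual edit in a data view cannot guarantee.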