Metadata, Metadata standard & Documentation


Metadata is "data about data": the information that describes your data and makes it discoverable, understandable, and (re)usable. Data without metadata is virtually useless. Here is an example dataset without metadata. Supposing it is relevant to your research, could you understand and reuse it?

The answer is probably no! What if you were also given the following information about this dataset? Would it make the data easier to understand?

This additional information is metadata. Common metadata elements include Title, Author, Date, Subject, Unique Identifier, Abstract, Publisher, Rights, etc.
A further explanation of metadata, with examples, can be found in this video made by Utrecht University.

Metadata standard

Metadata standards, or schemas, define specific metadata elements and formats for describing or documenting your data. When you use a standard, you do not need to explain your metadata separately, because each element is well defined in the standard and its fields normally use a controlled vocabulary. For example, the metadata element ‘Location’, without further definition, could mean a city, a country, or coordinates on Earth; a standard fixes which of these is meant. Getting others to understand your data is like communicating in daily life: you must speak the same language!
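To make the ‘Location’ example concrete, here is a minimal Python sketch of what a controlled vocabulary does in practice. It assumes, purely for illustration, a rule that ‘Location’ must be a two-letter country code. The function name, the rule, and the short list of codes are hypothetical and are not taken from any particular standard.

```python
# Hypothetical rule: 'Location' must be a two-letter country code.
# The set below is an illustrative subset, not a complete vocabulary.
ALLOWED_COUNTRY_CODES = {"NL", "DE", "BE"}


def validate_location(value: str) -> str:
    """Return the normalised country code, or raise ValueError if it is not allowed."""
    code = value.strip().upper()
    if code not in ALLOWED_COUNTRY_CODES:
        raise ValueError(f"'{value}' is not an allowed Location value")
    return code


print(validate_location("nl"))  # accepted: matches a code in the vocabulary

try:
    validate_location("Enschede")  # rejected: a city name, not a country code
except ValueError as err:
    print(err)
```

With a rule like this, everyone who fills in or reads the ‘Location’ field interprets it the same way, which is the point of agreeing on a standard.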

Dublin Core is a widely used, generic, and easy-to-use metadata standard. It provides the basic elements for describing your data. The data repository DANS EASY uses the Dublin Core standard for dataset metadata, so if you deposit your data in DANS EASY, you will be asked to provide metadata based on this standard. To generate Dublin Core metadata yourself, you can use this Dublin Core generator.
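If you prefer to script this rather than use an online generator, the following is a minimal sketch of writing a few Dublin Core elements to XML with Python's standard library. The namespace is the Dublin Core element set namespace. The wrapper element name `metadata`, the function name, and all field values are assumptions for illustration, and this is not necessarily the exact format any repository expects on upload.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def build_dublin_core(record: dict) -> str:
    """Serialise a flat dict of Dublin Core elements to an XML string."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values, for illustration only
example_record = {
    "title": "Example survey dataset",
    "creator": "Jane Doe",
    "date": "2020-01-01",
    "subject": "survey research",
    "rights": "CC BY 4.0",
}

xml_text = build_dublin_core(example_record)
print(xml_text)
```

Each key in the dictionary corresponds to one Dublin Core element, so adding a field such as `publisher` or `identifier` only requires adding another entry.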

General metadata standards are not always sufficient to describe domain-specific research data. Therefore, disciplinary metadata standards are used to meet requirements from specific disciplines. DDI (Data Documentation Initiative) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioural, economic, and health sciences.

Using the DDI standard to document your research

DDI is a free standard that can document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving. Documenting data with DDI facilitates understanding, interpretation, and use by people, software systems, and computer networks. There are lots of tools available to implement this standard at different stages of the research data lifecycle.

Here we show an example of using the free software Colectica to implement the DDI standard for documenting datasets in Excel.

Using this software, you can create general metadata for individual datasets as well as an explanation of each variable in the dataset. The metadata created in Excel can be exported as a machine-actionable DDI *.xml file, a rich-text Word file, or a PDF file. Below is the metadata (in Word format) describing the content of the dataset and the meaning of each variable. These metadata tables were created with Colectica for the example dataset shown at the beginning of this document:

How to create such a metadata file?

Here are the steps:
First, download the free software Colectica and create an account.
After installing the software on your computer, open a dataset in Excel, go to the [Colectica] ribbon tab, and click the [Document Workbook] button. Colectica will then embed information in your workbook to document your data file and each column. You can add more detailed information by following the steps explained.
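Colectica records this information from inside Excel. Purely to illustrate the two levels of documentation involved (one description for the dataset as a whole and one for each variable), here is a minimal Python sketch. It does not use Colectica and does not produce DDI XML. The class names, field names, and example values are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Variable:
    name: str       # column header as it appears in the data file
    label: str      # human-readable meaning of the column
    unit: str = ""  # unit of measurement, if any


@dataclass
class DatasetDescription:
    title: str
    description: str
    variables: list = field(default_factory=list)


# Hypothetical dataset and columns, for illustration only
doc = DatasetDescription(
    title="Example dataset",
    description="Placeholder description of what the file contains.",
    variables=[
        Variable("resp_id", "Anonymous respondent identifier"),
        Variable("age", "Age of respondent at interview", "years"),
    ],
)

# asdict() converts the nested dataclasses to plain dicts and lists
codebook_json = json.dumps(asdict(doc), indent=2)
print(codebook_json)
```

A plain structure like this is readable by both people and software, but unlike a DDI export it follows no shared standard, so others would still need an explanation of the field names.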


BMS data steward Dr. Qian Zhang can help you with questions about metadata.