‘Research data should be findable, understandable and open for others to use for new research’, says Mostafa Daoud, who is a PhD candidate in the Water Resources department at ITC. When his Master’s thesis was published as an article, he published the underlying data in the national DANS data repository. The data can now be found, verified and downloaded by others. ITC Data Steward Alice Nikuze advised Daoud during the preparations and did a final check before he uploaded the data to DANS. What were the lessons learnt?
The topic of Daoud’s Master’s thesis project at ITC was ‘surface - groundwater interactions in a hard rock system water-limited environment in Spain.’ ‘We used and combined new methods and introduced a new concept model to simulate surface - groundwater interactions.’
Two types of secondary data
Daoud used two types of secondary data for his research. ‘In this study area in Spain, measurements have been performed by scientists for twenty years and are still ongoing. There is a network of loggers that record different variables such as groundwater levels, stream flow, rainfall, air temperature, relative humidity, et cetera. My PhD supervisor, Associate Professor Maciek Lubczynski, was engaged in earlier research projects and therefore has access to these data. My second data source was open-source remote sensing satellite images. These give us information about, for example, land cover, land surface temperature and vegetation indices. We could retrieve these data from open-source platforms such as NASA Landsat Images and the European Space Agency Copernicus.’
Organizing folders and giving files meaningful names
Daoud’s first step, after receiving the research data from his supervisor and downloading the satellite data, was to organize the data in folders and subfolders for distinct types of data (e.g., raw and processed data) and to give the folders and files meaningful names. ‘Before starting our Master’s at ITC we had to follow a course in academic skills, and data management was part of that. We were taught about organizing and naming the folders according to data types: raster images, sheets, maps, layers, et cetera. And we learnt that the names of the files should include relevant information such as the name of the image itself, the time, the date, the resolution and the processing name.’
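The naming advice above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the order of the name parts, the separators and the folder layout (stage, then data type) are hypothetical and are not taken from the ITC Guideline for naming data files.

```python
import re
from datetime import date
from pathlib import Path


def build_file_name(name, acquired, resolution_m, processing, extension):
    """Join descriptive parts into one file name (hypothetical pattern)."""
    parts = [name, acquired.strftime("%Y%m%d"), f"{resolution_m}m", processing]
    # Replace anything that is not a letter, digit or hyphen, so the name
    # stays portable across operating systems.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


def target_path(root, stage, data_type, file_name):
    """Place a file under <root>/<stage>/<data_type>/, e.g. processed/raster."""
    return Path(root) / stage / data_type / file_name


# Example values are invented for illustration only.
file_name = build_file_name("landsat8 ndvi", date(2020, 7, 15), 30, "cloud masked", "tif")
print(target_path("thesis_data", "processed", "raster", file_name).as_posix())
```

Putting the date in year-month-day order means the files sort chronologically in a plain directory listing.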
Researchers should be able to build upon each other’s work
‘When my supervisor asked me to publish the data related to the article I found this a good idea, because I believe research should be open. Researchers should be able to build upon each other’s work. It is a waste of time and money if researchers must start from scratch all the time.’ Before doing his Master’s, Daoud worked at another organization and had the frustrating personal experience of not being allowed to re-use a dataset. ‘So, I really understand the need to share data and have open-source data.’
Advice from the Data Steward
‘I asked some of my ITC colleagues which platform I could best use for publishing the data. They recommended the DANS repository because of its user-friendliness.’ It is indeed easy to use, says Daoud. ‘When you start preparing to upload the data, you get all kinds of useful questions that make you think of all the important aspects of making the data understandable and re-usable for others. It includes, for example, a section about adding metadata and supporting data. Whenever I had questions or encountered problems, I could ask ITC’s data steward Alice Nikuze for advice. She also offered to check the file before the final upload.’
The metadata should be filled in so that users can quickly get a good general impression of the content of the dataset. Supporting data is information that is too specific or too detailed for the metadata but that is needed to make the data understandable to users who have no previous knowledge of the dataset. Examples are a description of the research method, the codebook, a notebook and README files.
Open-source formats recommended
It so happened that just before Daoud was going to upload the data to DANS, Nikuze was giving an interactive data management workshop as part of the ITC PhD program. ‘In this workshop I discovered that when naming my data files, I had not been using the ITC Guideline for naming data files. So, I found out I had to change the names of some folders. Another lesson that I learnt in this workshop was about interoperability. I have a lot of Microsoft Excel sheets which I was about to upload to DANS, but in the workshop the Data Steward explained that this is not recommended because Excel files can easily be damaged during uploading. She advised us to make copies of the Excel files in text format. She recommended using an open-source format that is as simple and light as possible.’
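As an illustration of the text-format advice, the sketch below writes tabular data to plain CSV files with Python's standard `csv` module, one file per sheet. It assumes the sheet contents have already been read into Python as lists of rows (reading the Excel file itself needs a third-party library and is not shown). The function name, sheet name and column names are hypothetical, and CSV is only one possible choice of open text format.

```python
import csv
from pathlib import Path


def write_sheets_as_csv(sheets, out_dir):
    """Write each sheet (a list of rows) to its own UTF-8 CSV file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for sheet_name, rows in sheets.items():
        path = out_dir / f"{sheet_name}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


# Invented example data: one sheet with a header row and one measurement.
example_sheets = {
    "groundwater_levels": [
        ["date", "groundwater_level_m"],
        ["2020-07-15", 3.42],
    ],
}
```

Calling `write_sheets_as_csv(example_sheets, "text_copies")` would create `text_copies/groundwater_levels.csv`, which can be opened in any text editor or spreadsheet program.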
Creative Commons Non-Commercial license
Because he wants research data to be publicly available, Daoud has published the dataset under a Creative Commons Attribution-NonCommercial (CC BY-NC) license, he says. ‘And not only the research data should be openly available but also information about the method(s) used. My advice is to also upload a copy of the model that was used. This is important because science is developing fast. Every year new models come out with new capabilities, but you cannot always use these to replicate the research with the dataset. So, I always upload the model version that I have used alongside the dataset.’
Publishing and archiving
In April 2022 Daoud uploaded the data to DANS after the Data Steward did the last check. ‘She made some recommendations which I processed and then I published it. When people find my dataset in DANS they can now download and use it without having to contact me. But if they have questions or want to contact me, they can.’ Daoud archived the data from his Master’s thesis project on the ITC data server, which is backed up every day, and also keeps a copy on a hard disc.
Research Data Management workshop for PhD candidates
Daoud has recently started a PhD at ITC and followed the Research Data Management Bootcamp for PhD candidates. ‘I think this first-year PhD workshop is valuable because it is very, very important to learn about good research data management before you start your research. Otherwise, you find out about the importance of good filing, naming and interoperability halfway through, and it will be a real mess.’ Daoud has now written a Data Management Plan with UT’s DMP tool. ‘I have used the information I received in the RDM workshop for PhD candidates. I have already archived part of my PhD data on the ITC data server as well. Next year I will also start collecting data by myself, so then I will have to consider other, new aspects of research data management.’
ITC Data Guidelines
ITC has developed guidelines on the basis of the University of Twente’s Research Data Management Policy.
DANS is the Dutch national centre of expertise and repository for research data. With more than 180,000 datasets and a staff of sixty, DANS is one of the leading repositories in Europe.
Digital Competence Network (DCC) – Research Data Management support
Do you have questions about storing, publishing, archiving or sharing research data? Ask your faculty’s Data Steward. To see who your faculty’s Data Steward is, go to ‘Consult a specialist’ on the DCC website. All data stewards are part of the DCC Network, an innovative university-wide network of experts on Open Science, Research Data Management, ICT for Research (infrastructure, tools, software) and Digitalization of Science.
DCC - Open Science, Research Data Management, ICT for Research (utwente.nl)