To what extent can health data gathered through Citizen Science research be made available for reuse? UT researchers Ria Wolkorte, Michelle Kip and Lieke Heesink ran up against this question as they sought to make their data ‘FAIR’. To find the answer, they held focus group sessions with patients, a data steward, an ethicist, and a Citizen Science researcher. Patients support the reuse of data, provided there is a legitimate research question, Wolkorte explains. The researchers are currently working with the BMS data steward to develop a ‘Citizen Science for Health’ guideline.
Since 2020, Wolkorte and Kip (Health Technology and Services Research, BMS, TechMed Center) have been working with Heesink (Biomedical Signals and Systems, EEMCS) to study ‘living with rheumatoid arthritis’ as part of the TOPFIT Citizenlab research programme. Citizenlab offers citizens, scientists, and developers the opportunity to collaborate on technological healthcare innovations. Participants explore new solutions aimed at preventing health problems or minimising the impact of chronic diseases on patients.
Better research through Citizen Science
Health research is all about people, so it is crucial that their voices are represented in the research, Wolkorte believes. ‘Researchers already know a lot about chronic illness, but there's also still a lot we don't know... People who have arthritis know what it's like to live with the condition and the problems it involves, and they know best which issues they'd like to see addressed. Citizen Science (CS) is a collaboration between researchers and citizens, in this case patients, and it helps to improve the quality of research. Both sides bring their own specific knowledge and expertise to the table. As a result, you achieve more than either side could alone. CS is a new field, so we still have a lot to figure out, such as how to manage the research data. Health research involves privacy-sensitive data, which complicates reuse. That's why we're currently working with BMS data steward Qian Zhang to develop a Citizen Science for Health guideline.’
What sort of issues would people with arthritis like to see explored?
Wolkorte and Heesink started by interviewing several patients, the ‘co-researchers’, to find out which topics people with arthritis would like to focus on. They supplemented the list that resulted from these interviews with topics from the scientific literature. As it turned out, these patients were particularly interested in research on fatigue. The researchers then conducted a group interview to find out which questions patients have about fatigue. After discussions with the co-researchers and a rheumatologist, and a review of the literature, Wolkorte and Heesink wrote a research proposal, which they presented to 24 co-researchers. Among other things, the proposal specified which data would be collected.
Making research data available for reuse
CS is based on the principle that research data should be made available for reuse. This involves making the research data FAIR: findable, accessible, interoperable and reusable. ‘Unfortunately, health research inevitably involves privacy issues, so that can be tricky,’ Wolkorte explains. ‘We were aware of the FAIR principle: “as open as possible, as closed as necessary”. But we didn't really know what that meant in practice, even though we were already in the middle of the research process.’
Wolkorte and Kip were preparing to explore this question when they first heard about the option of funding through the Fair Data Fund in the spring of 2021. They submitted an application in order to thoroughly research the potential for FAIR and open data collection in CS projects.
Is the data anonymous enough?
They were awarded the Fair Data Fund grant and appointed a student assistant to help them make one of the datasets FAIR. Wolkorte: ‘Her work included anonymising the data and metadata, and translating the transcripts into English. We quickly ran into all sorts of questions... Was the data anonymous enough? Could a researcher do the translations, or would this require a professional translator? Could we store the dataset in the 4TU.ResearchData repository? What should be included in the informed consent statement? As it turned out, we were allowed to store the data in the repository. Still, it didn't feel right to us, because we hadn't explicitly informed the patients in advance that their data might be reused.’
Assistance from DCC data stewards
‘We asked the BMS data steward and DCC’s FAIR data steward Zafer Öztürk for help. They were extremely helpful, but they couldn't answer all our questions because CS is still so new. The data stewards put us in touch with data and privacy experts at the UT and within the 4TU Data Community. That was very useful, but it still didn't solve all our problems. We found that the FAIR principles are rooted in disciplines like ecology and biology, which are more concerned with quantitative data. You can't just directly apply the FAIR principles to health research, which also involves privacy issues and qualitative data.’
Wolkorte and Kip were left with some unresolved questions. They decided to hold focus groups with people with arthritis and with experts, in order to develop an optimal approach to data processing for future Citizenlab projects. Two focus groups were held, attended by four people with arthritis, a CS researcher, an ethicist, and a data steward.
Data access conditional on legitimate research questions
As it turned out, patients supported the reuse of research data. Wolkorte: ‘They appreciated the need to store data in a repository so that it can be retrieved and reused more efficiently, as this reduces the burden of research on citizens and saves money. They also felt it was important that people could see how the research was conducted. However, they wanted only the metadata (i.e. a description of the data) to be made available open access, not the actual research data. These open metadata make the research data findable. Patients felt that other researchers could reuse the research data, provided they had a legitimate research question. They are happy to let us, the researchers, judge whether a question is legitimate. The patients also felt it was very important to inform participants in advance that the data would be stored in a repository and could potentially be reused.’
Anonymising research data
The focus group members also felt strongly that the data should be anonymised. This means that any personal data, such as names, email addresses, and other identifiable information, must be deleted or replaced with, for example, a number or a general description. Wolkorte: ‘Thorough anonymisation is also crucial when it comes to qualitative data. Based on the focus group outcomes, we eventually decided that any health-related data we collect should be stored in the 4TU.ResearchData repository with restricted access. That means it can only be accessed with the approval of the researchers.’
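As a rough illustration of the replacement step described above, the Python sketch below swaps known participant names for numbered codes and email addresses for a generic tag. It is a hypothetical example, not the researchers' actual workflow: the function name `pseudonymise`, the placeholder formats and the sample sentence are all assumptions. As long as the name-to-code key exists, the result is strictly speaking pseudonymised rather than anonymised, and simple pattern matching will miss indirect identifiers in qualitative transcripts (a hometown, an employer, an unusual medical history), so a manual read-through is still needed.

```python
import re

# Hypothetical sketch: a simple email pattern; real-world addresses can vary more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymise(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace listed names with numbered codes and emails with a generic tag.

    Returns the redacted text plus the name-to-code key. The key must be
    stored separately from the shared data (or destroyed), otherwise the
    transcript can be re-linked to participants.
    """
    # Emails first, so a name inside an address cannot leave part of the address behind.
    text = EMAIL_RE.sub("[email address]", text)

    # Codes follow the order of the input list (an assumed convention).
    codes = {name: f"[Participant {i}]" for i, name in enumerate(names, start=1)}

    # Replace longer names first so "Jan de Vries" is not half-replaced via "Jan".
    for name in sorted(codes, key=len, reverse=True):
        # \b stops "Ann" from matching inside "Annual"; re.escape handles dots etc.
        text = re.sub(rf"\b{re.escape(name)}\b", codes[name], text)
    return text, codes


transcript = "Ann said she emails ann.b@example.org when fatigue flares. Piet agreed."
redacted, codes = pseudonymise(transcript, ["Ann", "Piet"])
print(redacted)
# prints: [Participant 1] said she emails [email address] when fatigue flares. [Participant 2] agreed.
```

Replacing emails before names, and longer names before shorter ones, avoids leaving fragments of an identifier behind in the shared transcript.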
Citizen Science for Health guideline
Wolkorte and Kip are currently working with the BMS data steward to develop a Citizen Science for Health guideline. ‘So far, we've only spoken to a small group of patients. That's why we want to survey a larger group of people with arthritis (and possibly also people with other conditions) to build a broader base of evidence. We want to determine the preconditions for sound CS research in the health sciences, and are currently discussing the best approach with the data steward. Our advice for other CS researchers? Define carefully in advance what you want to research and how you intend to investigate it, and discuss this with your participants. If you don't ask them beforehand under what conditions their data may or may not be reused, you won't be able to share it with other researchers later on. Informed consent is key!’
The UT Digital Competence Centre (DCC) is a university-wide network of specialists (data stewards, information specialists, IT account managers) that offers UT researchers support and tools for FAIR data management, Open Access and Open Science. More information: DCC - Open Science, Research Data Management, ICT for Research (utwente.nl)