[M] Combining structured and unstructured data

Master Assignment

Type: Master CS

Location: CTcue

Period: TBD

Student: (Unassigned)

If you are interested, please contact:

Introduction of company:

CTcue is a small company that builds a search engine for patient populations and provides an easy way to collect data about those populations. Example use cases for hospitals include clinical trials and quality indicators (statistics about hospital performance for insurance companies, government, etc.). A major challenge is that valuable information is archived as text (e.g. reports or notes), which makes it unavailable for analysis without natural language processing. We have created a pipeline that analyses Dutch medical text, so that the full scope of patient health records can be used, covering both structured and unstructured data. The steps in the pipeline include (but are not limited to) measurement extraction, concept extraction, context classification, and temporal classification. Our solution is currently implemented in 25 hospitals. The company consists of a team of 14 people and is located in Amsterdam at Science Park.

Project description:

When extracting information from medical text, much of that information is already available in structured form. A medication that is mentioned in text may or may not already be recorded in the structured medication table of the hospital data. When text extraction results can be linked to structured information, this creates stronger evidence and makes it possible to distinguish new information from information that is already present. This kind of integration between different types of data can vastly improve the digital model of a patient, leading to faster collection of reliable data about patient populations.

Expected product:

A Python module that, for each piece of extracted information, can decide whether it is new or link it to existing structured data.

Available resources:

There is no tagged data set available for this problem, so an unsupervised approach is required.
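
To make the expected product more concrete, the following is a minimal sketch of what the interface of such a module might look like. It is not CTcue's implementation. The class names `Mention` and `StructuredRecord`, the function `link_or_new`, and the threshold of 0.8 are all hypothetical, and plain string similarity from the Python standard library is used here only as a stand-in for whatever unsupervised method the project ends up with.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Mention:
    """A piece of information extracted from text (hypothetical shape)."""
    patient_id: str
    text: str


@dataclass
class StructuredRecord:
    """A row from a structured table, e.g. medication (hypothetical shape)."""
    record_id: str
    patient_id: str
    name: str


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_or_new(
    mention: Mention,
    records: List[StructuredRecord],
    threshold: float = 0.8,  # assumed value, would need tuning
) -> Optional[StructuredRecord]:
    """Return the best matching record for the same patient, or None if
    the mention appears to be new information."""
    candidates = [r for r in records if r.patient_id == mention.patient_id]
    best = max(
        candidates,
        key=lambda r: similarity(mention.text, r.name),
        default=None,
    )
    if best is not None and similarity(mention.text, best.name) >= threshold:
        return best
    return None


records = [
    StructuredRecord("r1", "p1", "Metoprolol"),
    StructuredRecord("r2", "p1", "Paracetamol"),
    StructuredRecord("r3", "p2", "Ibuprofen"),
]

linked = link_or_new(Mention("p1", "metoprolo"), records)   # misspelled mention
unlinked = link_or_new(Mention("p1", "ibuprofen"), records)  # not recorded for p1
```

In this sketch a misspelled mention is still linked to the record for the same patient, while a medication that only appears in another patient's records is treated as new. A real solution would likely also need to take synonyms, brand names, dates, and context into account, which simple string similarity does not capture.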