[M] Attribute extraction and linking for medical concepts

Master Assignment

Type: Master CS

Location: CTcue

Period: TBD

Student: (Unassigned)

If you are interested, please contact:

Introduction of company:

CTcue is a small company that builds a search engine for patient populations and provides an easy way to collect data about those populations. Example use cases for hospitals include clinical trials and quality indicators (statistics about hospital performance for insurance companies, government, etc.). A major challenge is that valuable information is archived as text (e.g. reports or notes), which makes it unavailable for analysis without natural language processing. We have created a pipeline that analyses Dutch medical text so that the full scope of patient health records, both structured and unstructured data, can be used. The steps in the pipeline include (but are not limited to) measurement extraction, concept extraction, context classification, and temporal classification. Our solution is currently implemented in 25 hospitals. The company consists of a team of 14 people and is located at Science Park in Amsterdam.

Project description:

Patient medical dossiers consist largely of natural language. These letters and notes tend to contain a lot of information that is not registered in a structured way. CTcue has developed a pipeline that extracts medical concepts from natural language. However, the texts contain much additional relevant information about these concepts, such as the severity of a tumor, its location, and its diameter. These attributes are often scattered throughout the text, and extracting them and linking them to the appropriate concepts would greatly enhance the usefulness of the information stored in natural language.

Expected product:

A Python module that takes a tokenized text with tagged concepts as input, tags all attributes in the text, and links them to the related concepts.
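
To make the intended input and output more concrete, the following is a minimal sketch of what such a module's interface could look like. Everything in it is an assumption for illustration only: the `Span` type, the function names `tag_attributes` and `link_attributes`, the toy English lexicon, and the nearest-concept linking rule are hypothetical and are not CTcue's pipeline or a prescribed approach. An actual solution would work on Dutch text and would most likely use a trained model instead of hand-written rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span
    label: str  # e.g. "tumor" for a concept, "severity" for an attribute


# Hypothetical toy lexicon; a real system would learn this from annotated data.
ATTRIBUTE_LEXICON = {
    "severe": "severity",
    "mild": "severity",
    "left": "location",
    "right": "location",
}
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")
UNITS = {"mm", "cm"}


def tag_attributes(tokens):
    """Tag attribute spans using the toy lexicon and a number + unit rule."""
    attributes = []
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if word in ATTRIBUTE_LEXICON:
            attributes.append(Span(i, i + 1, ATTRIBUTE_LEXICON[word]))
            i += 1
        elif NUMBER.fullmatch(word) and next_word in UNITS:
            attributes.append(Span(i, i + 2, "size"))
            i += 2
        else:
            i += 1
    return attributes


def token_distance(a, b):
    """Number of tokens between two spans (0 if they touch or overlap)."""
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    return 0


def link_attributes(attributes, concepts):
    """Link each attribute to the nearest concept (a naive baseline rule)."""
    links = []
    for attr in attributes:
        if concepts:
            nearest = min(concepts, key=lambda c: token_distance(attr, c))
            links.append((attr, nearest))
    return links


# Usage: tokenized text plus concept spans that were tagged upstream.
tokens = ["Severe", "tumor", "in", "the", "left", "lung", ",",
          "diameter", "3", "cm", "."]
concepts = [Span(1, 2, "tumor")]
links = link_attributes(tag_attributes(tokens), concepts)
for attr, concept in links:
    print(attr.label, tokens[attr.start:attr.end], "->", concept.label)
```

Linking to the nearest concept by token distance is only a simple baseline. It breaks down when a sentence mentions several concepts close together, which is exactly the case where a learned linking model would be expected to help.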

Available resources:

An annotated Dutch dataset can be made available at CTcue, together with large amounts of natural language data for unsupervised approaches. For inspiration, the CLEF eHealth competition included related tasks in 2014-2016, with datasets and resulting papers.