Master Assignment

Large-scale data mining & NLP @ OCLC - Leiden, NL

Type: Master CS

Period: TBD

Student: (Unassigned)

If you are interested, please contact:

OCLC is a global library cooperative that provides shared technology services, original research and community programs for its membership and the library community at large. Together with its member libraries, OCLC maintains WorldCat, the world’s most comprehensive database of information about library collections. WorldCat now hosts more than 460 million bibliographic records in 483 languages, aggregated from 18,000 libraries in 123 countries.

As WorldCat continues to grow, OCLC is actively exploring data science, advanced machine learning, linked data and visualisation technologies to improve data quality, transform bibliographic descriptions into actionable knowledge, provide more functionality for professional cataloguers, and develop more services for library end users.

OCLC is always looking for students who are enthusiastic about advancing AI technologies for library and other cultural heritage data. Examples of student assignments are:

Fast and scalable semantic embedding for information retrieval
eXtreme Multi-label Text Classification (XMTC) for automatic subject prediction
Automatic image captioning for cultural heritage collections
Entity extraction and disambiguation
Entity matching across different media (e.g. books, articles, cultural heritage objects) or across languages
Hierarchical clustering of bibliographic records
Constructing knowledge graphs around books, authors, subjects, publishers, etc.
Interactive visualisation of library data on geographic maps and/or along a time dimension
Concept drift (i.e., how meaning changes over time) and its effects on information retrieval
Scientometrics-related topics based on co-authoring networks and/or citation networks