Master Assignment
Large-scale data mining & NLP @ OCLC - Leiden, NL
Type: Master CS
Period: TBD
Student: (Unassigned)
If you are interested, please contact:
OCLC is a global library cooperative that provides shared technology services, original research and community programs for its membership and the library community at large. Together with its member libraries, OCLC maintains WorldCat, the world’s most comprehensive database of information about library collections. WorldCat now hosts more than 460 million bibliographic records in 483 languages, aggregated from 18,000 libraries in 123 countries.
As WorldCat continues to grow, OCLC is actively exploring data science, advanced machine learning, linked data and visualisation technologies to improve data quality, transform bibliographic descriptions into actionable knowledge, provide more functionality for professional cataloguers, and develop more services for library end users.
OCLC is constantly looking for students who are enthusiastic about advancing AI technologies for library and other cultural heritage data. Examples of student assignments are:
- Fast and scalable semantic embedding for information retrieval
- eXtreme Multi-label Text Classification (XMTC) for automatic subject prediction
- Automatic image captioning for Cultural Heritage collections
- Entity extraction and disambiguation
- Entity matching across different media (e.g. books, articles, cultural heritage objects, etc.) or across languages
- Hierarchical clustering of bibliographic records
- Constructing knowledge graphs around books, authors, subjects, publishers, etc.
- Interactive visualisation of library data on geographic maps and/or along a time dimension
- Concept drift (i.e., how meaning changes over time) and its effects on information retrieval
- Scientometrics-related topics based on co-authoring networks and/or citation networks