CTIT University of Twente

FACT (NWO CATCH)

Folktales As Classifiable Texts

Project Number:

Project Manager: Prof. dr. Franciska de Jong

Faculty of Electrical Engineering, Mathematics and Computer Science

Tel.: +31-53-4894193

Email: f.m.g.dejong@utwente.nl

Project website:

Summary

The FACT project will study new possibilities for researchers from humanities disciplines (folktale and narratology researchers, documentalists, etc.) to explore folktales based on annotations and links generated by data-driven methods. To this end, FACT will develop software that enables the computer to automatically enrich a corpus of Dutch folktales with metadata such as names, genre, type, and a summary. In addition, FACT represents the first effort to systematically apply and evaluate various clustering techniques on a very large (40,000+) and diverse collection of folktales.

The algorithms developed in the project will be integrated into a user-friendly platform that supports annotation as well as exploratory research into variability in oral and written transmission. The platform will use XML database technology to model all folktale data (both the annotations and the text of the tale itself) in one unifying framework.

A large part of the scientific research in FACT will deal with the pros and cons of human classification and computerized clustering as means of investigating variation in (oral) transmission. By using document clustering, we hope to discover relationships between documents that cannot be readily identified by human annotators. The main challenge will be to make the computer decide which texts are related and which are not. This is not a black-or-white issue: folktales may be related to each other along different dimensions and to varying degrees. Will the computer be able to recognize the cultural DNA of tales, and to distinguish between different types (no kinship) and versions of the same type (kinship)?
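As a rough illustration of what document clustering involves, the sketch below groups a few toy texts by TF-IDF cosine similarity, linking any two texts whose similarity reaches a threshold. It is not the method used in FACT: the example texts, the function names, and the threshold value are all assumptions made for this illustration, and a real folktale collection would call for proper tokenization, Dutch-language preprocessing, and more careful clustering techniques.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dictionaries."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF; a term that occurs in every document gets weight 0.
        vectors.append({t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs, threshold=0.1):
    """Group documents into connected components of the similarity graph.

    Two documents are linked when their cosine similarity is at least
    `threshold` (an illustrative value, not a tuned one).
    """
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Invented toy texts standing in for folktale versions (not project data).
TALES = [
    "the wolf eats the grandmother and the girl in the red hood".split(),
    "a girl in a red hood meets the wolf at the house of her grandmother".split(),
    "the farmer sells his cow for magic beans".split(),
    "magic beans grow into a beanstalk and the farmer boy climbs it".split(),
]

if __name__ == "__main__":
    print(cluster(TALES))  # [[0, 1], [2, 3]]
```

Even in this toy setting the outcome depends on the threshold: a stricter value separates all four texts, which mirrors the point that relatedness between tales is a matter of degree rather than a yes-or-no decision.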

Project duration: 2011-2015

Project budget: €656k in funding

Number of person-years: 4.4 FTE/year

Project Coordinator: UT

Participants: UT, Meertens Instituut, University of Tilburg

Project budget CTIT: €656k in funding

Number of person-years CTIT: 3.8 FTE/year

Involved groups: Human Media Interaction (HMI), Databases (DB)