PF/Tijah

Features and Goals

PF/Tijah is a text search system (Tijah) that is integrated with the Pathfinder (PF) XML database management system. PF/Tijah includes out-of-the-box solutions for common tasks such as index creation, document management, stemming, and result ranking (supporting several retrieval models), while remaining open to adaptation and extension. On the one hand, the system aims to be a general-purpose tool for developing end-user IR applications using XQuery statements with text search extensions. On the other hand, it aims to be a playground in which information retrieval scientists and advanced users can easily set up and test new search systems. Advanced users can hook into the system at an intermediate level, which provides the database scripting language MIL and several predefined operations on terms and XML elements called Score Region Algebra operators (see Publications). The PF/Tijah system has a number of unique selling points that distinguish it from other information retrieval systems:

§

PF/Tijah supports retrieving arbitrary parts of the textual data, unlike traditional information retrieval systems, for which the application developer has to define the notion of a document up front. For instance, if the data consist of scientific journals, one can query for complete journals, journal issues, single articles, sections of articles, or paragraphs, with no need to adapt the index or any other part of the system configuration;

§

PF/Tijah supports complex scoring and ranking of the retrieved results by means of so-called NEXI queries. NEXI (see Publications) stands for Narrowed Extended XPath I: a query language that supports only the descendant and self axis steps, but that is extended with a special about() function that takes a sequence of nodes and ranks them by their estimated probability of relevance to the query;

§

PF/Tijah supports ad hoc result presentation by means of its query language. For instance, when searching for a special issue of a journal, it is easy to print any information from the retrieved result on the screen, such as the special issue title, its date, the editors, and the preface. This is done in a declarative way (i.e., not by means of a general-purpose programming language), simply by means of XQuery element construction;

§

PF/Tijah supports text search combined with traditional database querying, including joins on values. For instance, one could search for employees from the financial department who also worked for the sales department and who sent an email about "tax refunds".
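
To make the first two points more concrete, the following toy sketch in Python ranks XML elements of any chosen granularity by a simple relevance estimate, in the spirit of about(). It is not PF/Tijah code and does not use PF/Tijah's index, retrieval models, or API: the sample data, the function names tokens() and about(), the smoothing parameter lam, and the choice of a smoothed query-likelihood score are all assumptions made purely for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; any well-formed XML would do.
SAMPLE = """<journal>
  <article id="a1">
    <title>Ranking XML elements</title>
    <sec id="s1">XML retrieval ranks elements such as sections and paragraphs.</sec>
    <sec id="s2">Stemming reduces words to a common root.</sec>
  </article>
  <article id="a2">
    <title>Relational joins</title>
    <sec id="s3">Joins combine tables on shared values.</sec>
  </article>
</journal>"""


def tokens(text):
    """Lowercase and split on non-letters (a stand-in for real tokenisation and stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def about(root, tag, query, lam=0.5):
    """Rank every <tag> element at or below root by a smoothed query-likelihood score.

    Returns a list of (id, score) pairs, best first. The score is the sum of
    log-probabilities of the query terms under a mixture of the element's own
    term distribution and the distribution of the whole collection.
    """
    collection = Counter(tokens("".join(root.itertext())))
    coll_len = sum(collection.values())
    ranked = []
    # root.iter(tag) visits root itself and all its descendants,
    # mirroring the descendant and self axis steps.
    for elem in root.iter(tag):
        counts = Counter(tokens("".join(elem.itertext())))
        length = sum(counts.values()) or 1
        score = 0.0
        for term in tokens(query):
            if collection[term] == 0:
                continue  # a term unknown to the collection cannot discriminate
            p_elem = counts[term] / length
            p_coll = collection[term] / coll_len
            score += math.log(lam * p_elem + (1 - lam) * p_coll)
        ranked.append((elem.get("id"), score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # The same data and the same function serve both granularities.
    print(about(root, "sec", "xml retrieval"))
    print(about(root, "article", "xml retrieval"))
```

With this sample data, section s1 ranks first among the sections and article a1 ranks first among the articles, and neither query requires the data to be prepared again. Choosing the retrieval unit at query time rather than at indexing time is the point the sketch is meant to convey.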