Summer school: Search Engine Technology

The internet would be useless without search engines. Most of us use search engine technology every day: a quick Google search or setting the destination on your TomTom, and even Apple's Siri would not be possible without search engines.

Course Aim

In this course we will explore the world of search engines. You will learn how search engines work, what challenges they deal with, and how their performance can be measured. Even better: you will be guided in building, evaluating, and improving your own search engine on a real-world dataset.

Topics

Information Retrieval is the scientific discipline behind search tools. This CuriousU course provides the foundations of Information Retrieval by addressing concepts like indexing, matching, and relevance. The course discusses fundamental approaches to Information Retrieval, such as ranking, controlled versus uncontrolled terms, query by example, and relevance feedback. It discusses mathematical models of Information Retrieval such as Boolean retrieval, probabilistic retrieval, language models, logical models, Google's PageRank, and learning-to-rank. It also addresses applications of Information Retrieval like multimedia retrieval and, of course, web search engines.
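To give a feel for indexing and matching, here is a minimal sketch of an inverted index with Boolean AND retrieval. It is a toy illustration and not course material: the function names, the whitespace tokenizer, and the three example documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # index.get avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical product descriptions, invented for this example.
docs = {1: "red steel bolt", 2: "steel nut", 3: "red plastic cap"}
index = build_index(docs)
print(sorted(boolean_and(index, "red steel")))
```

Boolean retrieval only decides whether a document matches; ranking models go one step further and order the matching documents by estimated relevance.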

Please note: we strongly advise you to use your own device (laptop) for this course.

Instructional modes

During the practical sessions we will build a search engine for product data using Elasticsearch, an open source search engine. Some programming knowledge is required; we will be working in Python, but if you have programmed in another language you should be fine.
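As a rough idea of what talking to Elasticsearch from Python can look like, the sketch below only builds the path and JSON body of a search request and does not send it. The index name "products" and the field name "description" are hypothetical; the actual index layout used in the course may differ.

```python
import json


def build_search_request(index_name, field, text, size=10):
    """Build the path and JSON body for a simple full-text "match" query."""
    path = f"/{index_name}/_search"
    body = {"query": {"match": {field: text}}, "size": size}
    return path, json.dumps(body)


# Hypothetical index and field names, chosen for illustration only.
path, body = build_search_request("products", "description", "steel bolt")
print("POST", path)
print(body)
```

The printed path and body could be sent to a running Elasticsearch instance with any HTTP client; a dedicated client library is a common alternative.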

Learning outcomes

  • Explain basic concepts of Search Engine Technology, such as indexing, matching, and relevance
  • Explain and apply different approaches to search including exact matching, ranking, query by example, and relevance feedback
  • Apply mathematical models of Information Retrieval (Boolean retrieval, probabilistic retrieval, language models, logical models, Google's PageRank, etc.)
  • Set up a Search Engine using open source software
  • Carry out an experimental evaluation of a Search Engine and compute evaluation measures
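As an illustration of the last outcome, here is a minimal sketch of two common evaluation measures, precision at k and average precision. The document ids and relevance judgments are invented for this example, and the measures used in the course exercises may differ.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document.

    Assumes the ranking contains no duplicate document ids. Relevant
    documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical system output and relevance judgments for one query.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(ranking, relevant, 2))  # 0.5
print(average_precision(ranking, relevant))  # 0.5
```

Averaging the average precision over all test queries gives mean average precision, a single number for comparing two systems.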

Day schedule

This schedule is still under construction!

Day 1

  • Lecture (1h.): Welcome & Introduction to Search
  • Discussion (0.5h.): Introduce yourself + personal learning objectives
  • Exercise (1.5h.): Git version management + Get the MyDatafactory product data
  • Lunch break
  • Tutorial (1h.): Elasticsearch & RESTful web services
  • Exercise (2h.): Install and run Elasticsearch

Day 2

  • Lecture (1.5h.): Indexing, a conceptual viewpoint
  • Exercise (1.5h.): Analyze matching problems in the data
  • Lunch break
  • Tutorial (1h.): Programming Elasticsearch with Python
  • Exercise (1.5h.): Index the collection
  • Discussion (0.5h.): Discuss matching problems

Day 3

  • Lecture (1.5h.): Evaluation of search engines & Empirical Research
  • Exercise (1.5h.): Create a "run", an experiment file
  • Lunch break
  • Tutorial (1h.): Elasticsearch: nuts & bolts
  • Exercise (1.5h.): Calculate evaluation metrics
  • Discussion (0.5h.): Discuss the baseline evaluation results

Day 4

  • Lecture (1.5h.): Models of Information Retrieval
  • Exercise (0.5h.): The QUIZ
  • Exercise (1h.): Improve your system using different models
  • Lunch break
  • Tutorial (1h.): Elasticsearch: nuts & bolts
  • Exercise (1.5h.): Improve your system using different indexers
  • Discussion (0.5h.): Discuss improvements over the baseline

Day 5

  • Lecture (1.5h.): Machine Learning (ML) for search engines
  • Exercise (1.5h.): Generate pair-wise training data for ML
  • Lunch break
  • Tutorial (1h.): Rank SVM
  • Exercise (1.5h.): Improve your system using ML
  • Discussion (0.5h.): Discuss improvements from ML
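The pair-wise training data mentioned in the Day 5 exercise can be sketched as follows. This shows the usual pairwise transform behind Rank SVM style learning: for two documents judged for the same query with different relevance labels, the difference of their feature vectors becomes one training example. The feature values and labels are invented, and the exercise itself may use a different setup.

```python
from itertools import combinations


def pairwise_examples(judged):
    """Turn judged documents for ONE query into pairwise training examples.

    judged: list of (feature_vector, relevance_label) tuples.
    Returns a list of (feature_difference, label), where the label is +1 if
    the first document of the pair is more relevant, and -1 otherwise.
    """
    examples = []
    for (xa, ya), (xb, yb) in combinations(judged, 2):
        if ya == yb:
            continue  # equally relevant pairs carry no preference
        diff = [a - b for a, b in zip(xa, xb)]
        label = 1 if ya > yb else -1
        examples.append((diff, label))
    return examples


# Hypothetical features (e.g. two matching scores) and relevance labels.
judged = [([2.0, 1.0], 1), ([1.0, 1.0], 0), ([0.5, 3.0], 0)]
for diff, label in pairwise_examples(judged):
    print(diff, label)
```

A linear classifier trained on such difference vectors learns a weight vector that can then be used to score and rank individual documents.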

Day 6

  • Lecture (1h.): How to build Google in 1.5h?
  • Exercise (1h.): Estimate Google's index size and search speed
  • Discussion (1h.): Your plan to improve product search
  • Lunch break
  • Tutorial (1h.): Elasticsearch for experts
  • Exercise (2h.): Improve your system

Day 7

  • Lecture (1h.): Future challenges of search
  • Exercise (2h.): Work on your search engine
  • Lunch break
  • Exercise (1.5h.): Finalize work and prepare a presentation
  • Lecture (1.5h.): Student presentations: present your system

Course leaders

Djoerd Hiemstra

Djoerd Hiemstra is associate professor in database and search engine technology at the University of Twente. Djoerd also heads Searsia, a UTwente spin-off that provides open source federated search technology.

Visit Djoerd's website for more information about his research.

Dolf Trieschnigg

Dolf Trieschnigg is a data scientist at MyDatafactory, a company specialised in cleansing and matching product data. He is also a guest lecturer at the University of Twente. His research interests include information retrieval, information extraction and natural language processing.

Visit Dolf's website for more information about his research.

Want to know more?

If you need to know more, we have information available about the fee & programme, an admission check, registration & payment, visa, accommodation and the terms & conditions.