| Module code:  PIBWI29 | 
| 2V+2PA (2 lecture hours + 2 project-work hours; 4 hours per week) | 
| ECTS credits: 5 | 
| Semester: 5 | 
| Mandatory course: no | 
| Language of instruction: English
 | 
| Assessment: Written exam, duration 90 min./project work
 
 [updated 13.10.2024]
 
 | 
| DFIW-IRET (P610-0540) Computer Science and Web Engineering, Bachelor, ASPO 01.10.2019, semester 3, mandatory course, informatics specific
 KI584 (P610-0253) Computer Science and Communication Systems, Bachelor, ASPO 01.10.2014, semester 5, optional course, informatics specific
 KIB-IRET Computer Science and Communication Systems, Bachelor, ASPO 01.10.2021, semester 5, optional course, technical
 KIB-IRET Computer Science and Communication Systems, Bachelor, ASPO 01.10.2022, semester 5, optional course, technical
 PIBWI29 Applied Informatics, Bachelor, ASPO 01.10.2011, semester 5, optional course, informatics specific
 PIB-IRET (P221-0080) Applied Informatics, Bachelor, ASPO 01.10.2022, semester 5, optional course, informatics specific
 
 Suitable for exchange students (learning agreement)
 
 | 
| 60 class hours (= 45 clock hours) over a 15-week period. The total student study time is 150 hours (equivalent to 5 ECTS credits).
 There are therefore 105 hours available for class preparation and follow-up work and exam preparation.
 
 | 
| Recommended prerequisites (modules): None.
 
 | 
| Recommended as prerequisite for: 
 | 
| Module coordinator: Prof. Dr. Klaus Berberich
 | 
| Lecturer:  Prof. Dr. Klaus Berberich 
 [updated 18.03.2015]
 
 | 
| Learning outcomes: After successfully completing this course, students will have learned basic information retrieval methods. These
 include retrieval models (e.g., the Vector Space Model and the Binary Independence Model), link analysis
 (e.g., PageRank), and effectiveness measures (e.g., precision/recall
 and MAP). They will be able to apply and implement these methods in practice. In
 addition, students will be aware of easily accessible information
 retrieval systems (e.g., Apache Lucene/Solr).
 
 
 [updated 13.10.2024]
 
 | 
| Module content: Information retrieval is pervasive, and its applications range from
 finding contacts or e-mails on your smartphone to web search engines
 that index billions of web pages. This course covers the most
 important information retrieval methods. We will look at how
 these methods are defined formally, including the mathematics behind
 them, and at how they can be implemented efficiently in
 practice. As part of the project work, we will implement a small
 search engine from scratch.
 
 1. Introduction
 - History
 - Applications
 - Course overview
 
 2. Natural language
 - Documents and terms
 - Stopwords and stemming/lemmatization
 - Synonyms, polysemes, compounds
 
 3. Retrieval models
 - Boolean retrieval
 - Vector space model with TF.IDF term weighting
 - Language models
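
 To make the vector space model item above more concrete, the following is a minimal sketch of TF.IDF weighting with cosine similarity. It is an illustration only, not course material: the weighting variant (raw term frequency times log(N/df)), the function names, and the three toy documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def inverse_document_frequencies(docs):
    """docs maps a document id to its list of terms; returns term -> log(N / df)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(terms, idf):
    """Sparse TF.IDF vector; terms unknown to the collection are skipped."""
    return {t: tf * idf[t] for t, tf in Counter(terms).items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy collection, already tokenized.
docs = {
    "d1": ["web", "search", "engine"],
    "d2": ["search", "index"],
    "d3": ["web", "crawler", "web"],
}
idf = inverse_document_frequencies(docs)
query = tfidf_vector(["web", "crawler"], idf)
scores = {d: cosine(query, tfidf_vector(terms, idf)) for d, terms in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

 In this toy collection, `d3` ranks first because it contains both query terms, and `d2` scores zero because it shares no term with the query.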
 
 4. Indexing methods
 - Inverted index
 - Compression (d-Gaps, variable-byte encoding)
 - Index pruning
 
 5. Query processing
 - Holistic methods (DAAT, TAAT)
 - Top-k methods (NRA, WAND)
 
 6. Evaluation
 - Cranfield Paradigm
 - Benchmark initiatives (TREC, CLEF, NTCIR)
 - Traditional effectiveness measures (precision, recall, MAP)
 - Non-traditional effectiveness measures (nDCG, ERR)
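
 The traditional measures listed above can be sketched in a few lines. The example below assumes binary relevance judgments and a ranked list of document ids; the function names and the toy ranking are hypothetical and only meant as an illustration.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall over the top-k results (binary relevance)."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


def average_precision(ranking, relevant):
    """Mean of the precision values at the ranks of relevant documents.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranking, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical result list and relevance judgments for one query.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}
precision, recall = precision_recall_at_k(ranking, relevant, k=5)
ap = average_precision(ranking, relevant)
```

 Here two of the three relevant documents are retrieved, at ranks 2 and 4, so average precision is (1/2 + 2/4) / 3.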
 
 7. Web retrieval
 - Crawling
 - Near-duplicate detection
 - Link analysis (PageRank, HITS)
 - Web spam
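
 For the link analysis item above, PageRank can be approximated by power iteration. The sketch below is a simplified illustration under stated assumptions: the graph is a small in-memory dictionary in which every link target also appears as a key, the damping factor is 0.85, and the rank of pages without out-links is spread uniformly over all pages. The function name and the toy graph are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each page to the list of pages it links to."""
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not graph[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in graph[p]:
                new_rank[target] += damping * rank[p] / len(graph[p])
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(graph)
best = max(ranks, key=ranks.get)
```

 Page `c` receives links from all other pages and therefore ends up with the highest score, while `d`, which has no in-links, keeps only the uniform share (1 - damping) / n.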
 
 8. Information retrieval systems
 - Indri
 - Terrier
 - Anserini
 - Apache Lucene/Solr
 - Elasticsearch
 
 
 
 [updated 13.10.2024]
 
 | 
| Recommended or required reading: Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack: Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
 
 Reginald Ferber: Information Retrieval: Suchmodelle und Data-Mining Verfahren für Textsammlungen und das Web, dpunkt, 2003.
 (available online at: http://information-retrieval.de/irb/ir.html)
 
 W. Bruce Croft, T. Strohman, D. Metzler: Search Engines: Information Retrieval in Practice, Pearson, 2009.
 (Available online at: https://ciir.cs.umass.edu/irbook/)
 
 Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
 (Available online at: http://nlp.stanford.edu/IR-book/)
 
 
 
 
 
 [updated 13.10.2024]
 
 | 
| Module offered in: WS 2020/21, WS 2019/20, WS 2018/19, SS 2018, SS 2017, ...
 |