Information Retrieval
PIB-IRET
P221-0080
pi2
2
V
2
PA
5
5
no
English
Written exam/Project
DFIW-IRET
Computer Science and Web Engineering
5
mandatory course
KI584
Computer Science and Communication Systems
5
optional course
KIB-IRET
Computer Science and Communication Systems
5
optional course
PIBWI29
Applied Informatics
5
optional course
PIB-IRET
Applied Informatics
5
optional course
60 class hours (= 45 clock hours) over a 15-week period. The total student study time is 150 hours (equivalent to 5 ECTS credits). This leaves 105 hours for class preparation, follow-up work, and exam preparation.
Prof. Dr. Klaus Berberich
kbe
Prof. Dr. Klaus Berberich
kbe
After successfully completing this course, students will know basic
information retrieval methods, including retrieval models (e.g., Vector
Space Model), link analysis (e.g., PageRank), and effectiveness measures
(e.g., Precision/Recall and MAP). They will be able to apply and
implement these methods in practice. In addition, students will be
aware of easily accessible information retrieval systems (e.g., Apache
Lucene/Solr).
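As an illustration of the effectiveness measures named above, the following minimal sketch computes set-based precision and recall as well as (mean) average precision. The function names and the input shape (a ranked list of document ids without duplicates, plus a set of relevant ids) are assumptions made for this example and are not taken from the course material.

```python
def precision_recall(ranked, relevant):
    """Set-based precision and recall of a result list (assumes no duplicate ids)."""
    retrieved = set(ranked)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Sum of precision@k at every rank k holding a relevant document,
    divided by the total number of relevant documents."""
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over a non-empty list of (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p, r = precision_recall(ranked, relevant)   # 2 of 4 retrieved are relevant, 2 of 3 relevant are found
ap = average_precision(ranked, relevant)    # (1/1 + 2/3) / 3
```

Relevant documents that are never retrieved (here `d5`) contribute zero to the average precision, which is why the sum is divided by the number of relevant documents rather than by the number of hits.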
Information Retrieval is pervasive and its applications range from
finding contacts or e-mails on your smartphone to web-search engines
that index billions of web pages. This course covers the most
important information retrieval methods. We will look into how
these methods are defined formally, including the mathematics behind
them, but also see how they can be implemented efficiently in
practice. As part of the project work, we will implement a small
search engine from scratch.
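To give an impression of what a small search engine built from scratch might look like, here is a minimal sketch, not the actual project specification. It assumes whitespace tokenization, an in-memory inverted index, and a simple tf * log(N/df) weighting; all function names and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; no stopword removal or stemming."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query, k=10):
    """Score documents term-at-a-time with tf * idf and return the top k."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


docs = {
    1: "information retrieval methods",
    2: "web search engines index web pages",
    3: "retrieval models and link analysis",
}
index = build_index(docs)
results = search(index, len(docs), "web retrieval")
```

In this toy collection, document 2 ranks first for the query because "web" occurs twice in it and in no other document, so it has both a high term frequency and a high inverse document frequency.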
1. Introduction
- History
- Applications
- Course overview
2. Natural language
- Documents and terms
- Stopwords and stemming/lemmatization
- Synonyms, polysemes, compounds
3. Retrieval models
- Boolean retrieval
- Vector space model with TF.IDF term weighting
- Language models
4. Indexing methods
- Inverted index
- Compression (d-Gaps, variable-byte encoding)
- Index pruning
5. Query processing
- Holistic methods (DAAT, TAAT)
- Top-k methods (NRA, WAND)
6. Evaluation
- Cranfield Paradigm
- Benchmark initiatives (TREC, CLEF, NTCIR)
- Traditional effectiveness measures (precision, recall, MAP)
- Non-traditional effectiveness measures (nDCG, ERR)
7. Web retrieval
- Crawling
- Near-duplicate detection
- Link analysis (PageRank, HITS)
- Web spam
8. Information retrieval systems
- Indri
- Apache Lucene/Solr
- Elasticsearch
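As an illustration of the index compression topic in the outline (d-gaps and variable-byte encoding), the following sketch encodes a postings list. It assumes strictly increasing, positive document ids and uses one common byte convention, also found in Manning et al.: seven payload bits per byte, with the high bit set on the last byte of each number. The function names are chosen for this example only.

```python
def to_gaps(doc_ids):
    """Turn a strictly increasing list of document ids into d-gaps."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: prefix sums restore the document ids."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """Encode non-negative integers with 7 payload bits per byte."""
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]          # lowest-order 7 bits
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk[0] |= 0x80            # lowest-order group is written last and ends the number
        out.extend(reversed(chunk))
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:             # terminating byte of the current number
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


postings = [824, 829, 215406]
encoded = vbyte_encode(to_gaps(postings))   # gaps are [824, 5, 214577]
decoded = from_gaps(vbyte_decode(encoded))
```

Storing gaps instead of absolute ids keeps most numbers small, so that many of them fit into a single byte; the three postings above take six bytes in this encoding.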
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
(available online at: http://nlp.stanford.edu/IR-book/)
Reginald Ferber: Information Retrieval: Suchmodelle und Data-Mining Verfahren für Textsammlungen und das Web, dpunkt, 2003.
(available online at: http://information-retrieval.de/irb/ir.html)
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack: Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
WS 2022/23
WS 2021/22
WS 2020/21
WS 2019/20