htw saar


Data Engineering

Module name (EN):
Data Engineering
Degree programme:
Applied Informatics, Master, ASPO 01.10.2017
Module code: PIM-DE
The exam administration assigns an SAP submodule number (SAP-Submodule-No) to every exam type in every module. The SAP submodule number is the same for the same module in different study programmes.
Hours per semester week / Teaching method:
The number of hours per week combines lecture (V, from the German Vorlesung), exercise (U, from Übung), practical (P) or project (PA) components. For example, a course of the form 2V+2U has 2 hours of lecture and 2 hours of exercise per week.
3V+1U (4 hours per week)
ECTS credits:
European Credit Transfer System. Points awarded for successful completion of a course. Each ECTS credit represents a workload of 30 hours.
6
Semester: 2
Mandatory course: yes
Language of instruction:
Assessment:
Written exam

[updated 20.12.2017]
Applicability / Curricular relevance:

DFI-DE (P610-0286) Computer Science, Master, ASPO 01.10.2018, semester 2, mandatory course
KIM-DE (P222-0050) Computer Science and Communication Systems, Master, ASPO 01.10.2017, semester 2, mandatory course
PIM-DE (P222-0050) Applied Informatics, Master, ASPO 01.10.2017, semester 2, mandatory course
Workload:
The time students need to complete the course successfully. Each ECTS credit represents 30 working hours, comprising contact time, follow-up work on the lecture material, exercises, and exam preparation.

The total workload is distributed over the semester (01.04.-30.09. in the summer term, 01.10.-31.03. in the winter term).
60 class hours (= 45 clock hours) over a 15-week period.
The total student study time is 180 hours (equivalent to 6 ECTS credits).
This leaves 135 hours for class preparation, follow-up work, and exam preparation.
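The workload figures above follow from simple arithmetic. The Python sketch below reproduces them, assuming (as the "60 class hours (= 45 clock hours)" figure implies) that one class hour lasts 45 minutes; the helper names `weekly_hours` and `workload` are illustrative only and not part of any official htw saar tool.

```python
import re

MINUTES_PER_CLASS_HOUR = 45  # assumption implied by "60 class hours (= 45 clock hours)"
HOURS_PER_ECTS = 30          # "Each ECTS credit represents 30 working hours"

def weekly_hours(teaching_form: str) -> int:
    """Total hours per week for a notation such as '3V+1U' or '2V+2U'."""
    total = 0
    for part in teaching_form.split("+"):
        # Each component is a number followed by V, U, P or PA.
        match = re.fullmatch(r"(\d+)(V|U|P|PA)", part.strip())
        if match is None:
            raise ValueError(f"unrecognised component: {part!r}")
        total += int(match.group(1))
    return total

def workload(teaching_form: str, ects: int, weeks: int = 15) -> dict[str, float]:
    """Split the total ECTS workload into contact time and self-study time."""
    class_hours = weekly_hours(teaching_form) * weeks
    contact_hours = class_hours * MINUTES_PER_CLASS_HOUR / 60  # in clock hours
    total_hours = ects * HOURS_PER_ECTS
    return {
        "class_hours": class_hours,
        "contact_clock_hours": contact_hours,
        "total_hours": total_hours,
        "self_study_hours": total_hours - contact_hours,
    }

print(workload("3V+1U", ects=6))
# → {'class_hours': 60, 'contact_clock_hours': 45.0, 'total_hours': 180, 'self_study_hours': 135.0}
```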
Recommended prerequisites (modules):
Recommended as prerequisite for:
Module coordinator:
Prof. Dr. Klaus Berberich
Lecturer:
Prof. Dr. Klaus Berberich

[updated 27.10.2016]
Learning outcomes:
After successfully completing this module, students will be capable of handling large amounts of structured and unstructured data. They will know the basic structure of a (relational) database system and be familiar with its implementation techniques (e.g. index structures and locking mechanisms) as well as their benefits (e.g. faster query processing and transaction isolation). Students will be able to distinguish between transaction-oriented (OLTP) and analytical (OLAP) application scenarios. They will know the basic concepts of data warehouses and can express analytical information needs in a suitable query language (e.g. SQL and MDX). Students will be familiar with basic information retrieval models (e.g. the vector space model) and can apply them to sample data in order to handle unstructured data (e.g. text documents). They will be familiar with quality criteria (e.g. precision and recall) and can compute them for retrieved results. Students will be familiar with data mining methods, such as market basket analysis, as a means of extracting knowledge from data. They will be capable of systematically choosing the parameters of such procedures and critically assessing the results. Students will be familiar with the platforms available for distributed data processing (e.g. MapReduce and Spark). They will be able to select a suitable platform for a given analytical task and implement the task on that platform.

[updated 24.02.2018]
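Two of the outcomes above, applying the vector space model to sample data and computing precision and recall for the results, can be illustrated with a minimal Python sketch. It assumes TF-IDF weighting (raw term frequency times log(N/df)) with cosine similarity, which is one common variant of the vector space model and not necessarily the one taught in the course; the documents and relevance judgments are invented toy data, and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def idf_weights(docs: dict[str, str]) -> dict[str, float]:
    """Inverse document frequency log(N / df) for every term in the collection."""
    df = Counter(term for text in docs.values() for term in set(tokenize(text)))
    return {term: math.log(len(docs) / count) for term, count in df.items()}

def tfidf(text: str, idf: dict[str, float]) -> dict[str, float]:
    # Raw term frequency times IDF; terms unknown to the collection get weight 0.
    return {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(text)).items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query: str, docs: dict[str, str]) -> list[str]:
    """Document ids sorted by descending cosine similarity (ties broken by id)."""
    idf = idf_weights(docs)
    q = tfidf(query, idf)
    scores = {doc_id: cosine(q, tfidf(text, idf)) for doc_id, text in docs.items()}
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented toy collection and relevance judgments, for illustration only.
DOCS = {
    "d1": "index structures accelerate query processing",
    "d2": "spark and mapreduce process big data",
    "d3": "query languages for data warehouses",
    "d4": "transaction isolation with locking",
}
RELEVANT = {"d1", "d2"}

top2 = rank("data processing", DOCS)[:2]
print(top2, precision_recall(top2, RELEVANT))  # → ['d1', 'd3'] (0.5, 0.5)
```

With these toy judgments the top two results contain one of the two relevant documents, so precision and recall are both 0.5; retrieving a third document (d2) raises recall to 1.0 while precision drops to about 0.67, the usual trade-off between the two criteria.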
Module content:
1. Introduction
2. Database systems
2.1 Architecture
2.2 Buffer management
2.3 Access structures
2.4 Query processing
2.5 Transaction management
3. Data warehouses
3.1 Modeling
3.2 Data integration
3.3 Query languages
3.4 Implementation aspects
4. Information retrieval
4.1 Retrieval models
4.2 Quality criteria and evaluation
4.3 Implementation aspects
5. Data mining
5.1 Classification
5.2 Cluster analysis
5.3 Association rule learning
6. Big data
6.1 Platforms (e.g. MapReduce and Spark)
6.2 Interfaces (e.g. Pig and Hive)
6.3 Implementation of selected procedures (e.g. k-Means and PageRank)

[updated 20.12.2017]
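Association rule learning (5.3) underlies the market basket analysis mentioned in the learning outcomes. The sketch below enumerates single-item rules A -> B over a few invented baskets and keeps those that meet a minimum support (fraction of baskets containing both items) and a minimum confidence (support of {A, B} divided by support of {A}). It is a brute-force illustration, not the Apriori algorithm; the function name `pair_rules` and the threshold values are assumptions chosen for the example.

```python
from itertools import permutations

def pair_rules(baskets, min_support, min_confidence):
    """Return rules (a, b, support, confidence) for single items a -> b."""
    sets = [frozenset(basket) for basket in baskets]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(1 for basket in sets if itemset <= basket) / n

    rules = []
    for a, b in permutations(items, 2):  # ordered pairs: a -> b and b -> a
        pair_support = support(frozenset({a, b}))
        if pair_support < min_support:
            continue
        confidence = pair_support / support(frozenset({a}))
        if confidence >= min_confidence:
            rules.append((a, b, pair_support, confidence))
    return rules

# Invented toy baskets, for illustration only.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

for a, b, s, c in pair_rules(BASKETS, min_support=0.5, min_confidence=0.6):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
# bread -> butter: support=0.50, confidence=0.67
# butter -> bread: support=0.50, confidence=1.00
```

Raising `min_confidence` to 0.7 drops bread -> butter, because bread also appears in a basket without butter; choosing such thresholds deliberately and judging the resulting rules is the kind of parameter selection the learning outcomes describe.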
Teaching methods/Media:
Slides, practical and theoretical exercises

[updated 20.12.2017]
Recommended or required reading:
Kemper Alfons and Eickler André: Datenbanksysteme - Eine Einführung, De Gruyter, 2015
Saake Gunter and Sattler Kai-Uwe: Datenbanken: Implementierungstechniken, mitp Professional, 2011
Garcia-Molina Hector, Widom Jennifer and Ullman Jeffrey D.: Database Systems: The Complete Book, Pearson Education, 2013
Leskovec Jure, Rajaraman Anand and Ullman Jeffrey D.: Mining of Massive Datasets, Cambridge University Press, 2014

[updated 24.02.2018]
Module offered in:
SS 2024, SS 2023, SS 2022, SS 2021, SS 2020, ...
[Sat Jun 15 12:44:59 CEST 2024, CKEY=kde, BKEY=pim2, CID=PIM-DE, LANGUAGE=en, DATE=15.06.2024]