Module Database - Module Data Mining

Data Mining

Module title (English): Data Mining

Code: KI861

4V (lecture, 4 contact hours per week)

Semester of study: 2

Compulsory module: no

Language of instruction:
English

Assessment:
Research report on the scientific background and the implementation of the project (50%)

[last modified 26.01.2010]

KI861 Kommunikationsinformatik (Communication Informatics), Master, regulations of 01.04.2016, 2nd semester, compulsory elective module, computer-science-specific
PIM-WI59 Praktische Informatik (Applied Computer Science), Master, regulations of 01.10.2011, 2nd semester, compulsory elective module, computer-science-specific

Over 15 semester weeks, the contact time of this module amounts to 60 teaching hours (= 45 clock hours). With 5 credit points, the total workload of the module is 150 hours (30 hours per ECTS credit). This leaves 105 hours for preparing and reviewing the course sessions, together with exam preparation.
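
The workload figures can be checked step by step. This assumes, as is usual in German module descriptions, that one teaching hour lasts 45 minutes and that the 4 contact hours per week from the 4V entry apply in each of the 15 weeks:

```latex
\begin{aligned}
4\,\tfrac{\text{h}}{\text{week}} \times 15\,\text{weeks} &= 60\ \text{teaching hours} \\
60 \times \tfrac{45}{60}\,\text{h} &= 45\ \text{clock hours} \\
5\,\text{ECTS} \times 30\,\tfrac{\text{h}}{\text{ECTS}} &= 150\,\text{h} \\
150\,\text{h} - 45\,\text{h} &= 105\,\text{h of self-study}
\end{aligned}
```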

Recommended prerequisites (modules):
KI735 Höhere Mathematik 1 (Advanced Mathematics 1)

Recommended as a prerequisite for modules:

Module coordinator:
Prof. Dr. Damian Weber

Lecturers:
Prof. Dave Swayne

Learning outcomes:
The purpose of the course is to develop a facility for learning and identifying, from the outputs of scientific and economic models, the structure of the unknown functional relationship between tuneable parameters and measured outputs in these (typically) large models. These models are remarkably complex: they come with whole communities of contributors who use them in practice and who expand their functionality (and complexity).
Students who wish to practise the art of applied modelling have to develop an understanding of what is in the models in order to contribute to their improvement. At the same time, we have to examine classical and current data mining approaches to identify rules for the behaviour of scientific models. These models often have very complex time evolution (the model on which I am currently working takes one hour per run on a 1 GHz computer; another important one takes 6 hours per run). The calibration of these models is an unsolved problem of considerable complexity. Single run times are measured in minutes to hours, and very little is known about the structure of the parameter space in which the models operate, or even about whether a “solution” point exists in the parameter space at all. Many papers are still being published on the appropriate statistical measures of a model’s success, and little is known about how well models predict the “future” evolution of the system under study. That is, when the background conditions under which a suitable parameter set was developed change fundamentally, it is unknown whether the parameters remain valid in all cases.
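
To make the question of measuring a model’s success concrete, the following minimal sketch implements two goodness-of-fit measures that are widely used when comparing a simulated series with an observed one: the root-mean-square error (RMSE) and the Nash-Sutcliffe efficiency, NSE = 1 - Σ(o - s)² / Σ(o - ō)². The module description does not prescribe these measures; they are assumed here only as common examples, and the function names are illustrative.

```python
import math


def _check(observed, simulated):
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")


def rmse(observed, simulated):
    """Root-mean-square error between an observed and a simulated series."""
    _check(observed, simulated)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed))


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than
    predicting the observed mean, negative values are worse than that."""
    _check(observed, simulated)
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / sst


obs = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe(obs, obs))         # 1.0 (perfect fit)
print(nash_sutcliffe(obs, [2.5] * 4))   # 0.0 (the observed mean as predictor)
print(rmse(obs, [2.5] * 4))
```

Both measures weight large errors heavily and both treat residuals as independent, which is part of why the choice of measure remains contested for time-evolving models.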

Contents:
1. Model structures and characterization
    Physical basis
    Model driver
    Model components
    Core components
    Add-ins
    Computational basis
    Temporal and spatial scales
    Time-evolving or time-averaged
    Characterization of extensions to the base application
    Incorporation of uncertainty and sensitivity
    Metrics for comparison: objective functions, use of corroborating data
    Parametrization
    Knowledge representation
2. Statistical issues
    Autocorrelation, dependencies, orthogonalization, generalizations of classical statistical measures
    Association rules (R Project)
    Near-neighbour matching
3. Dynamic programming approaches
    Clustering (k-means, variable ratio, spanning trees, rough sets, hierarchical clustering methods)
4. Decision trees (including use of C4.5)
    Rule extraction and verification
5. Elements and applications of computational learning theory
    Knowledge input to a GA exploration (Shuffled Complex Evolution, Dynamically Dimensioned Search)
    Reverse engineering of models
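
Autocorrelation heads the statistical issues because the residuals of a time-evolving model are rarely independent, which undermines error measures that assume they are. A minimal sketch of the sample autocorrelation at a given lag, using the common (biased) estimator that divides by the sum of squares of the whole series; the function name is illustrative.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (biased estimator)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance_sum = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return covariance_sum / variance_sum


residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(autocorrelation(residuals, 1))  # about -0.833: strongly alternating residuals
print(autocorrelation(residuals, 2))  # about 0.667
```

A large positive lag-1 value in the residuals of a calibrated run suggests that the effective number of independent observations is much smaller than the length of the series.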
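
The outline mentions association rules in R, where the arules package and its apriori() function are the usual tools. As a language-neutral illustration of what such a rule means, the following brute-force sketch computes support and confidence for rules X -> Y. It has no Apriori pruning, so it only suits small examples. The run labels (discretized parameter and output categories) are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def find_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Brute-force search for rules lhs -> rhs (no Apriori pruning; small data only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            for lhs_size in range(1, size):
                for lhs in combinations(itemset, lhs_size):
                    confidence = s / support(lhs, transactions)
                    if confidence >= min_confidence:
                        rhs = tuple(i for i in itemset if i not in lhs)
                        rules.append((lhs, rhs, s, confidence))
    return rules


# Hypothetical model runs, each described by discretized parameter/output labels.
runs = [
    {"k_high", "flow_high"},
    {"k_high", "flow_high", "temp_low"},
    {"k_low", "flow_low"},
    {"k_high", "flow_high"},
]
for lhs, rhs, s, c in find_rules(runs):
    print(lhs, "->", rhs, f"support={s:.2f} confidence={c:.2f}")
```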
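
Near-neighbour matching is attractive when single runs take minutes to hours, because an archive of earlier runs can be queried instead of running the model again. A minimal sketch, assuming a hypothetical archive of (parameter vector, objective value) pairs and plain Euclidean distance, that estimates the output at a new parameter point as the mean over its k nearest archived runs:

```python
import math


def k_nearest(query, archive, k):
    """Return the k archived (params, output) pairs closest to `query` (Euclidean)."""
    return sorted(archive, key=lambda item: math.dist(query, item[0]))[:k]


def predict(query, archive, k=3):
    """Estimate the output at `query` as the mean output of its k nearest archived runs."""
    neighbours = k_nearest(query, archive, k)
    return sum(output for _, output in neighbours) / len(neighbours)


# Hypothetical archive: (parameter vector, objective value) from earlier runs.
archive = [
    ((0.1, 0.2), 0.50),
    ((0.9, 0.8), 0.90),
    ((0.2, 0.1), 0.55),
    ((0.8, 0.9), 0.85),
]
print(predict((0.15, 0.15), archive, k=2))  # mean of the two nearby runs
```

In practice the parameters usually have different units and ranges, so they would be rescaled (for example to [0, 1]) before computing distances.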
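
Of the clustering methods listed, k-means is the simplest to state: alternately assign each point to its nearest centroid and move each centroid to the mean of its points, until nothing changes. A minimal sketch of this standard (Lloyd) iteration, initialised with k distinct input points; the two-dimensional "parameter sets" in the example are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm for k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as initial centroids

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]


# Hypothetical parameter sets forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, 2)
print(centroids, labels)
```

The result depends on the initial centroids in general, which is why k-means is often restarted with several seeds and the solution with the smallest within-cluster sum of squares is kept.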
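
C4.5 chooses the attribute to split on by the gain ratio: the information gain of the split divided by its split information (the entropy of the attribute's own value distribution). The following sketch shows only that split criterion for categorical attributes. It does not build a tree, and it omits pruning and continuous thresholds. The rows describe hypothetical discretized model runs.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """C4.5 split criterion for a categorical attribute: information gain / split information."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical runs: does soil type or slope explain the quality of fit?
rows = [
    {"soil": "clay", "slope": "steep", "fit": "good"},
    {"soil": "clay", "slope": "flat", "fit": "good"},
    {"soil": "sand", "slope": "steep", "fit": "poor"},
    {"soil": "sand", "slope": "flat", "fit": "poor"},
]
print(gain_ratio(rows, "soil", "fit"))   # 1.0: soil separates the classes perfectly
print(gain_ratio(rows, "slope", "fit"))  # 0.0: slope carries no information here
```

Dividing by the split information penalises attributes with many distinct values, which plain information gain would otherwise favour.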
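
Dynamically Dimensioned Search (DDS, Tolson and Shoemaker, 2007) was designed for calibrating expensive models under a fixed evaluation budget. The search is greedy and keeps a single solution. At iteration i, each parameter is perturbed with probability 1 - ln(i)/ln(m), where m is the budget, so the search narrows from global to local. Perturbations are Gaussian with standard deviation r times the parameter range (r = 0.2 by default), and values are reflected at the bounds. The following is a minimal sketch of that scheme as we understand the published description, not a reference implementation. The sphere function stands in, hypothetically, for an expensive model objective.

```python
import math
import random


def dds(objective, lower, upper, max_evals=500, r=0.2, seed=0):
    """Minimise `objective` over the box [lower, upper] with a DDS-style search."""
    rng = random.Random(seed)
    d = len(lower)
    best = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]  # random start
    best_f = objective(best)
    for i in range(1, max_evals):
        # Probability of perturbing each dimension shrinks from 1 towards 0.
        p = 1.0 - math.log(i) / math.log(max_evals)
        dims = [j for j in range(d) if rng.random() < p]
        if not dims:
            dims = [rng.randrange(d)]  # always perturb at least one dimension
        candidate = list(best)
        for j in dims:
            lo, hi = lower[j], upper[j]
            x = best[j] + r * (hi - lo) * rng.gauss(0.0, 1.0)
            if x < lo:  # reflect at the lower bound; fall back to the bound itself
                x = lo + (lo - x)
                if x > hi:
                    x = lo
            elif x > hi:  # reflect at the upper bound
                x = hi - (x - hi)
                if x < lo:
                    x = hi
            candidate[j] = x
        f = objective(candidate)
        if f <= best_f:  # greedy acceptance
            best, best_f = candidate, f
    return best, best_f


def sphere(x):
    return sum(v * v for v in x)


best, best_f = dds(sphere, [-5.0, -5.0], [5.0, 5.0])
print(best, best_f)
```

The total number of objective evaluations equals `max_evals`, which matches the situation described above where each run costs minutes to hours and the budget, not convergence, decides when to stop.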

Further teaching methods and media:
Meetings will be held twice daily, for a total of nine 2.5-hour meetings.
Students will be required to develop a hypothetical work plan that includes:
    formulation of a search strategy and parametrization
    understanding of the generation and parametrization of test data sets
    experimental work to determine the characteristics of the parameter space
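
One common way to combine the last two items of the work plan is a "twin experiment": generate test data from a model with known parameters, then scan the parameter space and check that the objective recovers them. The following is a minimal sketch under stated assumptions. The exponential-recession `toy_model` is a hypothetical, cheap stand-in for a real simulation, and the grid values are arbitrary.

```python
import math
import random


def toy_model(a, k, times):
    """Hypothetical stand-in for an expensive simulation: exponential recession."""
    return [a * math.exp(-k * t) for t in times]


def make_test_data(true_params, times, noise_sd=0.0, seed=0):
    """Synthetic observations from known parameters, optionally with Gaussian noise."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, noise_sd) for y in toy_model(*true_params, times)]


def sse(obs, sim):
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


def grid_scan(obs, times, a_values, k_values):
    """Evaluate the objective on a grid to expose the shape of the parameter space."""
    return {(a, k): sse(obs, toy_model(a, k, times)) for a in a_values for k in k_values}


times = list(range(10))
obs = make_test_data((2.0, 0.3), times)  # noise-free twin experiment
surface = grid_scan(obs, times, [1.0, 1.5, 2.0, 2.5], [0.1, 0.2, 0.3, 0.4])
best = min(surface, key=surface.get)
print(best)  # (2.0, 0.3): the known parameters are recovered
```

Repeating the scan with `noise_sd > 0`, or with a finer grid, shows how flat or multimodal the objective surface is around the true point. That is the kind of characterization of the parameter space the work plan asks for.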

Literature:
1. Artificial Intelligence: A Modern Approach (2nd ed.). Russell and Norvig. 2003. Prentice Hall. (Main text.)

Journals and other sources (note: papers from the following journals are archived in the CRLE lab; they were obtained from the library and are being accumulated for this research):
2. Various AAAI, IFIP, Springer, etc. monographs
3. Journal of Optimization Theory and Applications
4. Mathematical Methods of Operations Research
5. Journal of Statistical Software
6. Machine Learning Journal
7. Model source / executable codes, user and technical documentation (as developed for case studies)

Module offered in semesters:
Winter semester (WS) 2011/12, WS 2010/11, WS 2009/10
