Course name (Danish):  Data Mining 
Course name (English):  Data Mining 
Semester:  Spring 2019 
Offered under:  MSc in IT, Software Development and Technology (sdt) 
ECTS credits:  7.50 
Course language:  English 
Course homepage:  https://learnit.itu.dk 
Min. number of participants:  1 
Expected number of participants:  0 
Max. number of participants:  15 
Formal prerequisites:  Students must have experience with and be comfortable with programming, and be capable of independently implementing algorithms from descriptions in pseudocode. This corresponds to at least having passed an introductory programming course, and preferably also an intermediate-level programming course. The course will contain compulsory assignments that include programming.
This course, Data Mining, is only for students who followed the course in spring 2018 but did not pass. 
Learning objectives:  After the course, students should be able to:
 Analyse data mining problems and reason about the most appropriate methods to apply to a given dataset and knowledge extraction need.
 Implement basic preprocessing, association mining, classification and clustering algorithms.
 Apply and reflect on advanced preprocessing, association mining, classification and clustering algorithms.
 Compare and evaluate the application of different algorithms on real-world problems.
 Analyse the characteristics of a dataset and identify the best storage and processing technologies.
 Analyse the scale and complexity of a data processing and analysis problem. 
Course content:  This course gives an introduction to the field of data science with a focus on the game industry. The course is practically oriented, focusing on applicable algorithms and technologies. Practical exercises will involve both the use of a freely available data mining package and individual implementation of algorithms.
The course will cover the following main topics:
 Data science overview
 Data preprocessing and cleaning
 Pattern and association mining
 Classification and prediction
 Cluster analysis
 Big data storage and processing
Additionally, the course will touch on topics including recommender systems, time series analysis and deep learning.
Application examples will be given primarily within the context of computer games, but other areas will be mentioned, such as e-commerce, computer vision and finance. 
Learning activities:  The course consists of 7 weeks of lectures, followed by 7 weeks of supervised group projects. Most lectures are followed by a lab exercise, which involves independent programming. Students must be able to program. The default language is Python, but other languages are possible.
A large part of the course will be taken up by the group project, in which you work on a relevant data science project of your choice. In this project you will apply the techniques and algorithms studied during the course to relevant real-world problems. This will be done in groups of 3 persons. Besides the hours planned for lectures, tutorials, and exercises, supervision sessions for the group projects are planned; these complement the theory covered during the lectures and are necessary for meeting the learning objectives of the course. You will also practice presenting your work during the course in order to prepare for the oral exam. Lectures provide theoretical foundations and walkthrough examples of relevant data mining algorithms, while exercises focus on students discussing and implementing the central algorithms themselves. 
Mandatory activities:  There will be one mandatory assignment, consisting of using self-implemented data mining techniques on a simple data set and writing a report about it.
Be aware: If the mandatory activities are not approved, the student will receive the grade NA (not approved) at the ordinary exam and will use an exam attempt. 
Exam form and description:  D2G Submission with an oral exam that supplements the project. Shared responsibility for the project. (7-point scale, external exam) The students will work on a group project and hand in a 10-page report.
The project will be based on the course content and will require the students to work on an existing dataset to produce an advanced analysis.
Students present their group project together (5 minutes per member), followed by 15 minutes for questions and evaluation.
Form of group exam: Mixed 2.

 