Introduction to Data Science
Study Guide to Introduction to Data Science (WS 2021/2022)
This year, both lectures and instruction sessions will take place in person. Please read the Study Guide provided at the end of this page for further organizational details. Check RWTHmoodle for the latest information, and use only firstname.lastname@example.org to contact the lecturers.
- Prof. Dr. Wil van der Aalst (lectures)
- Niklas Adams, M.Sc.
- Bianka Bakullari, M.Sc.
- Harry Beyel, M.Sc.
- Tobias Brockhoff, M.Sc.
- Tsunghao Huang, M.Sc.
- Benedikt Knopp, M.Sc.
- Viki Peeva, M.Sc.
The course aims to provide a comprehensive overview of data science and expose students to real-life data sets and tools. The course provides three angles on data science:
- Data science infrastructure, concerned with volume and velocity. Topics include instrumentation, big data infrastructures and distributed systems, databases and data management, and programming. The main challenge is to make things scalable and instant.
- Data science analysis, concerned with extracting knowledge from data. Topics include statistics, data/process mining, machine learning/artificial intelligence, operations research, algorithms, and visualization. The main challenge is to provide answers to known and unknown unknowns.
- Data science effects, concerned with people, organizations, and society. Topics include ethics & privacy, IT law, human-technology interaction, operations management, business models, and entrepreneurship. The main challenge is to do all of the above in a responsible manner.
The course will dive deeper into the following topics:
- Data exploration
- Data visualization
- Data quality issues and preparation
- Data types: from tables and event logs to unstructured data
- Supervised learning
- Decision tree learning
- Unsupervised learning
- Pattern mining
- Process mining
- Text mining
- Evaluation techniques
- Distribution using MapReduce
- Responsible data science: fairness, accuracy, confidentiality, and transparency
- Discrimination-aware data mining
- Anonymization versus encryption
The above will be complemented with hands-on assignments using various datasets and software tools (still to be determined).
After the course, students should have a good overview of the broader data science field. Through hands-on experience with real data sets, students will better understand the challenges in the different data science subdisciplines. Moreover, a few topics will be covered in more detail, including their more theoretical aspects.
Frequently Asked Questions
The course is not part of my study plan. How can I enroll?
I want to take the course via free enrollment, but cannot do it myself.
Please send an email with your matriculation number to email@example.com, briefly explaining your situation.