Introduction to Data Science
Study Guide to Introduction to Data Science (WS 2021/2022)
Due to the coronavirus pandemic, the course will be held in a semi-hybrid setting. Lectures will be pre-recorded and uploaded to RWTHmoodle. Instruction sessions will be online, but we will offer face-to-face sessions on a weekly basis. Check RWTHmoodle for the latest information, and use only email@example.com to contact the lecturers.
- Prof. Dr. Wil van der Aalst (lectures)
- Gyunam Park M.Sc.
- Mahnaz Qafari M.Sc.
- Tobias Brockhoff M.Sc.
- Niklas Adams M.Sc.
- Ali Norouzifar M.Sc.
- Bianka Bakullari M.Sc.
The course aims to provide a comprehensive overview of data science and expose students to real-life data sets and tools. The course provides three angles on data science:
- Data science infrastructure, concerned with volume and velocity. Topics include instrumentation, big data infrastructures and distributed systems, databases and data management, and programming. The main challenge is to make things scalable and instant.
- Data science analysis, concerned with extracting knowledge from data. Topics include statistics, data/process mining, machine learning/artificial intelligence, operations research, algorithms, and visualization. The main challenge is to provide answers to known and unknown unknowns.
- Data science effects, concerned with people, organizations, and society. Topics include ethics & privacy, IT law, human-technology interaction, operations management, business models, and entrepreneurship. The main challenge is to do all of the above in a responsible manner.
The course will dive deeper into the following topics:
- Data exploration
- Data visualization
- Data quality issues and preparation
- Data types: from tables and event logs to unstructured data
- Supervised learning
- Decision tree learning
- Unsupervised learning
- Pattern mining
- Process mining
- Text mining
- Evaluation techniques
- Distribution using MapReduce
- Responsible data science: fairness, accuracy, confidentiality, and transparency
- Discrimination-aware data mining
- Anonymization versus encryption
The above will be complemented with hands-on assignments using various datasets and software tools (still to be determined).
After the course, students should have a good overview of the broader data science field. Through hands-on experience with real data sets, students will better understand the challenges in the different data science subdisciplines. Moreover, a few topics will be covered in more detail, including their theoretical foundations.
Frequently Asked Questions
The course is not part of my study plan. How can I enrol?
I want to take the course via free enrollment, but cannot do it myself.
Please send an email with your matriculation number to firstname.lastname@example.org, briefly explaining the situation.