Advanced Process Mining
Lecturers (SS 2022)
- Dr.ir. Sebastiaan J. van Zelst (Lecturer)
- Tsung-Hao Huang, M.Sc. (Instructor)
- Viki Peeva, M.Sc. (Instructor)
- Tobias Brockhoff, M.Sc. (Instructor)
- Prof.Dr.ir. Wil M.P. van der Aalst (Chair holder)
Course Contents and Motivation
Process mining provides a new means to improve processes in a variety of application domains. There are two main drivers for this new technology. On the one hand, more and more events are being recorded, thus providing detailed information about the history of processes. On the other hand, in most organizations there is a need to improve process performance (e.g., to reduce costs and flow time) and compliance (e.g., to avoid deviations or risks). Process mining bridges the gap between model-based process analyses (e.g., simulation, model checking, and classical BPM techniques) and data-oriented techniques (e.g., data mining techniques like classification, clustering, and regression). Process mining techniques can be applied in a variety of domains, ranging from banking to healthcare. Some examples:
- Discovering the root causes for delays in treatment processes in a hospital: What groups of patients are not treated according to the guidelines?
- Diagnosing the behaviour of an X-ray machine that malfunctions and suggesting preventative maintenance: What component should be replaced?
- Analysing the "customer journey" of customers that have purchased a product and are using related services: How can customers be enticed to purchase more services and additional products?
- Checking the conformance of processes in local governments to find potential cases of fraud: Why was the formal approval step bypassed frequently?
- Analysing the study behaviour of students following a Massive Open Online Course (MOOC): What are the differences in study behaviour between students that pass and students that fail the course?
- Analysing a baggage handling system in an airport to understand where luggage gets delayed or misplaced: When and why is the baggage handling system not meeting the service level agreements?
- Discovering the actual processes supported by a service desk of a large bank: Why does it take such a long time before a person is found that can assist in solving the problem?
This advanced course on process mining consists of two main tracks.
- Track 1: Advanced process mining techniques and the theoretical foundations of process mining (based on a variety of selected papers). Track 1 is assessed by means of a final written test (60%). The track focuses on several advanced process mining topics, including inductive mining, language-based regions to discover process models, creating alignments to relate observed and modelled behaviour, and decomposing large event logs to enable process mining in the context of big data.
- Track 2: Practical hands-on experience with process mining with a particular focus on analysis workflows, scientific process mining experiments, and real-world process mining. This track exposes students to real-life data sets to understand challenges related to process discovery, conformance checking, and process model enhancement. Track 2 is examined by means of an assignment that consists of three parts (40%).
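To give a flavour of the alignment topic in Track 1, here is a minimal sketch, not the technique taught in the course. It computes the cost of an optimal alignment between one observed trace and one modelled trace, assuming unit costs: synchronous moves cost 0, log moves and model moves cost 1. Actual alignment techniques search the behaviour of a whole process model (e.g., a Petri net) rather than a single modelled trace. The function name `alignment_cost` and the example activities are hypothetical.

```python
from functools import lru_cache


def alignment_cost(observed, modelled):
    """Minimal cost of aligning two traces with unit-cost log and model moves."""
    observed, modelled = tuple(observed), tuple(modelled)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Only model moves remain once the observed trace is consumed.
        if i == len(observed):
            return len(modelled) - j
        # Only log moves remain once the modelled trace is consumed.
        if j == len(modelled):
            return len(observed) - i
        best = min(cost(i + 1, j) + 1,   # log move: skip an observed event
                   cost(i, j + 1) + 1)   # model move: skip a modelled step
        if observed[i] == modelled[j]:
            # Synchronous move: log and model agree, cost 0.
            best = min(best, cost(i + 1, j + 1))
        return best

    return cost(0, 0)


# Hypothetical traces: the log repeats "check" and skips "decide".
observed_trace = ["register", "check", "check", "pay"]
modelled_trace = ["register", "check", "decide", "pay"]
print(alignment_cost(observed_trace, modelled_trace))  # prints 2
```

The cost of 2 corresponds to one log move (the second "check") and one model move (the missing "decide").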
After taking this course, students should:
- have a detailed understanding of the entire process mining spectrum and be able to relate process mining techniques to other analysis techniques (data mining, model checking, simulation, machine learning, etc.),
- understand the positioning of process mining in the context of data science, process management and "big data",
- be able to apply a range of process mining techniques and use tools such as RapidMiner, ProM, PM4Py, Disco and Celonis,
- be able to design analysis workflows and execute them on concrete practical datasets (e.g., using RapidMiner, PM4Py and/or ProM),
- be able to conduct experiments to investigate the influence of noise (infrequent/deviating behaviour) on the process mining results,
- be able to read formal descriptions of process mining techniques and reason about their properties,
- understand the intricate relation between observed behaviour (e.g., event logs in XES or MXML format) and modelled behaviour (e.g., Petri nets with an initial and final marking),
- understand and apply advanced process discovery techniques using language-based regions (ILP miner),
- be able to discuss all four conformance dimensions (replay fitness, precision, generalization, and simplicity), provide metrics for these dimensions, and apply conformance checking using models and logs,
- be able to reason about the strengths and weaknesses of existing process mining algorithms and critically evaluate new ones,
- understand and create alignments as a tool for conformance checking and other types of analysis that require the mapping of observed behaviour onto modelled behaviour,
- understand the limitations of process mining techniques in terms of computation time, memory use, and data requirements,
- be able to decompose large process mining problems (discovery and conformance checking) into smaller ones (using valid decompositions),
- understand the relation between the results of decomposed process mining and non-decomposed process mining (e.g., what properties are preserved and what guarantees can be given), and
- be able to conduct real-world process mining projects using real data and imprecise questions from stakeholders.
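The influence of noise (infrequent/deviating behaviour) mentioned in the list above can be illustrated with a minimal sketch. It assumes an event log represented as a list of traces, each a list of activity names, counts directly-follows relations, and drops those below a relative frequency threshold. The function names, the threshold value, and the toy log are hypothetical, and this is not the filtering used by any particular tool.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def filter_infrequent(counts, threshold):
    """Keep relations whose share of all observed relations meets the threshold."""
    total = sum(counts.values())
    return {pair: n for pair, n in counts.items() if n / total >= threshold}


# Toy log: nine traces follow a-b-c, one deviating trace follows a-c-b.
log = [["a", "b", "c"]] * 9 + [["a", "c", "b"]]
dfg = directly_follows(log)
print(filter_infrequent(dfg, 0.1))
```

With a threshold of 0.1, the two relations contributed only by the deviating trace (each 1 of 20 observations) are removed. Varying the threshold shows how strongly a discovered model can depend on the treatment of infrequent behaviour.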
The following material will be distributed via RWTH Moodle:
- Reading material
- Exercises (+ solutions)
- Example data / process models
- Practice exams
In the detailed planning, for each topic of the course, a selection of background reading is presented. In general, the textbook "W.M.P. van der Aalst. Process Mining: Data Science in Action. Springer-Verlag, Berlin, 2016" is advised as background information. It can be ordered via Springer, Amazon, or Bol. Another useful book is the textbook "Fundamentals of Business Process Management" by Dumas et al. Note that you are encouraged to read the course papers before the respective lecture. This way you will get the most from the lecture and save time. Without at least attempting to read the paper, you will not see the challenging nature of the material until it is too late!
If you have any questions, please use the lectures, instructions, and question hours (ad-hoc e-mails will not be answered). If you would like to be more involved in RWTH’s process mining/data science research, as performed by the PADS chair, use the breaks to approach the responsible lecturer, Dr.ir. Sebastiaan J. van Zelst.