Master Thesis - Discovering More Precise Process Models using Long-Term Dependencies
More and more processes executed in companies are supported by information systems, which store each executed event in a so-called event log. In the context of process mining, many algorithms and software tools have been developed to utilize the data contained in such event logs: by analyzing the execution of a process, as captured in the log, we gain insights into deviations from the ideal process model, bottlenecks and wasted resources.
The field of process discovery focusses on extracting a model from a given event log, such that it reflects the process underlying the log: the observed events are related to one another; preconditions, choices, concurrency, etc. are discovered and brought together in a process model, e.g. a Petri net. Process discovery is a non-trivial algorithmic endeavor for a variety of reasons. Ideally, a discovered model should be able to produce the behavior contained within the event log (fitness), not allow for behavior that was not observed (precision), represent all relevant dependencies between the events and at the same time be simple enough to be understood by a human interpreter. It is rarely possible to fulfill all these requirements simultaneously. Depending on the capabilities and focus of the algorithm used, the discovered models can vary greatly.
Most existing algorithms are unable to discover complex control-flow structures, in particular long-term dependencies, and therefore often lack precision. At the same time, the discovered models often achieve good results with respect to other metrics. In this MSc thesis, the student is requested to develop a post-processing algorithm that improves the precision of a given model by inserting long-term dependencies without violating other desirable properties. This includes the implementation of the algorithm in Python and validation by extensive experimentation, as well as formal reasoning and rigorous proof of the theoretical foundations and guarantees. The student presents the achievements in the context of their thesis paper as well as an intermediate and final presentation.
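To illustrate why a missing long-term dependency costs precision but not fitness, the following minimal sketch compares two hypothetical models on a toy event log. The log, the activity names and both model languages are invented for illustration. The two measures are deliberately simplified, language-based versions (set overlap of distinct traces) and are not the fitness and precision metrics used by established process mining tools.

```python
from itertools import product

# Hypothetical toy log: the early choice (a or b) determines the later one (d or e).
log = {("a", "c", "d"), ("b", "c", "e")}

# Language of a model WITHOUT the long-term dependency:
# free choice between a and b, then c, then free choice between d and e.
model_without_ltd = {(first, "c", last) for first, last in product("ab", "de")}

# Language of a model WITH the dependency inserted (a implies d, b implies e).
model_with_ltd = {("a", "c", "d"), ("b", "c", "e")}


def fitness(log, model_language):
    """Simplified fitness: fraction of distinct log traces the model can produce."""
    return len(log & model_language) / len(log)


def precision(log, model_language):
    """Simplified precision: fraction of model traces that were observed in the log."""
    return len(log & model_language) / len(model_language)


for name, language in [("without dependency", model_without_ltd),
                       ("with dependency", model_with_ltd)]:
    print(f"{name}: fitness={fitness(log, language)}, "
          f"precision={precision(log, language)}")
```

In this sketch both models replay every trace in the log, but the model without the dependency also allows the unobserved traces a, c, e and b, c, d, so only half of its behavior was observed. Inserting the dependency removes exactly that extra behavior while leaving fitness untouched, which is the kind of improvement the post-processing algorithm is meant to achieve.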
The student should have knowledge of basic computer science concepts, good theoretical foundations, programming skills (Python) and an interest in theoretical and practical aspects of process mining (in particular process discovery).
- Prof.Dr.ir. Wil van der Aalst
- Lisa Mannel MSc (primary daily supervisor)
- Dr.ir. Sebastiaan J. van Zelst (secondary daily supervisor)