iGED Miner: Discovering Process Trees using Graph Edit Distance
MSc Thesis Project
Title: iGED Miner: Discovering Process Trees using Graph Edit Distance
Student: Eike Erdmann
Supervisor: Dr. Sebastiaan J. van Zelst
1st Examiner: Prof. Dr. van der Aalst
2nd Examiner: Prof. Dr. Giesl
Modern information systems typically allow one to extract data about the the historical execution of processes. This event data being readily available spawned the field of process mining, where we are interested in gaining insights into how the process works, which can then be used in order to optimize the process. The subfield of process discovery is in turn concerned with discovering a model that describes which part of the process may execute next, given the parts that already executed. Inductive Mining Approaches are a well established way to obtain a process model from an event log. These approaches recursively identify patterns in the event data that represent specific control behavior. The most basic implementation of this framework uses the directly-follows-graph (DFG), i.e., a well-established intermediary graph-based data structure often used in process discovery algorithms. It discovers process models that describe all behavior in the input event log. Several extensions exist, e.g., one of which simply removes infrequent edges in the DFG and another one which infers likely yet missing edges in the DFG. The first one can somewhat alleviate noise if the noise is very infrequent, the second one can infer missing behaviour. Often, a mixture of noise and infrequent behavior renders both algorithms to perform poorly. In this thesis, we expand the Inductive Miner framework with the assumption that the system we are trying to model aligns well to the control structures we provide in the majority of cases. We try to make "minor" modifications to the DFG in such a way that the general structure is preserved, but such that the cuts fit "nicely" onto our control structures. Modifying the graph has an effect onto the discovered model. We generally use four different type of metrics to quantify the model quality, referred to as the four forces, which are partially competing. While many discovery algorithms return a fitness focused model for its interesting theoretic properties, we hope to give options to balance out the four forces better. The impact of our methods on the model quality is then demonstrated with various experiments on artificial event logs.