Event Label Refinement by Applying Community Detection to Event Similarity Graphs



Sebastiaan J. van Zelst

Scientific Assistant - Fraunhofer FIT


+49 241 80 21926



Student: Jonas Tai

Title: Event Label Refinement by Applying Community Detection to Event Similarity Graphs

Supervisor: Dr. Sebastiaan J. van Zelst

1st Examiner: Prof. Wil M.P. van der Aalst

2nd Examiner: Prof. Jürgen Giesl


Companies' information systems record detailed information about every step of their processes. Process mining uses this event data to discover models of the underlying business processes, which improves the overall understanding of those processes and can yield competitive advantages. Process discovery algorithms use the activity names recorded in the event data to identify process steps, and they commonly assume a one-to-one relationship between activity names and the activities modeled in a process model. In practice, activity names are often imprecise, and the same activity name can appear in different contexts within a process. Assuming a one-to-one relationship during process discovery then leads to inaccurate and overly complex process models. In this thesis, we therefore propose a label refinement pre-processing step that refines imprecise activity names in event logs based on the control-flow context of events. We define a method for modeling events as a weighted graph and for measuring event similarity based on preceding and succeeding events. Whereas other techniques compare complete process executions, we compare only the local context of events, which lets us model the relationship between events from the same process execution. We evaluate our approach on a set of artificial event logs with imprecise labels and on a real-life event log. Our results show that our approach finds meaningful label refinements that improve the quality of the resulting process model. We also show that our approach consistently outperforms other state-of-the-art label refinement algorithms.
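
To make the idea of context-based label refinement more concrete, the following is a minimal sketch and not the implementation from the thesis. It assumes that the context of an event is the set of up to k preceding and k succeeding activity names, that similarity is the Jaccard index of two contexts, and that edges below a threshold are dropped. Connected components stand in for a proper community detection algorithm (for example Louvain). All function names, the parameters k and threshold, and the example log are hypothetical.

```python
from itertools import combinations


def context(trace, i, k=1):
    """Up to k labels before and after position i, tagged by direction."""
    before = {("pre", a) for a in trace[max(0, i - k):i]}
    after = {("post", a) for a in trace[i + 1:i + 1 + k]}
    return before | after


def jaccard(a, b):
    """Jaccard index of two sets; two empty contexts count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def refine(log, label, k=1, threshold=0.5):
    """Split one imprecise label into context-based variants (illustrative only)."""
    # One node per event carrying the imprecise label: (trace index, position).
    nodes = [(t, i) for t, trace in enumerate(log)
             for i, a in enumerate(trace) if a == label]
    ctx = {n: context(log[n[0]], n[1], k) for n in nodes}

    # Event similarity graph: connect events whose local contexts are similar.
    # Assumption: the weight is only used for thresholding in this sketch.
    adj = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        if jaccard(ctx[u], ctx[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)

    # Stand-in for community detection: connected components of the graph.
    community = {}
    next_id = 0
    for n in nodes:
        if n in community:
            continue
        community[n] = next_id
        stack = [n]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in community:
                    community[v] = next_id
                    stack.append(v)
        next_id += 1

    # Write the refined labels back into a copy of the log.
    refined = [list(trace) for trace in log]
    for (t, i), c in community.items():
        refined[t][i] = f"{label}_{c}"
    return refined


# Hypothetical log: "check" occurs once before and once after "decide".
log = [
    ["register", "check", "decide", "check", "archive"],
    ["register", "check", "decide", "check", "archive"],
    ["register", "check", "decide", "archive"],
]
refined_log = refine(log, "check")
print(refined_log[0])
# ['register', 'check_0', 'decide', 'check_1', 'archive']
```

Because each event is a separate node, the two "check" events of the same trace can end up in different groups, which is the kind of within-execution distinction that a comparison of complete process executions cannot express.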