Attribute-Driven Hierarchical Clustering of Event Data in Process Mining



Sebastiaan J. van Zelst

Scientific Assistant



MSc Thesis Project

Title: Attribute-Driven Hierarchical Clustering of Event Data in Process Mining

Author: Yukun Cao

1st Examiner: Wil M.P. van der Aalst

2nd Examiner: Sebastiaan J. van Zelst


Business processes are usually executed for different process execution classes, e.g., customer types (silver, gold, platinum) in financial services, and this execution-class information is stored in the event data. Processes executed for different execution classes typically differ. Capturing and comparing the similarity among these typical process executions under user-selected execution classes is important for understanding the complete process. However, in many cases the number of execution classes is too large to compare them manually. Moreover, existing trace clustering techniques do not take execution-class information into account; they operate purely on individual traces. Therefore, in this thesis, we propose an attribute-driven hierarchical clustering framework that allows us to compare the behavioral differences among sets of cases on the basis of user-specified case attributes. Furthermore, we evaluate several behavior-driven similarity measures within our framework on both synthetic and real-life data sets. The results show that the process models discovered from the clustered groups of cases are of better quality than the model discovered from the complete log.
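To make the idea of the framework concrete, the following is a minimal sketch, not the thesis's implementation. It groups cases by the value of one user-selected case attribute (the execution class) and summarizes each group's behavior. It then merges groups bottom-up according to behavioral similarity. The thesis evaluates several behavior-driven similarity measures. This sketch assumes just one illustrative choice: cosine similarity over relative directly-follows frequencies, combined with average linkage. All function names and the toy log are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

def group_by_attribute(log):
    """Group traces by the value of a user-selected case attribute (execution class)."""
    groups = defaultdict(list)
    for attr_value, trace in log:
        groups[attr_value].append(trace)
    return dict(groups)

def directly_follows_profile(traces):
    """Assumed behavioral profile: relative frequency of each directly-follows pair (a, b)."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

def cosine_similarity(p, q):
    """Assumed similarity measure between two sparse profiles (dicts)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def hierarchical_clustering(profiles):
    """Agglomerative clustering of execution classes with average linkage on
    distance = 1 - cosine similarity. Returns merges as (cluster_a, cluster_b, distance)."""
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - cosine_similarity(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean pairwise distance between members of the two clusters.
            avg = sum(dist[(x, y)] for x in c1 for y in c2) / (len(c1) * len(c2))
            if best is None or avg < best[2]:
                best = (c1, c2, avg)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((set(c1), set(c2), d))
    return merges

# Hypothetical toy log: (customer type, trace of activity names) per case.
log = [
    ("silver", ["register", "check", "pay"]),
    ("silver", ["register", "check", "pay"]),
    ("gold", ["register", "check", "pay"]),
    ("platinum", ["register", "advise", "approve", "pay"]),
]
groups = group_by_attribute(log)
profiles = {cls: directly_follows_profile(traces) for cls, traces in groups.items()}
merges = hierarchical_clustering(profiles)
for left, right, d in merges:
    print(sorted(left), sorted(right), round(d, 3))
```

In this toy log, silver and gold share identical directly-follows behavior, so they are merged first. Platinum shares no directly-follows pair with them and joins last. Cutting the resulting hierarchy at some distance yields groups of execution classes, and a separate process model could then be discovered from each group.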