Refining Event Labels using Community Detection, Semi-Greedy Mapping Search and Case Folding

Contact

Name: Sebastiaan J. van Zelst

Research Associate - Fraunhofer FIT

Phone (work): +49 241 80 21926

BSc Thesis Project

Title: Refining Event Labels using Community Detection, Semi-Greedy Mapping Search and Case Folding

Author: Moritz Langenberg

Supervisor: Dr. Sebastiaan J. van Zelst

1st Examiner: Prof. Dr. Wil M.P. van der Aalst

2nd Examiner: Prof. Dr. Joost-Pieter Katoen

Summary

In today’s information systems, large amounts of event data are recorded automatically during the execution of processes. Process mining is the discipline concerned with analyzing such event data to gain insight into processes and to improve them. In many scenarios, an activity can occur in several different parts of a process. For example, in a patient treatment process, the activity ‘CT-Scan’ may be performed both at the beginning and at the end of the treatment. A human modeler would most likely model such a process with two nodes carrying the same activity label. In the recorded event log, however, the events are labeled only with their activity name. In process discovery, the goal is to automatically derive, from an event log, a model that represents the process behavior. Most existing discovery techniques represent each activity label seen in the event log by a single node in the model and therefore cannot detect or discover such a differentiation. More generally, the models discovered from such event logs tend to overgeneralize and to be overly complex. In this thesis, we adopt an existing preprocessing approach that refines event labels based on the behavioral similarity of the events within their cases. We extend this approach by 1) treating the label refinement problem as a community detection problem, 2) folding cases, and 3) using a semi-greedy mapping search. The extended approach is evaluated on artificial event logs generated from process models containing duplicated activities. On average, the precision of the models discovered from the refined event logs is 0.18 higher than that of the models discovered from the unrefined event logs.
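To make the idea of treating label refinement as community detection more concrete, here is a minimal sketch built on simplifying assumptions of our own. It is not the thesis's implementation. Each event carrying the label to refine becomes a graph node. As a hypothetical stand-in for behavioral similarity, two events are connected with weight equal to the number of matches between their (predecessor, successor) activities. Communities are found with a deterministic variant of label propagation, which is a swapped-in community detection algorithm and not necessarily the one used in the thesis. Each community then receives its own refined label (`CT-Scan_1`, `CT-Scan_2`, ...). Case folding and the semi-greedy mapping search are not modeled. The names `context`, `similarity`, `label_propagation` and `refine_label` are hypothetical helpers.

```python
from collections import defaultdict


def context(trace, pos):
    """Hypothetical behavioral context of an event: (predecessor, successor)."""
    prev_act = trace[pos - 1] if pos > 0 else None
    next_act = trace[pos + 1] if pos + 1 < len(trace) else None
    return prev_act, next_act


def similarity(ctx_a, ctx_b):
    """Assumed similarity: number of matching context positions (0, 1 or 2)."""
    return sum(1 for a, b in zip(ctx_a, ctx_b) if a == b)


def label_propagation(n, edges, max_iter=100):
    """Deterministic asynchronous label propagation on a weighted graph.

    Every node repeatedly adopts the community label with the largest total
    edge weight among its neighbors; ties go to the smallest label id.
    """
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    labels = list(range(n))  # every event starts in its own community
    for _ in range(max_iter):
        changed = False
        for u in range(n):
            if not adj[u]:
                continue  # an isolated event keeps its own community
            score = defaultdict(float)
            for v, w in adj[u]:
                score[labels[v]] += w
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[u]:
                labels[u] = best
                changed = True
        if not changed:
            break
    return labels


def refine_label(log, label):
    """Split `label` into label_1, label_2, ... according to event communities."""
    events = [(c, p) for c, trace in enumerate(log)
              for p, act in enumerate(trace) if act == label]
    ctxs = [context(log[c], p) for c, p in events]
    edges = {}
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            w = similarity(ctxs[i], ctxs[j])
            if w > 0:
                edges[(i, j)] = w
    communities = label_propagation(len(events), edges)
    # Number the communities in order of their first appearance in the log.
    ordinal = {}
    for com in communities:
        ordinal.setdefault(com, len(ordinal) + 1)
    refined = [list(trace) for trace in log]  # leave the input log untouched
    for (c, p), com in zip(events, communities):
        refined[c][p] = f"{label}_{ordinal[com]}"
    return refined


log = [
    ["Register", "CT-Scan", "Treat", "CT-Scan", "Discharge"],
    ["Register", "CT-Scan", "Treat", "Operate", "CT-Scan", "Discharge"],
    ["Register", "CT-Scan", "Operate", "Treat", "CT-Scan", "Discharge"],
    ["Register", "CT-Scan", "Discharge"],  # links early and late scans
]
for trace in refine_label(log, "CT-Scan"):
    print(trace)
```

In this toy log, the early scans end up as `CT-Scan_1` and the late scans as `CT-Scan_2`. The last trace shares its predecessor with the early scans and its successor with the late ones, so the similarity graph is connected. Splitting by connected components would therefore put all seven events into one group, whereas label propagation still separates two communities. This illustrates why a community detection formulation can be useful here.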