Improving State-Space Traversal of the eST-Miner by Exploiting Underlying Log Structures

Contact

Name

Lisa Mannel

Scientific Assistant

Phone

work
+49 241 80 21916

Email

E-Mail
 

MSc Thesis Project

Title: Improving State-Space Traversal of the eST-Miner by Exploiting Underlying Log Structures

Author: Yannick Epstein

1st Examiner: Prof.dr.ir. Wil M.P. van der Aalst

2nd Examiner: Prof. Dr.-Ing. Ulrik Schroeder

Daily Supervisor: Lisa Mannel, Dr. Sebastiaan J. van Zelst

Summary

In process discovery, we are given an event log and search for a model, which describes the underlying process, such that a human interpreter can gain knowledge about it. A theoretical well explored description language for process models are Petri nets. The eST-Miner is a process discovery algorithm, which finds Petri nets by starting with a net with one transition for each activity in the log and no places. Then, the maximal set of possible places, which are fitting with respect to a definable fraction of the behavior in the log, is inserted into the net. Looking at all places while evaluating the fittness is not feasible, since the number of possible places is exponential in the number of activities. The eST-Miner is often more efficient than the bruteforceapproach, by employing an efficient traversal strategy, which prunes the search space to a small number of candidate places, while still finding all fitting places. Through this exhaustive search, complex process structures can be identified, which other mining approaches fail to discover. However, even on event logs with a small number of activities, the eST-Miner is still too slow to be a feasible tool for process discovery. In this thesis, we suggest different strategies to reduce the runtime. Our optimizations are a collection of heuristic improvements, which return different nets with similar quality to the nets found by the standard eST-Miner, while enabling us to drastically decrease computation time. Through our heuristics, we are able to modify the standard eST-Miner to be a very competitive choice for process discovery on event logs with a small number of activities. This is demonstrated using various experiments on real and artificial event logs.