Improving State-Space Traversal of the eST-Miner by Exploiting Underlying Log Structures
MSc Thesis Project
Title: Improving State-Space Traversal of the eST-Miner by Exploiting Underlying Log Structures
Author: Yannick Epstein
1st Examiner: Prof.dr.ir. Wil M.P. van der Aalst
2nd Examiner: Prof. Dr.-Ing. Ulrik Schroeder
Daily Supervisor: Lisa Mannel, Dr. Sebastiaan J. van Zelst
Summary
In process discovery, we are given an event log and search for a model, which describes the underlying process, such that a human interpreter can gain knowledge about it. A theoretical well explored description language for process models are Petri nets. The eST-Miner is a process discovery algorithm, which finds Petri nets by starting with a net with one transition for each activity in the log and no places. Then, the maximal set of possible places, which are fitting with respect to a definable fraction of the behavior in the log, is inserted into the net. Looking at all places while evaluating the fittness is not feasible, since the number of possible places is exponential in the number of activities. The eST-Miner is often more efficient than the bruteforceapproach, by employing an efficient traversal strategy, which prunes the search space to a small number of candidate places, while still finding all fitting places. Through this exhaustive search, complex process structures can be identified, which other mining approaches fail to discover. However, even on event logs with a small number of activities, the eST-Miner is still too slow to be a feasible tool for process discovery. In this thesis, we suggest different strategies to reduce the runtime. Our optimizations are a collection of heuristic improvements, which return different nets with similar quality to the nets found by the standard eST-Miner, while enabling us to drastically decrease computation time. Through our heuristics, we are able to modify the standard eST-Miner to be a very competitive choice for process discovery on event logs with a small number of activities. This is demonstrated using various experiments on real and artificial event logs.