GPU-enabled Process Mining: a general overview
BSc Thesis Project
Student name: Minh Nghia Phan (BSc)
Thesis title: GPU-enabled Process Mining: a general overview
Advisors: Alessandro Berti, M.Sc.; Dr.Ir. Merih Seran Uysal
Supervisors: Prof. Dr. Wil van der Aalst, Prof. Dr. Ulrik Schroeder
Nowadays, there are numerous process mining tools, ranging from academic ones (ProM, PM4Py, etc.) to commercial ones (Celonis, LANA, etc.). Although they cover a wide variety of process mining algorithms, they offer little support for newer event log storage formats such as columnar formats. Moreover, there is currently little to no process mining software designed for computation on Graphics Processing Units (GPUs), which could significantly reduce computation time. Motivated by these problems, the Chair of Process and Data Science at RWTH Aachen University has developed PM4PyGPU, a prototype Python library for process mining on GPUs that is built on cuDF, a GPU dataframe library. In this thesis, we extend PM4PyGPU with more complex process mining algorithms. To this end, we first propose a set of definitions for working with dataframes that closely mirrors their actual implementation in cuDF. We then use these definitions to specify several process mining algorithms on dataframes, which we in turn implement in PM4PyGPU. The implemented algorithms, along with the existing functionality of PM4PyGPU, are evaluated against their PM4Py counterparts on real-life event logs of varying sizes. The results indicate that while PM4PyGPU may be slower on smaller event logs due to the overhead of cuDF, it achieves a notable speedup over PM4Py on larger event logs.
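To illustrate what a process mining algorithm expressed "on dataframes" can look like, here is a minimal sketch of computing directly-follows counts (the basis of a directly-follows graph) from an event log stored as a dataframe. It is not PM4PyGPU's implementation. It uses pandas so that it runs without a GPU. cuDF aims to offer a largely pandas-like API, but whether each call below behaves identically in cuDF is an assumption. The function name `directly_follows_counts` is hypothetical. The column names follow the XES-style conventions used by PM4Py (`case:concept:name`, `concept:name`, `time:timestamp`).

```python
import pandas as pd


def directly_follows_counts(log: pd.DataFrame,
                            case_col: str = "case:concept:name",
                            act_col: str = "concept:name",
                            ts_col: str = "time:timestamp") -> dict:
    """Hypothetical helper: count directly-follows pairs (a, b) per event log.

    Sketch only; assumes one row per event and a sortable timestamp column.
    """
    # Sort so that each case's events are contiguous and in time order.
    df = log.sort_values([case_col, ts_col]).reset_index(drop=True)

    # Shift by -1 to align every event with the next row of the frame.
    next_case = df[case_col].shift(-1)
    next_act = df[act_col].shift(-1)

    # Keep only pairs whose next event belongs to the same case
    # (the last row compares against NaN and is dropped as well).
    same_case = df[case_col] == next_case
    pairs = pd.DataFrame({
        "source": df.loc[same_case, act_col],
        "target": next_act[same_case],
    })

    # Count occurrences of each (source, target) pair.
    counts = pairs.groupby(["source", "target"]).size()
    return {(s, t): int(n) for (s, t), n in counts.items()}
```

The sketch avoids per-case Python loops: one sort plus a single `shift` pairs each event with its successor, and a boolean mask discards pairs that cross case boundaries. All of the work is therefore done by vectorized column operations, which is the kind of work GPU dataframe libraries are designed to parallelize.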