Integrating resource and workload awareness in the PM4Py Distributed Engine


BSc Thesis

Title: Integrating resource and workload awareness in the PM4Py Distributed Engine

Author: David Enzo Schneider

Supervisors: Prof. Dr. Ir. Wil van der Aalst, Prof. Dr. Ulrich Schroeder

Advisors: Alessandro Berti, Dr. Merih Seran Uysal

Submission date: 05/09/2020


Process mining is one of the newest, economically important fields of data science and
enables us to better understand processes in many areas beyond computer science. It is
therefore necessary to make the tools used for process mining available to as many people
as possible. For this reason, the Process and Data Science (PADS) Chair of RWTH Aachen
University started to develop PM4Py, a Python library that is accessible to anyone.
To take this idea further, it is also necessary to test its boundaries: the library should
not only run on single computers but also scale to server environments for analysing
processes. In this work we extend PM4Py-distr, an ongoing project of the PADS Chair and
a side project of PM4Py designed to run on distributed systems. We examine how a specific
process discovery algorithm, the directly-follows based Inductive Miner, can be adapted
to a distributed network using resource and workload awareness. It is the first algorithm
after the DFG calculator to be implemented in this distributed engine. The results show
that, with little optimisation, the overhead of distributed file sharing outweighs the
benefit of utilising the resources of a distributed network. Even with this negative
efficiency, it is notable that prioritising CPU and network resources gives faster results
than favouring other resources.





