Bachelor Thesis: Process Mining in Python
Most organizations, in fields as varied as banking, insurance and healthcare, execute several different (business) processes. Modern information systems allow us to track, store and retrieve data on the execution of such processes in the form of event logs. The field of process mining is concerned with the study and analysis of the data stored in such event logs. Its main goal is to improve the organization’s knowledge of its own processes by analyzing their execution, as captured in the data. In this way, the organization gains insights into its processes on the basis of data describing what actually happened.
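To make the notion of an event log concrete, the following minimal sketch groups a flat list of events into traces, one per process instance (case). The field layout (case id, activity, timestamp), the function name `group_into_traces` and the sample data are illustrative assumptions, not a format prescribed by any particular tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample events: (case_id, activity, timestamp).
EVENTS = [
    ("c1", "register", "2024-01-01T09:00"),
    ("c2", "register", "2024-01-01T09:05"),
    ("c1", "check", "2024-01-01T10:00"),
    ("c1", "decide", "2024-01-01T11:30"),
    ("c2", "decide", "2024-01-01T09:45"),
]


def group_into_traces(events):
    """Return {case_id: [activity, ...]} with activities ordered by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    # Sort each case's events by timestamp and keep only the activity names.
    return {case: [a for _, a in sorted(evts)] for case, evts in by_case.items()}


traces = group_into_traces(EVENTS)
```

Each resulting trace is the ordered sequence of activities observed for one case, which is the typical input for the analyses described next.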
Within process mining we identify three main topics. Firstly, in process discovery, we aim to discover a process model that accurately describes the process of the organization, based solely on the data observed in an event log. The main challenge of process discovery is to discover process models that are human-interpretable, i.e. in some way simple, yet at the same time general enough to describe the data. Secondly, in conformance checking, we aim to assess to what degree a given process model and event data correspond to each other. Observe that conformance checking allows us to identify whether the process is actually executed as intended. Finally, in process enhancement, we aim to enhance the overall model we have of the process, e.g. by computing where bottlenecks occur, finding out which data elements determine decision points in the process, or predicting the remaining process execution time.
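As a rough illustration of the first two topics, the sketch below discovers a directly-follows graph (a very simple kind of process model) from a handful of traces and then checks how many traces of a second log only use transitions present in that model. This is a deliberately simplified stand-in: discovery algorithms that produce richer models and alignment-based conformance checking are considerably more involved. All function names and sample traces are hypothetical.

```python
from collections import Counter


def discover_dfg(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        dfg.update(zip(trace, trace[1:]))
    return dfg


def trace_fits(trace, dfg):
    """A trace fits if every directly-follows pair in it occurs in the model."""
    return all(pair in dfg for pair in zip(trace, trace[1:]))


def fitness(traces, dfg):
    """Fraction of traces whose directly-follows pairs are all in the model."""
    return sum(trace_fits(t, dfg) for t in traces) / len(traces)


# Hypothetical traces used to discover the model.
observed = [["register", "check", "decide"], ["register", "decide"]]
model = discover_dfg(observed)

# Hypothetical second log to check against the discovered model.
new_log = [
    ["register", "check", "decide"],
    ["register", "check", "check", "decide"],  # repeats "check"
]
print(fitness(new_log, model))
```

The second trace of `new_log` contains the pair ("check", "check"), which never occurs in the observed traces, so it is counted as non-conforming under this simplified notion of fitness.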
The majority of process mining algorithms developed in academia have a corresponding implementation in either ProM or Apromore. Both tools are Java-based and built on a relatively complex underlying architecture. On the one hand, these tools provide a very modular ecosystem for developing process mining algorithms. On the other hand, the complexity of the underlying architectures often hampers the quick adoption of prototypes and the fast development of new algorithms. Furthermore, we envision the field of process mining increasingly incorporating algorithms from other data-science-oriented fields, e.g. deep learning, most of which are implemented in the Python programming language.
Therefore, the PADS Chair of RWTH, in cooperation with the process mining group of Fraunhofer FIT, has been building a process mining library in Python. The library contains the most fundamental algorithms used in process mining; however, a variety of interesting algorithms have not been implemented yet. We are therefore looking for several Bachelor students who are interested in developing (complex) process mining algorithms in the newly developed Python library, in the context of their B.Sc. thesis. Note that this is a challenging task that trains you in a variety of dimensions: it requires mathematical skills to understand the algorithms, software engineering skills to properly develop and test the algorithms, and writing skills to properly document your code. Furthermore, students are expected to critically analyze the underlying algorithms and to speed up and/or improve them where possible.
Knowledge of basic computer science concepts, good programming skills (Java/Python) and an interest in theoretical and practical aspects of process mining (e.g. conformance checking) are recommended.
Prof.dr.ir. Wil van der Aalst