Master Thesis - An Integrated Framework for Detecting Underlying Equations in Performance Metrics of Processes
Data science is about finding hidden value in large amounts of data. Using data science techniques, we can increase our understanding of systems. Process mining acts as a bridge between data science and process science, bringing the two worlds together. Process mining techniques provide insight into an organization's processes from different aspects, including performance metrics. One of the open gaps is detecting the relationships between different performance metrics of a process, e.g., how changes in the number of arriving customers influence the speed at which resources perform their tasks. Data science can pave the way for forward-looking analysis in process mining. The first step is to know the current state of the process and the underlying relations in the data extracted from it. Using these relations, we can form the models and equations that enable future analyses of the process.
Despite the many existing machine learning techniques, an integrated framework that can discover the underlying equations, both linear and nonlinear, between performance metrics is still missing. The aim of this thesis is to design and implement an integrated framework in which the performance metrics of processes, e.g., arrival rate, average service time, rejection rate, and many others, can be discovered. After this step, identifying the exact relations between these performance metrics will make it possible to create different types of models, including machine learning and simulation models, to predict changes and the future behavior of a process.
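As a rough illustration of the first step, the sketch below derives the three metrics named above from a toy event log. The record layout (case id, arrival time, service start, service end), the one-hour window, and the rule that a case without a service start counts as rejected are all assumptions made for this example, not part of the thesis description.

```python
from collections import defaultdict

# Hypothetical event records: (case_id, arrival_hour, service_start_hour, service_end_hour).
# Assumption: a case with no service start/end (None) was rejected.
EVENTS = [
    ("c1", 0.2, 0.3, 0.8),
    ("c2", 0.5, 0.8, 1.4),
    ("c3", 0.9, None, None),
    ("c4", 1.1, 1.4, 1.9),
    ("c5", 1.6, 1.9, 2.2),
]


def window_metrics(events, window=1.0):
    """Group cases by arrival window and compute simple performance metrics."""
    buckets = defaultdict(list)
    for case in events:
        # case[1] is the arrival time; integer division selects the window index.
        buckets[int(case[1] // window)].append(case)

    metrics = {}
    for index, cases in sorted(buckets.items()):
        served = [(start, end) for _, _, start, end in cases if start is not None]
        durations = [end - start for start, end in served]
        metrics[index] = {
            # Cases arriving per unit of time in this window.
            "arrival_rate": len(cases) / window,
            # Mean time between service start and end; None if nobody was served.
            "avg_service_time": sum(durations) / len(durations) if durations else None,
            # Share of arriving cases that were never served.
            "rejection_rate": (len(cases) - len(served)) / len(cases),
        }
    return metrics


METRICS = window_metrics(EVENTS)
for index, values in METRICS.items():
    print(index, values)
```

Each window then yields one observation per metric, and it is between such per-window series that a relation could be searched for.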
Assume we know that in our processes a higher arrival rate results in resources performing their tasks faster, up to a limit. It is a real challenge for businesses to determine whether this relation exists and what the mathematical equation between the two metrics is, and then to use the resulting model to predict changes and optimize their processes. The general framework, including some sample techniques, is presented in the attachment. As Figure 1 shows, the model can be trained using regression models or curve fitting to find the curve closest to the nonlinear relations. In this thesis, we will go through different techniques and assess them in the process mining context. The ultimate goal is an integrated, interactive framework that takes our data, i.e., process performance metrics, as input and returns the closest equation representing the underlying relations in the data.
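The idea of returning the closest equation can be sketched as a small model-selection loop. This is a minimal illustration only, not the framework from the attachment: the three candidate families, the helper names, and the synthetic data are assumptions, and the nonlinear families are fitted by ordinary least squares on log-transformed data, which requires strictly positive values and is only an approximation of a true nonlinear least-squares fit.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical candidate families:
# name -> (transform of x, transform of y, predictor built from the fitted (a, b)).
CANDIDATES = {
    "linear: y = a + b*x": (
        lambda x: x, lambda y: y,
        lambda a, b: lambda x: a + b * x),
    "exponential: y = A*exp(b*x)": (
        lambda x: x, math.log,
        lambda a, b: lambda x: math.exp(a) * math.exp(b * x)),
    "power law: y = A*x**b": (
        math.log, math.log,
        lambda a, b: lambda x: math.exp(a) * x ** b),
}


def closest_equation(xs, ys):
    """Fit every candidate and return (name, sse, a, b) with the smallest error."""
    best = None
    for name, (tx, ty, build) in CANDIDATES.items():
        # Fit a straight line in the transformed space of this family.
        a, b = linear_fit([tx(x) for x in xs], [ty(y) for y in ys])
        predict = build(a, b)
        # Compare candidates by squared error in the original space.
        sse = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[1]:
            best = (name, sse, a, b)
    return best


# Synthetic data (made up): service time shrinking as the arrival rate grows.
ARRIVAL_RATE = [1.0, 2.0, 4.0, 8.0, 16.0]
SERVICE_TIME = [12.0 * x ** -0.5 for x in ARRIVAL_RATE]

NAME, SSE, A, B = closest_equation(ARRIVAL_RATE, SERVICE_TIME)
print(NAME, "A =", round(math.exp(A), 3), "b =", round(B, 3))
```

For the log-transformed families the fitted intercept `a` is the logarithm of the coefficient `A`, hence the `math.exp(A)` when printing. A relation that saturates "up to a limit" does not linearize this way; it would need an iterative nonlinear fit, for example with `scipy.optimize.curve_fit`.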
Good programming skills (Python) and knowledge of basic computer science concepts. An interest in data science, including machine learning techniques, and process mining. Data science knowledge, specifically regarding techniques such as regression and curve fitting, is required.
- Prof.Dr.ir. Wil van der Aalst
- Mahsa Pourbafrani MSc (primary daily supervisor)
- Dr.ir. Sebastiaan J. van Zelst (secondary daily supervisor)
Send an e-mail to Mahsa Pourbafrani. Make sure to include detailed information about your background and scores for completed courses.