Master Thesis - Identifying Subsets of Multiple Viewpoint Event Logs with Behavior Clustering



Anahita Farhang Ghahfarokhi

Scientific Assistant


+49 241 80 21909




Process mining can be seen as the missing link between model-based analysis and data-oriented analysis techniques. It starts from recorded events, each characterized by a case identifier, an activity name, a timestamp, and optional additional attributes. The case notion is used to correlate events, and process mining techniques help us analyze event logs consisting of these recorded events. In traditional event logs, each event refers to a single case. In real information systems such as the ERP systems of SAP, however, events are correlated with multiple case notions. For instance, in the purchasing process of a company, the same action may be linked to customers, orders, items, and products. We call such event logs Multiple Viewpoint (MVP) event logs. They open new opportunities to apply machine-learning techniques such as clustering on a wider scale than traditional event logs allow.
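
To make the difference between a traditional and an MVP event log concrete, the following minimal Python sketch may help. The event fields, object types, and the `flatten` helper are hypothetical names chosen for illustration, not part of any existing tool; the sketch only assumes that each event can reference several objects of several types.

```python
from collections import defaultdict

# Hypothetical MVP events: each event may reference objects of several types.
events = [
    {"activity": "place order", "timestamp": "2020-01-01T09:00",
     "order": ["o1"], "item": ["i1", "i2"], "customer": ["c1"]},
    {"activity": "pick item", "timestamp": "2020-01-01T10:00",
     "order": ["o1"], "item": ["i1"]},
    {"activity": "pick item", "timestamp": "2020-01-01T11:00",
     "order": ["o1"], "item": ["i2"]},
    {"activity": "ship order", "timestamp": "2020-01-01T12:00",
     "order": ["o1"], "item": ["i1", "i2"], "customer": ["c1"]},
]


def flatten(events, case_notion):
    """Project the MVP log onto one case notion (one trace per object)."""
    traces = defaultdict(list)
    # ISO 8601 timestamps in the same format sort correctly as strings.
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        for obj in ev.get(case_notion, []):
            traces[obj].append(ev["activity"])
    return dict(traces)


order_view = flatten(events, "order")
item_view = flatten(events, "item")
customer_view = flatten(events, "customer")
```

Each call yields a traditional event log for one viewpoint: the order view contains all four activities, while the customer view only contains the two events that reference a customer.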

An important aspect of production planning in a company is identifying the process models of critical products, customers, etc. Using clustering techniques, we can find subsets of MVP event logs with similar behavior, and applying process discovery techniques to the resulting subsets lets us find their models. Clustering can be based on different data-oriented and process-oriented metrics, e.g., the performance or frequency of the objects. Applying clustering methods to MVP event logs thus gives the company the opportunity to find process models for subsets of MVP event logs with more similar behavior. For example, we can find models of subsets consisting of critical products (e.g., defined as products with high frequency and high performance).
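
As a rough illustration of clustering objects by such metrics, the sketch below computes two features per product and groups the products with a plain k-means. Everything here is an assumption made for the example: the sample events, the reading of "frequency" as the number of events per object, the reading of "performance" as throughput time in hours, and the choice of k-means itself. The thesis may well use other metrics and other clustering methods.

```python
from datetime import datetime

# Hypothetical MVP events; "product" is the object type being clustered.
events = [
    {"activity": "create", "timestamp": "2020-01-01T08:00", "product": ["p1", "p2"]},
    {"activity": "assemble", "timestamp": "2020-01-01T10:00", "product": ["p1"]},
    {"activity": "test", "timestamp": "2020-01-01T12:00", "product": ["p1"]},
    {"activity": "ship", "timestamp": "2020-01-02T08:00", "product": ["p1"]},
    {"activity": "ship", "timestamp": "2020-01-01T09:00", "product": ["p2"]},
    {"activity": "create", "timestamp": "2020-01-03T08:00", "product": ["p3"]},
    {"activity": "ship", "timestamp": "2020-01-03T10:00", "product": ["p3"]},
]


def object_features(events, object_type):
    """Return {object: (frequency, throughput_hours)} for one object type."""
    stamps = {}
    for ev in events:
        ts = datetime.fromisoformat(ev["timestamp"])
        for obj in ev.get(object_type, []):
            stamps.setdefault(obj, []).append(ts)
    return {
        obj: (len(ts), (max(ts) - min(ts)).total_seconds() / 3600)
        for obj, ts in stamps.items()
    }


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on {name: features}; seeds are the first k points."""
    names = list(points)
    centroids = [points[n] for n in names[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for n in names:
            dists = [sum((a - b) ** 2 for a, b in zip(points[n], c)) for c in centroids]
            labels[n] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [points[n] for n in names if labels[n] == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


features = object_features(events, "product")
clusters = kmeans(features, k=2)
```

The events that reference the products of one cluster form a subset of the MVP event log, which can then be passed to a process discovery technique. In practice the features would likely need to be normalized before clustering, since throughput time and frequency are on different scales.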

The outputs should consist of:

  • A software tool with an interactive graphical interface that lets users ingest multi-dimensional event logs and identify and display the process models of clusters.
  • A thesis containing a description of the approach, the design choices made in the approach and in the implementation, and an evaluation of the quality of the results in comparison with other approaches identified in the literature.


Prerequisites:

  • Basic knowledge of process mining (Introduction to Data Science or Business Process Intelligence courses).
  • Basic knowledge of the Python language (e.g., the Pandas framework).

Supervisor Wil van der Aalst


Anahita Farhang Ghahfarokhi

For more Information

Send an e-mail to . Make sure to include detailed information about your background and scores for completed courses.