Bachelor Thesis - Incremental Conformance Checking
Process mining is a research discipline positioned at the intersection of data-driven methods, such as machine learning and data mining, and Business Process Modeling (BPM). It aims to discover, monitor, and enhance processes by extracting knowledge from event data, which can be obtained from almost all modern databases.
Process mining typically takes as its basis event data that originates from the execution of a (business) process and is stored in a company's underlying information systems. One sub-domain of process mining is conformance checking, which aims to detect inconsistencies between the recorded event data and a corresponding reference process model. The algorithms proposed in this field help business owners detect deviations and fraud in their business.
Early conformance checking methods, e.g., token-based replay, often lead to ambiguous and/or unexpected outcomes. Alignments were therefore developed with the concrete goal of describing and quantifying deviations in an unambiguous form, and they have rapidly become the de facto standard conformance checking technique. Moreover, alignments serve as a basis for other process mining methods that link event data to process models, e.g., performance analysis, decision mining, business process model repair, and prediction techniques. However, computing alignments is time-consuming for large real-world event data, which limits their applicability in practice.
In many applications, alignment values have to be computed several times. For example, to discover an appropriate process model from event data, one typically discovers several process models using various process discovery algorithms with different settings, and then measures how well each model fits the event data by applying alignment techniques. Because conventional alignment methods take considerable time on large real-world event data, analyzing many candidate process models is impractical. Decreasing the alignment computation time therefore allows more candidate process models to be considered within a limited time. Furthermore, in many cases exact alignment values are not needed, i.e., a quick approximation or a close lower/upper bound is sufficient. Such an approximated conformance value lets us find a suitable process model faster. Bounds guarantee that the exact alignment value lies within a given range, so we can decide whether further analysis is required, which saves a lot of time. Finally, it is valuable to let users adjust the level of approximation themselves.
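To make the idea of lower and upper bounds concrete, the following is a minimal sketch and not the method to be developed in the project. It assumes the standard unit cost function (cost 1 for each log move or model move, cost 0 for synchronous moves) and assumes the model is represented by a set of sampled model traces plus its activity alphabet. The names `lcs_length`, `upper_bound`, and `lower_bound` are hypothetical. Under these assumptions, the insert/delete edit distance to any single model trace cannot be smaller than the optimal alignment cost, and every event whose activity does not occur in the model at all must be a log move.

```python
from typing import Iterable, Sequence, Set


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def upper_bound(trace: Sequence[str],
                sampled_model_traces: Iterable[Sequence[str]]) -> int:
    """Upper bound on the alignment cost (assumption: unit costs).

    Aligning the trace with one concrete model trace costs
    len(trace) + len(model_trace) - 2 * LCS; the optimum over the whole
    model language cannot be worse than the best sampled trace.
    """
    return min(len(trace) + len(m) - 2 * lcs_length(trace, m)
               for m in sampled_model_traces)


def lower_bound(trace: Sequence[str], model_alphabet: Set[str]) -> int:
    """Lower bound: events with activities unknown to the model are log moves.

    Assumption: model_alphabet is the complete set of visible activities
    of the model, not only those seen in the sample.
    """
    return sum(1 for event in trace if event not in model_alphabet)


# Hypothetical example data.
trace = ("a", "x", "c")
sample = [("a", "b", "c"), ("a", "c")]
alphabet = {"a", "b", "c"}
bounds = (lower_bound(trace, alphabet), upper_bound(trace, sample))
```

When both bounds coincide, as they do for this example trace, the exact alignment value is known without running a full alignment computation.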
In this Master project, we plan to investigate techniques for quickly obtaining an approximation of the conformance value together with its bounds. The accuracy of the approximated value and its bounds will be improved incrementally until the user is satisfied with the result. The techniques should let the end user specify different constraints, e.g., the computation time or the distance between the bounds (i.e., the bound width), and compute the approximation subject to them.
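The incremental refinement described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's algorithm: the event log is given as trace variants with frequencies, `cheap_bounds` and `exact_cost` are hypothetical callables supplied by the caller, and the quantity being bounded is the average alignment cost per trace. The loop replaces cheap bounds by exact values one variant at a time and stops once the bound width or the time budget constraint is met.

```python
import time
from typing import Callable, Dict, Tuple

Variant = Tuple[str, ...]


def refine_bounds(
    variants: Dict[Variant, int],
    cheap_bounds: Callable[[Variant], Tuple[float, float]],
    exact_cost: Callable[[Variant], float],
    max_width: float = 0.0,
    time_budget_s: float = float("inf"),
) -> Tuple[float, float, int]:
    """Return (lower, upper, number of variants refined) for the average cost."""
    start = time.monotonic()
    total = sum(variants.values())
    bounds = {v: cheap_bounds(v) for v in variants}

    def average() -> Tuple[float, float]:
        lo = sum(variants[v] * bounds[v][0] for v in variants) / total
        hi = sum(variants[v] * bounds[v][1] for v in variants) / total
        return lo, hi

    # Heuristic (assumption): refine first the variants that contribute
    # most to the overall width, i.e. frequency times own bound width.
    order = sorted(
        variants,
        key=lambda v: variants[v] * (bounds[v][1] - bounds[v][0]),
        reverse=True,
    )

    refined = 0
    lo, hi = average()
    for v in order:
        # User constraints: stop when bounds are tight enough or time is up.
        if hi - lo <= max_width or time.monotonic() - start >= time_budget_s:
            break
        cost = exact_cost(v)
        bounds[v] = (cost, cost)
        refined += 1
        lo, hi = average()
    return lo, hi, refined


# Hypothetical example: two variants, trivial cheap bounds of [0, 2].
log = {("a", "b"): 8, ("a", "c"): 2}
exact = {("a", "b"): 0, ("a", "c"): 1}
result = refine_bounds(log, lambda v: (0, 2), exact.__getitem__, max_width=0.5)
```

In this example a single exact computation for the most frequent variant already narrows the bounds enough to satisfy the width constraint, so the second variant is never aligned exactly.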
Applicants should have good programming skills and knowledge of process mining, specifically conformance checking. They are expected to be familiar with ProM and to know how to develop in Java.
prof.dr.ir. Wil van der Aalst
Mohammadreza Fani Sani (primary advisor) and Sebastiaan van Zelst (secondary advisor)
For more information
Send an e-mail to Mohammadreza Fani Sani. Make sure to include a C.V., detailed information about your background, and scores for completed courses.