Bachelor Thesis - Implementation of an automated feature selection algorithm
Process mining is an important tool for analyzing and improving the processes of companies. By analyzing a company's event log, we can discover its process model and then identify the problems and bottlenecks that decrease its performance. To increase the company's revenue, we need to know the root causes of these problems so that we can remove or modify them. This is done by applying machine learning techniques to the data gathered from the event log (and possibly other sources). However, infinitely many attributes can be derived from a given event log. Some of these attributes are informative, some are irrelevant, and some are misleading. The selected set of features has a great impact on the performance of the machine learning algorithm (the “garbage in, garbage out” phenomenon). The process of choosing the best subset of the available attributes for a given problem is called feature selection.
Feature selection is traditionally done by an expert with domain knowledge. However, the common trend in machine learning is to automate the process as much as possible. Automated feature selection is a challenging task and currently a topic of great research interest in the machine learning community. Several techniques exist for this purpose; they can be categorized as (i) filter-based, (ii) wrapper-based, (iii) embedded, (iv) online, and (v) hybrid methods.
In this bachelor thesis, we aim to implement a hybrid method with two phases. In the first phase, an initial set of attributes is chosen using statistical tests (such as Pearson’s correlation, LDA, ANOVA, chi-square, Spearman's rho, …). In the second phase, we use forward selection followed by backward selection to refine the set selected in the first phase. Finally, the user must be able to modify the attribute set and evaluate it. We are going to implement this automated feature selection method in the ProM framework, a free and commonly used tool for process mining.
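To make the two phases concrete, the following is a minimal sketch in plain Java, not the thesis implementation and not tied to any ProM API. It assumes numeric attributes stored column-wise, uses Pearson's correlation as the only first-phase test, and takes the wrapper criterion as a caller-supplied scoring function. The class name HybridFeatureSelection, the method names, the threshold, and the toy score in main are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class HybridFeatureSelection {

    /** Pearson correlation of two equally long samples; returns 0 if either is constant. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0 ? 0 : cov / denom;
    }

    /** Phase 1 (filter): keep the indices of columns whose |r| with the target reaches the threshold. */
    static List<Integer> filter(double[][] columns, double[] target, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int j = 0; j < columns.length; j++) {
            if (Math.abs(pearson(columns[j], target)) >= threshold) kept.add(j);
        }
        return kept;
    }

    /** Phase 2a: greedily add the candidate that improves the score most, until none does. */
    static List<Integer> forward(List<Integer> candidates, ToDoubleFunction<List<Integer>> score) {
        List<Integer> selected = new ArrayList<>();
        double best = score.applyAsDouble(selected);
        while (true) {
            Integer bestFeature = null;
            for (Integer f : candidates) {
                if (selected.contains(f)) continue;
                List<Integer> trial = new ArrayList<>(selected);
                trial.add(f);
                double s = score.applyAsDouble(trial);
                if (s > best) { best = s; bestFeature = f; }
            }
            if (bestFeature == null) return selected;
            selected.add(bestFeature);
        }
    }

    /** Phase 2b: greedily drop the feature whose removal improves the score most, until none does. */
    static List<Integer> backward(List<Integer> start, ToDoubleFunction<List<Integer>> score) {
        List<Integer> selected = new ArrayList<>(start);
        double best = score.applyAsDouble(selected);
        while (true) {
            Integer worst = null;
            for (Integer f : selected) {
                List<Integer> trial = new ArrayList<>(selected);
                trial.remove(f); // f is an Integer, so this removes the value, not an index
                double s = score.applyAsDouble(trial);
                if (s > best) { best = s; worst = f; }
            }
            if (worst == null) return selected;
            selected.remove(worst);
        }
    }

    public static void main(String[] args) {
        double[][] columns = { {1, 2, 3, 4, 5}, {5, 4, 3, 2, 1}, {1, -1, 0, -1, 1} };
        double[] target = {2, 4, 6, 8, 10};
        // Toy score (an assumption): reward feature 0 and penalize the subset size.
        ToDoubleFunction<List<Integer>> score =
                s -> (s.contains(0) ? 1.0 : 0.0) - 0.1 * s.size();
        List<Integer> initial = filter(columns, target, 0.5);
        List<Integer> refined = backward(forward(initial, score), score);
        System.out.println("after filter: " + initial + ", after refinement: " + refined);
    }
}
```

In a real setting the scoring function would typically train and validate a model on the chosen attributes, for example by cross-validated accuracy. Both refinement steps here only accept strict improvements, which is one possible stopping rule among several.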
The steps of this bachelor thesis are:
• implementing the algorithm in Java,
• evaluating the method through extensive experimentation,
• writing the thesis,
• giving an intermediate and a final presentation.
Prerequisites: knowledge of basic computer science concepts, knowledge of basic statistics, programming skills (Java), and an interest in theoretical and practical aspects of machine learning and process mining.
Prof.dr.ir. Wil van der Aalst
Mahnaz Sadat Qafari
For more information
Send an e-mail to Mahnaz Qafari. Make sure to include detailed information about your background and scores for completed courses.
Chair of Process and Data Science
Ahornstr. 55 (Eingang Mies-van-der-Rohe-Str.), Erweiterungsbau E2
Phone: +49 241 80 21 901