Differentially Private Trace Variant Selection for Process Mining
Student: Frederik Wangelik
Title: Differentially Private Trace Variant Selection for Process Mining
Supervisor: Majid Rafiei
1st Examiner: Prof. Wil M.P. van der Aalst
2nd Examiner: Prof. Ulrike Meyer
Summary
In industrial process mining, privacy-preserving publication of event data is becoming increasingly relevant, and the trade-off between high data utility and quantifiable privacy poses new challenges. State-of-the-art research mainly focuses on differentially private trace variant construction based on prefix expansion methods. However, these algorithms face several practical limitations: high computational complexity, the introduction of fake variants, the removal of frequent variants, and a bounded variant length. In this thesis, we introduce two new approaches for differentially private trace variant release that mitigate these limitations. Our first contribution leverages private partition selection strategies to directly perturb an entire collection of trace variants through noise injection and specific thresholding. Our second contribution is a trainable generative model based on a private autoencoder and a private generative adversarial network. Combined, the two components first adapt to the original variant distribution and then synthesize differentially private, similarly distributed clones of arbitrary size without further access to the data. Experimental results on real-life event data and multiple levels of privacy show that our algorithms outperform state-of-the-art methods in terms of both plain data utility and process-discovery-based evaluation metrics. Finally, we briefly assess the computational complexity and provide implementation guidelines for production environments.
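To illustrate the idea behind the first contribution, the following minimal sketch shows one well-known form of private partition selection: Laplace noise is added to each variant count, and a variant is released only if its noisy count exceeds a threshold. This is not necessarily the mechanism used in the thesis. The function names (`laplace_noise`, `release_threshold`, `select_variants`), the threshold formula, and the assumption that each case contributes exactly one trace (so that counts have sensitivity 1) are illustrative choices made for this example.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_threshold(epsilon, delta):
    # Assumed threshold for Laplace thresholding with sensitivity 1:
    # a variant that occurs only once is released with probability at most delta.
    return 1.0 + math.log(1.0 / (2.0 * delta)) / epsilon


def select_variants(traces, epsilon, delta, seed=None):
    """Return a dict {variant: noisy count} of the variants that pass the threshold.

    Assumes every case contributes exactly one trace (hypothetical setting).
    """
    rng = random.Random(seed)
    counts = Counter(tuple(trace) for trace in traces)
    threshold = release_threshold(epsilon, delta)
    released = {}
    for variant, count in counts.items():
        noisy = count + laplace_noise(1.0 / epsilon, rng)
        if noisy > threshold:
            # Rounding is post-processing and does not affect the privacy guarantee.
            released[variant] = max(1, round(noisy))
    return released


# Usage: one frequent variant and one variant that occurs a single time.
log = [("a", "b", "c")] * 1000 + [("a", "c")]
result = select_variants(log, epsilon=1.0, delta=1e-3, seed=7)
```

Because only variants that actually occur in the log are considered, a selection of this kind cannot introduce fake variants, and frequent variants pass the threshold with overwhelming probability; rare variants are suppressed in most runs.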