(Un)Fair Event Logs
Author: Timo Pohl
License: CC-BY-4.0
Event Logs:
We introduce a set of 12 distinct event logs, three for each of the four domains: hiring, healthcare, lending, and renting. These event logs have been carefully curated and simulated, each containing 10,000 cases, thereby providing an extensive resource for researchers focusing on fairness in process mining.
In each of these domains, the three event logs represent varying degrees of discrimination, offering researchers an opportunity to explore the nuances and complexities that arise in diverse real-world scenarios. By presenting each log with a thorough description of the inherent processes and their respective attributes, we aim to provide a robust groundwork for understanding the potential sources of discrimination and addressing fairness in process mining.
We have ensured that all the event logs are provided in the eXtensible Event Stream (XES) standard format. This adherence to a recognized standard not only ensures broad compatibility but also facilitates interoperability across a variety of process mining tools. By choosing this common format, we aim to encourage and simplify the utilization of these logs for researchers across different platforms.
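Because XES is an XML-based format, the logs can also be inspected without a dedicated process mining tool. The following is a minimal sketch using only the Python standard library. The helper names (`local_name`, `read_attributes`, `iter_traces`, `load_log`) are illustrative and not part of the dataset, and the sketch assumes the usual XES layout of `trace` elements containing `event` elements whose attributes are stored as `key`/`value` pairs. All values are returned as strings; no type conversion is attempted.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def read_attributes(element):
    """Collect the key/value children (string, int, date, ...) of a trace or event."""
    return {
        child.attrib["key"]: child.attrib.get("value")
        for child in element
        if "key" in child.attrib
    }


def iter_traces(source):
    """Yield (case_attributes, events) per trace; events is a list of dicts.

    `source` may be a path or a binary file object, so a gzip stream works too.
    """
    for _, element in ET.iterparse(source, events=("end",)):
        if local_name(element.tag) == "trace":
            events = [
                read_attributes(child)
                for child in element
                if local_name(child.tag) == "event"
            ]
            yield read_attributes(element), events
            element.clear()  # release the parsed trace before reading the next one


def load_log(path):
    """Read a plain or gzip-compressed XES file into a list of traces."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        return list(iter_traces(handle))
```

A call such as `load_log("hiring_log_high.xes.gz")` would then return one `(case_attributes, events)` pair per case; the `.gz` file name here is an assumption about how the download is named. Dedicated process mining libraries offer richer importers and are the better choice for full analyses.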
* Hiring
The data describes a multifaceted recruitment process with diverse application pathways ranging from minimal processing to extensive multi-step procedures. The variability of these routes, largely dependent on numerous determinants, yields a spectrum of outcomes from instant rejection to successful job offers.
The logs include attributes such as age, citizenship, German proficiency, gender, religion, and years of education. While these attributes may inform candidate profiles, their misuse could engender discrimination. Age and years of education may signal experience and skills, and citizenship and German proficiency may matter for job logistics, but none of these should unjustly eliminate applicants. Gender and religion, which are unrelated to job performance, must not sway hiring decisions. Therefore, the use of these attributes must uphold fairness, avoiding any potential bias.
* Hospital
The data depicts a hospital treatment process that commences with registration at an Emergency Room or Family Department and advances through stages of examination, diagnosis, and treatment. Notably, unsuccessful treatments often entail repetitive diagnostic and treatment cycles, underscoring the iterative nature of healthcare provision.
The logs incorporate patient attributes such as age, underlying condition, citizenship, German language proficiency, gender, and private insurance. These attributes, influencing the treatment process, may unveil potential discrimination. Factors like age and condition might affect case complexity and treatment path, while citizenship may highlight healthcare access disparities. German proficiency can impact provider-patient communication, thus affecting care quality. Gender could spotlight potential health disparities, while insurance status might indicate socio-economic influences on care quality or timeliness. Therefore, a comprehensive examination of these attributes vis-a-vis the treatment process could shed light on potential biases or disparities, fostering fairness in healthcare delivery.
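The repeated diagnostic and treatment cycles described above show up in a trace as activities that occur more than once within the same case. A minimal sketch of how such repetitions could be counted follows. The function name `repeated_activities` and the activity labels in the example case are hypothetical and are not taken from the published logs.

```python
from collections import Counter


def repeated_activities(activities):
    """Return {activity: extra_occurrences} for activities seen more than once in a case."""
    counts = Counter(activities)
    return {name: n - 1 for name, n in counts.items() if n > 1}


# Hypothetical case: the labels are illustrative, not the logs' actual activity names.
case = [
    "Register at ER",
    "Examination",
    "Diagnosis",
    "Treatment",
    "Treatment unsuccessful",
    "Diagnosis",
    "Treatment",
    "Treatment successful",
    "Discharge",
]

print(repeated_activities(case))
```

Comparing such repetition counts across patient groups is one possible way to check whether some groups go through more cycles than others.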
* Lending
This data illustrates the steps within a loan application process. From an initial appointment request, the process navigates various stages, including information verification and underwriting, culminating in loan approval or denial. Additional steps may be required, such as co-signer enlistment or collateral assessment. Some cases end with an outright denial of the appointment. This variability reflects applicants' differing credit situations.
The logs' attributes can aid in identifying influences on outcomes and detecting discrimination. Personal characteristics ('age', 'citizen', 'German speaking', and 'gender') and socio-economic indicators ('yearsOfEducation' and 'CreditScore') can impact the process. While 'yearsOfEducation' and 'CreditScore' can validly inform creditworthiness, 'age', 'citizen', 'German speaking', and 'gender' should not bias loan decisions. Using these attributes responsibly fosters equitable loan processes.
* Renting
The data represents a rental process. It begins with a prospective tenant applying to view a property. Subsequent steps include an initial screening phase, viewing, decision-making, and a potential extensive screening. The process ends with the acceptance or rejection of the prospective tenant. In some cases, a tenant may apply for viewing but be rejected without the viewing occurring.
The logs contain attributes that can shed light on potential biases in the process. The attributes 'age', 'citizen', 'German speaking', 'gender', 'religious affiliation', and 'yearsOfEducation' might influence the rental process, leading to potential discrimination. While some attributes may provide useful insights into a potential tenant's reliability, misuse could result in discrimination. Thus, fairness must be observed in utilizing these attributes to avoid potential biases and ensure equitable treatment.
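In all four domains, a simple first check for potential bias is to compare how often a favorable outcome occurs for each value of a sensitive attribute. The sketch below illustrates this on toy data. The function name `outcome_rate_by_group`, the activity labels, and the assumption that the attribute is stored once per case are all hypothetical; they would need to be adapted to the actual logs.

```python
from collections import defaultdict


def outcome_rate_by_group(cases, attribute, positive_activity):
    """Share of cases per attribute value whose trace contains `positive_activity`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for case_attributes, activities in cases:
        group = case_attributes.get(attribute)
        totals[group] += 1
        if positive_activity in activities:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical toy cases: (case attributes, ordered activity labels).
cases = [
    ({"gender": "female"}, ["Apply", "Screening", "Reject"]),
    ({"gender": "female"}, ["Apply", "Screening", "Accept"]),
    ({"gender": "male"}, ["Apply", "Screening", "Accept"]),
    ({"gender": "male"}, ["Apply", "Screening", "Accept"]),
]

rates = outcome_rate_by_group(cases, "gender", "Accept")
gap = max(rates.values()) - min(rates.values())  # difference between best and worst group
print(rates, gap)
```

A large gap between groups is only an indicator. Whether it reflects discrimination depends on whether legitimate attributes, such as those discussed above, explain the difference.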
External Links
- hiring_log_high.xes (gz: 942 kb)
- hiring_log_medium.xes (gz: 1008 kb)
- hiring_log_low.xes (gz: 1036 kb)
- hospital_log_high.xes (gz: 1054 kb)
- hospital_log_medium.xes (gz: 1058 kb)
- hospital_log_low.xes (gz: 1044 kb)
- lending_log_high.xes (gz: 972 kb)
- lending_log_medium.xes (gz: 966 kb)
- lending_log_low.xes (gz: 984 kb)
- renting_log_high.xes (gz: 1463 kb)
- renting_log_medium.xes (gz: 1663 kb)
- renting_log_low.xes (gz: 1548 kb)