Joint Master Thesis RWTH PADS/Canon CPP (Venlo, NL): Applying Process Mining on (Live) Large-Scale Printing Data
Canon Production Printing (CPP) is a worldwide leader in the design and manufacturing of large-scale production printers. Commercial printing companies in, e.g., marketing and direct mail use these printers to execute large-scale print jobs. Hence, the printers are designed to achieve a very high output in terms of the number of printed entities per minute. CPP has its headquarters in Venlo, the Netherlands (a 1-hour drive from Aachen), and has several offices throughout the world.
A printer produced by CPP is a complex system comprising several interacting hardware and software components. Fragments of the interactions between these components are recorded in execution logs. This is where process mining comes in: the field of process mining is concerned with the study and analysis of the data stored in such execution logs. The main goal of process mining is to improve one’s knowledge of the processes under study by analyzing their execution, as captured in the data. In this way, one obtains data-based insights into the process that describe what actually happened.
Within process mining, we identify three main topics. Firstly, in process discovery, we aim to discover a process model that accurately describes the process, solely based on the data observed in an execution log. The main challenge of process discovery is to discover process models that are human-interpretable, i.e., in some way simple, yet at the same time general enough to describe the data.
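As a minimal illustration of process discovery, the sketch below counts how often one activity is directly followed by another, which yields a directly-follows graph, one of the simplest process model representations. The activity names and the three traces are hypothetical and do not come from CPP's logs; real discovery algorithms are considerably more involved.

```python
from collections import Counter


def discover_dfg(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        # Pair every event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical print-job traces (one list of activities per job).
log = [
    ["receive_job", "rasterize", "print", "finish"],
    ["receive_job", "rasterize", "print", "print", "finish"],
    ["receive_job", "print", "finish"],
]

dfg = discover_dfg(log)
for (a, b), count in sorted(dfg.items()):
    print(f"{a} -> {b}: {count}")
```

Edges with a low count, such as the single direct step from `receive_job` to `print`, are the kind of behaviour a discovery technique must decide to keep (more general model) or filter out (simpler model).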
Secondly, in conformance checking, we aim to assess to what degree a given process model and event data correspond to each other. Conformance checking allows us to identify whether the process is actually executed as intended. An interesting application is the use of conformance checking to find anomalies in the data that could help diagnose problems with a printer in the field.
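The idea of conformance checking can be sketched with a deliberately simplified check: the "model" is just a set of allowed start activities, end activities, and directly-follows steps, and every violation in a trace is reported. This is not one of the established conformance checking techniques (such as token-based replay or alignments); the model and the traces are hypothetical.

```python
def deviations(trace, allowed_steps, start, end):
    """Return human-readable deviations of one trace from the model."""
    if not trace:
        return ["empty trace"]
    issues = []
    if trace[0] not in start:
        issues.append(f"unexpected start: {trace[0]}")
    if trace[-1] not in end:
        issues.append(f"unexpected end: {trace[-1]}")
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in allowed_steps:
            issues.append(f"unexpected step: {a} -> {b}")
    return issues


# Hypothetical model of the intended print-job behaviour.
ALLOWED_STEPS = {
    ("receive_job", "rasterize"),
    ("rasterize", "print"),
    ("print", "print"),
    ("print", "finish"),
}
START = {"receive_job"}
END = {"finish"}

ok_trace = ["receive_job", "rasterize", "print", "finish"]
bad_trace = ["receive_job", "print", "rasterize"]

print(deviations(ok_trace, ALLOWED_STEPS, START, END))
print(deviations(bad_trace, ALLOWED_STEPS, START, END))
```

A trace with an empty deviation list conforms to this simple model; non-empty lists point to the steps that might deserve a closer look during diagnosis.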
Finally, in process enhancement, we aim to enhance the overall model we have of the process, e.g., by computing where bottlenecks occur, finding out which data elements determine decision points in the process, or predicting the remaining process execution time.
However, whereas the aforementioned core functionalities of process mining are clear-cut and work well when the ‘data is nice’, CPP’s production data isn’t ‘nice’. Since the execution logs were initially designed without process mining in mind, several interesting challenges surface in the execution data. For example, not every event tracked during the printing process is recorded at the same level of granularity.
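The bottleneck computation mentioned under process enhancement can be sketched as follows: for every pair of consecutive activities within a case, compute the mean elapsed time, and treat the slowest pair as a bottleneck candidate. The event tuples, field order, and timestamps below are hypothetical; they only assume that each event carries a case identifier, an activity name, and a timestamp.

```python
from collections import defaultdict
from datetime import datetime


def mean_transition_seconds(events):
    """events: iterable of (case_id, activity, ISO-8601 timestamp)."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))

    durations = defaultdict(list)
    for case_events in by_case.values():
        case_events.sort()  # order the events of a case by timestamp
        for (t1, a), (t2, b) in zip(case_events, case_events[1:]):
            durations[(a, b)].append((t2 - t1).total_seconds())

    return {pair: sum(v) / len(v) for pair, v in durations.items()}


# Hypothetical events of two print jobs.
events = [
    ("job1", "receive_job", "2021-03-01T10:00:00"),
    ("job1", "rasterize", "2021-03-01T10:00:05"),
    ("job1", "print", "2021-03-01T10:00:45"),
    ("job2", "receive_job", "2021-03-01T10:01:00"),
    ("job2", "rasterize", "2021-03-01T10:01:03"),
    ("job2", "print", "2021-03-01T10:01:53"),
]

means = mean_transition_seconds(events)
bottleneck = max(means, key=means.get)
print(means)
print("slowest transition:", bottleneck)
```

With events logged at different levels of granularity, as described above, such timings are only meaningful after the events have been mapped to a common level of abstraction.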
Hence, in this M.Sc. thesis project, which is a joint project of Canon Production Printing and the RWTH PADS chair, the student is asked to investigate the application of process mining techniques to the data logged by Canon’s printing machines in the field. Initially, a complex data set containing a huge amount of executed process behaviour needs to be analysed. It is likely that, as a first step, the data needs to be translated to the right level of abstraction in order to reach meaningful conclusions. Based on the initial analysis, the student, in cooperation with the supervision team (from both Canon and RWTH), will identify, investigate, solve and generalize the relevant problems apparent in the captured data, thereby enabling the application of process mining on the logged data.
The M.Sc. project will be executed partly online, partly at the PADS chair, and partly at Canon’s office in Venlo (if COVID permits). The student will be involved in all steps of a typical data-oriented research project. This project is a unique opportunity to work on real data and to get direct exposure to industry. Moreover, the tools and techniques developed in the context of this M.Sc. thesis are likely to be adopted in practice. Finally, note that, after your application at RWTH (through Dr. van Zelst), a separate interview with Canon will be scheduled.
Knowledge of basic computer science concepts, good programming skills and an interest in theoretical and practical aspects of process mining are recommended.
- Process Mining Book
- Coursera Process Mining Course
- PM^2: a Process Mining Project Methodology (Paper)
Prof.dr.ir. Wil van der Aalst
Dr.ir. Sebastiaan van Zelst
Dr. Peter Kruizinga
Send an e-mail to Dr.ir. Sebastiaan van Zelst. Make sure to include a C.V., detailed information about your background, and your grades for completed courses, in particular for courses followed at i9 – Process and Data Science.