September 2013/2014.
FACULTY OF SCIENCE AND TECHNOLOGY UNIVERSITY PARIS XII
SECOND MASTER SCIENCE FOR ENGINEERING
SPECIALTY:
COMPLEX SYSTEMS, INFORMATION TECHNOLOGY AND CONTROL.
Theme
COMPARISON OF PROCESS MINING TECHNIQUES APPLICATION
TO FLEXIBLE AND UNSTRUCTURED PROCESSES
Presented by:
M$ ARIOUAT
Hanane
Supervisor: Pr BARKAOUI
kamal Co-Supervisor: Pr JACKY
Akoka
1
Abstract
Process Mining refers to the extraction of process models from
event logs. There are many process mining algorithms with different theoretical
foundations and aims, raising the question of how to choose the best for a
particular situation. A framework is proposed for objectively comparing
algorithms for process discovery against a known ground truth, with an
implementation using existing tools. Real-life processes tend to be less
structured and more flexible. Traditional process mining algorithms have
problems dealing with such unstructured processes and generate spaghetti-like
process models that are hard to comprehend. To address these problems, we use
an approach where the event log is clustered iteratively such that each of the
resulting clusters corresponds to a coherent set of cases that can be
adequately represented by a process model. In this thesis we present an
approach to trace clustering. We rely on the traces profiles, earlier proposed,
and adjust it to our purpose by considering only the activities in a case. We
propose a new distance measure by using the Logical operator XOR and a new
technique to calculate new cluster centers by using the logical operator AND.
We have implemented and experimented our approach using real life event logs.
Using our, provide homogeneous subsets, and for each subset a better process
model is created using the ProM framework. We evaluate the goodness of the
formed clusters using established fitness and comprehensibility metrics defined
in the context of process mining. The proposed approach is able to generate
clusters such that the process models mined from the clustered traces show a
high degree of fitness and comprehensibility when compared to contemporary
approaches [32].
i
Contents
Table of Content i
List of Figures vii
List of Tables ix
Introduction 1
1 Business Process and Process Mining 4
1.1 Introduction 4
1.2 Business Processes 4
1.2.1 Definitions 5
1.2.2 Business Process Management 6
1.2.3 BPM life cycle 6
1.2.4 Business processes modeling languages 7
1.2.4.1 Petri Nets 8
1.2.4.2 Workflow Nets 10
1.2.4.3 Causal Nets (C-Nets) 11
1.2.4.4 BPMN 12
1.3 Process Mining 14
1.3.1 Event logs 15
1.3.2 Log filtering 16
1.3.3 Process Mining Perspectives 17
1.3.4 Process Mining as Control-Flow Discovery 18
1.3.4.1 The á-algorithm 20
1.3.4.2 The á+-algorithm 23
1.3.4.3 The Heuristics algorithm 24
1.3.4.4 The Genetic algorithm 25
1.4 Evaluation metrics of Business Processes 27
1.4.1 Performance of a Process Mining Algorithm 27
1.4.2 Evaluating the discovered process 27
1.4.2.1 Model-to-log Metrics 29
1.4.2.2 Model-to-model Metrics 30
1.5 Conclusion 31
2 Process Mining Tools 32
2.1 Introduction 32
2.2 ProM Architecture 32
2.3 Log Filters 34
2.3.1 Adding artificial 'start' and 'end' events 35
2.3.2 Duplicate Task filter 37
2.3.3 Remove attributes with empty value 37
2.3.4 Enhanced Event Log filter 38
2.3.5 Time based log filter 39
2.4 Mining Tools 39
2.4.1 á-algorithm plug-in 39
2.4.2 Tshinghua-á-algorithm plug-in 40
2.4.3 Heuristics miner plug-in 40
2.4.4 Genetic algorithm plug-in 41
2.4.5 Social Network miner plug-in 42
2.4.6 Organizational Miner plug-in 43
2.4.7 Staff Assignment Miner plug-in 43
2.4.8 Decision Miner plug-in 43
|
|
2.4.9 Fuzzy Miner plug-in
|
43
|
|
2.5
|
Analysis Tools
|
44
|
|
|
2.5.1 Conformance checker
|
44
|
|
|
2.5.2 Woflan plug-in ( A Petri-net-based Workflow Analyzer)
|
46
|
|
|
2.5.3 Performance analysis
|
46
|
|
|
2.5.4 LTL checker
|
47
|
|
2.6
|
Conversion Plug-ins
|
48
|
|
|
2.6.1 BPMN to Petri-Net
|
48
|
|
|
2.6.2 Petri Net to Yawl Model
|
49
|
|
|
2.6.3 Petri Net into WF-Net
|
49
|
|
|
2.6.4 YAWL model into EPC
|
50
|
|
2.7
|
Conclusion
|
51
|
3
|
Comparison and analysis of process mining
algorithms
|
52
|
|
3.1
|
Introduction
|
52
|
|
3.2
|
Problem Description
|
52
|
|
3.3
|
Comparison and analysis
|
53
|
|
|
3.3.1 Log description
|
53
|
|
|
3.3.2 Experimentation
|
55
|
|
|
3.3.2.1 Fitness
|
55
|
|
|
3.3.2.2 Precision and Generality
|
57
|
|
|
3.3.2.3 Precision and Recall
|
61
|
|
3.4
|
Conclusion
|
63
|
4
|
Proposition
|
64
|
|
4.1
|
Introduction
|
64
|
|
4.2
|
Proposition description
|
64
|
|
|
4.2.1 Trace Profiles
|
66
|
|
|
4.2.2 XOR Operator
|
67
|
|
|
4.2.3 AND Operator
|
67
|
|
|
4.2.4 Clustering Algorithm
|
68
|
4.2.4.1 Distance Measure 69
4.2.4.2 Calculate new clusters 69
4.3 Running Examples 69
4.4 Discussion 72
4.5 Conclusion 73
Conclusion and Future Works 74
Bibiography 76
v
List of Figures
1.1 Different aspects of a business process [54] 6
1.2 BPM life cycle [15] 7
1.3 A marked Petri net. 9
1.4 Petri Nets in two different states. 10 1.5 Some basic
workflow templates that can be modeled using Petri Net notation. 11
1.6 A causal net example. 12
1.7 Example of some basic components, used to model a business
process using
BPMN notation. 13
1.8 A Petri Net and a BPMN model that exhibit similar set of
traces, annotated
with some control-flow patterns that exist in both models.
14
1.9 From event logs to models via process mining 15
1.10 Some mining results for the process perspective (a) and
organizational(b,c). 18
1.11 Illustration of the main control-flow patterns. 20
1.12 Four process where different dimensions are pointed out
(inspired by Fig. 2 of [28]). The (a) model represents the
original process, that generates the log of Table 2.2; in this
case all the dimensions are correctly highlighted; (b) is a model with a low
fitness; (c) has low precision and (d) has low
generalization and structure 29
2.1 Overview of the ProM Framework. (adapted from [41]) 33
2.2 Log filter Screenshot (In ProM) 34
2.3 Process model using the Fuzzy Miner on a non filtered event
log. 35
2.4 Process model after filtering the event log 36
2.5 Process model using the heuristic miner before filtering.
36
2.6 Process model using the heuristic miner after filtering.
37
2.7 Duplicate Task filter 37
2.8 Removing attributes with empty value 38
2.9 Enhanced Event Log filter 38
2.10 Time based log filter 39
2.11 alpha miner 40
2.12 Tshinghua alpha miner. 40
2.13 Process Model of the example log 41
2.14 Process Model of the example log with genetic miner
42
2.15 Descovering social networks by ProM. 42
2.16 Model view shows places in the model where problems
occurred during the
log replay. 45
2.17 Analysis of the precision of a model allows to detect
overgeneral parts. . 45
2.18 Structural analysis detects duplicate task that list
alternative behavior and
redundant tasks. 46
2.19 The correctness of a model using Woflan tool. 46
2.20 Scheenshot of the analysis plug-in Performance Analysis
with Petri net. . 47
2.21 Scheenshot of the analysis plug-in LTL Checker Plugin
(Example raining)
[41] 47
2.22 BPMN example 48
2.23 Petri net translation of the BPMN in Figure 2.22 49
2.24 The mined review process model converted to a YAWL model
49
3.1 The processes within Rabobank ICT 54
3.2 Fitness against the number of traces before filtering
56
3.3 Fitness against the number of traces after filtering
57
3.4 Mined model by Alpha algorithm 59
3.5 Mined model by Genetic algorithm 59
3.6 Mined model by Heuristic miner 60
3.7 Mined model by Alpha algorithm case 2 60
vii
List of Tables
3.8 Mined model by Genetic algorithm case 2 61
3.9 Mined model by Heuristic miner case 2 61
3.10 Average Structural precision and Behavioral precision
values 62
3.11 Average Structural recall and Behavioral recall values
63
4.1 Our clustering approach overview 66
viii
List of Tables
1.1
|
An event log (audit trail).
|
16
|
1.2
|
Example of log traces, generated from the executions of the
process pre-
|
|
|
sented in (Figure 1.12 (a))
|
28
|
2.1
|
Example of an event log with 3 process instances, for the
process of patient care in a hospital (A: Register patient, B: Contact family
doctor, C: Treat
|
|
|
patient, D: Give prescription to the patient, E: Discharge
patient)
|
41
|
3.1
|
Extract of the event log
|
54
|
3.2
|
A fragment of the second event log: each line corresponds to an
event. . . .
|
55
|
3.3
|
Conformance checker between the first example and its mined
models . . .
|
60
|
3.4
|
Conformance checker between IM0047050 case and its mined models.
. . .
|
61
|
4.1
|
Interpretation of XOR results
|
67
|
4.2
|
Interpretation of AND results
|
68
|
4.3
|
Example process logs (A: Receive a item and repair request, B:
Check the item, C: Check the warranty, D: Notify the customer, E: Repair the
item, F: Test the repaired product, G: Issue payment, H: Send the
cancellation
|
|
|
letter, I: Return the item)
|
70
|
4.4
|
traces profiles for the example log from 4.3
|
70
|
4.5
|
Metrics values
|
70
|
4.6
|
Metrics values
|
71
|
4.7
|
Event logs form [39]
|
71
|
4.8
|
Metrics values for the second event log
|
71
|
ix
List of Tables
4.9 Metrics values 71
4.10 Metrics values 72
1
|