mining can be seen as taking X-rays to
help diagnose/predict problems and
recommend treatment.
An important driver for process
mining is the incredible growth of
event data4,5 in every context (sector,
economy, organization, and home)
and in every system that logs events. For less
than $600, one can buy, say, a disk
drive with the capacity to store all of
the world’s music.5 A 2011 study by
Hilbert and Lopez4 found that storage space worldwide grew from 2.6
optimally compressed exabytes (2.6 ×
10^18 B) in 1986 to 295 compressed exabytes in 2007. In 2007, 94% of all information storage capacity on Earth was
digital, with the other 6% in the form
of books, magazines, and other non-digital formats; in 1986, only 0.8% of
all information-storage capacity was
digital. These numbers reflect the continuing exponential growth of data.
The further adoption of technologies (such as radio frequency identification, location-based services, cloud
computing, and sensor networks) will
accelerate the growth of event data.
However, organizations have problems using it effectively, with most
still diagnosing problems based on
fiction (such as PowerPoint slides and
Visio diagrams) rather than on facts
(such as event data). This is illustrated
by the poor quality of process models
in practice; for example, over 20% of
the 604 process diagrams in SAP’s reference model have obvious errors and
their relation to actual business processes supported by SAP is unclear.6 It
is thus vital to turn the world’s massive amount of event data into relevant
knowledge and reliable insights—and
this is where process mining can help.
The growing maturity of process
mining is illustrated by the Process
Mining Manifesto9 released earlier
this year by the IEEE Task Force on
Process Mining (http://www.win.tue.nl/ieeetfpm/)
supported by 53 organizations and based on contributions
from 77 process-mining experts. The
active contributions from end users,
tool vendors, consultants, analysts,
and researchers highlight the significance of process mining as a bridge
between data mining and business
process modeling.
Figure 1. The three basic types of process mining in terms of input and output: discovery (event log → model), conformance checking (event log + model → diagnostics), and enhancement (event log + model → new model).
The starting point for process mining is an event log in which each event
refers to an activity, or well-defined
step in some process, and is related to
a particular case, or process instance.
The events belonging to a case are ordered and can be viewed as one “run”
of the process. Event logs may also
store additional information about
events; when possible, process mining techniques use extra information
such as the resource (person or device) executing or initiating the activity, the timestamp of the event, and
data elements recorded with the event
(such as the size of an order).
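The event-log structure described above can be made concrete with a small sketch. It is an illustration only, not a format prescribed by the article: the `Event` class, its field names, the `traces_by_case` helper, and the resource names are all assumptions introduced here. The sketch groups events by case and orders each case by timestamp, yielding one "run" of the process per case.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One logged event (hypothetical layout, for illustration only)."""
    case_id: str         # the case (process instance) the event belongs to
    activity: str        # the well-defined step that was executed
    timestamp: datetime  # when the event occurred
    resource: str = ""   # optional: person or device executing the activity


def traces_by_case(events):
    """Group events per case and order each case by timestamp."""
    cases = defaultdict(list)
    for event in events:
        cases[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(evts, key=lambda e: e.timestamp)]
        for case_id, evts in cases.items()
    }


# Events may arrive interleaved and out of order; resource names are made up.
log = [
    Event("case-1", "register request", datetime(2011, 1, 3, 9, 0), "Pete"),
    Event("case-2", "register request", datetime(2011, 1, 3, 9, 30), "Mike"),
    Event("case-1", "decide", datetime(2011, 1, 3, 14, 0), "Sara"),
    Event("case-1", "examine casually", datetime(2011, 1, 3, 10, 15), "Sue"),
    Event("case-2", "check ticket", datetime(2011, 1, 3, 11, 0), "Mike"),
]

traces = traces_by_case(log)
for case_id, activities in traces.items():
    print(case_id, activities)
```

Keeping the resource and timestamp on each event, rather than storing only activity names, is what lets later analyses use that extra information when it is available.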
Event logs can be used to conduct
three types of process mining (see Figure 1).1 The first and most prominent
is discovery; a discovery technique
takes an event log and produces a
model without using a priori information. For many organizations it is surprising that existing techniques are
able to discover real processes based
only on example behaviors recorded
in event logs. The second type is conformance, where an existing process
model is compared with an event log
of the same process. Conformance
checking can be used to check if reality, as recorded in the log, conforms
to the model and vice versa. The third
type is enhancement, where the idea
is to extend or improve an existing
process model using information
about the actual process recorded in
an event log. Whereas conformance
checking measures alignment between model and reality, this third
type of process mining aims to change
or extend the a priori model; for instance, using timestamps in the event
log, one can extend the model to show
bottlenecks, service levels, throughput times, and frequencies.
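The input/output shapes of the three types can be sketched in a few lines. This is a toy illustration under strong assumptions, not any of the discovery or conformance algorithms the article refers to: the "model" here is merely a set of allowed directly-follows pairs (a stand-in for a real process model), and the log contents and function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical toy log: case id -> time-ordered list of (activity, timestamp).
log = {
    "case-1": [
        ("a", datetime(2011, 1, 3, 9, 0)),
        ("c", datetime(2011, 1, 3, 10, 0)),
        ("e", datetime(2011, 1, 3, 12, 0)),
    ],
    "case-2": [
        ("a", datetime(2011, 1, 4, 9, 0)),
        ("e", datetime(2011, 1, 4, 11, 0)),
    ],
}


def discover_directly_follows(log):
    """Discovery (toy): event log in, model out.

    Counts how often activity x is directly followed by activity y.
    """
    counts = Counter()
    for events in log.values():
        activities = [activity for activity, _ in events]
        counts.update(zip(activities, activities[1:]))
    return counts


def check_conformance(log, allowed_pairs):
    """Conformance (toy): event log and model in, diagnostics out.

    Reports, per case, the observed steps that the model does not allow.
    """
    deviations = {}
    for case_id, events in log.items():
        activities = [activity for activity, _ in events]
        bad = [p for p in zip(activities, activities[1:]) if p not in allowed_pairs]
        if bad:
            deviations[case_id] = bad
    return deviations


def throughput_times(log):
    """Enhancement (toy): time between the first and last event of each case."""
    return {case_id: events[-1][1] - events[0][1] for case_id, events in log.items()}


model = {("a", "c"), ("c", "e")}  # assumed a priori model
dfg = discover_directly_follows(log)
deviations = check_conformance(log, model)
durations = throughput_times(log)
print(dfg, deviations, durations, sep="\n")
```

In this sketch the conformance check flags case-2 because it skips the examination step the assumed model requires, while the throughput times show the kind of timestamp-derived information an enhanced model could display.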
Process Discovery
The goal of process discovery is to
learn a model based on an event log.
Events can have all kinds of attributes
(such as timestamps, transactional
information, and resource usage) that
can be used for process discovery.
However, for simplicity, we often represent events by activity names only.
That way, a case, or process instance,
can be represented by a trace describing a sequence of activities. Consider,
for example, the event log in Figure 2
(from van der Aalst1), which contains
1,391 cases, or instances of some reimbursement process. There are 455
process instances following trace
acdeh, with each activity represented by
a single character: a = register request,
b = examine thoroughly, c = examine
casually, d = check ticket, e = decide,