f = reinitiate request, g = pay compensation, and h = reject request. Hence,
trace acdeh models a reimbursement
request that was rejected after a registration, examination, check, and
decision step; 455 cases followed this
path, which consists of five steps, so
the first line in the table corresponds
to 455 × 5 = 2,275 events. The whole
log consists of 7,539 events.
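The event count for one row of the table can be checked with a few lines of Python. This is only a sketch: the dictionary holds just the one variant quoted above (the other rows of the table are not reproduced here), and the name count_events is a hypothetical helper.

```python
def count_events(variants):
    """Total number of events in a log given as {trace: number of cases}."""
    return sum(len(trace) * cases for trace, cases in variants.items())

# Only the variant quoted in the text; the remaining rows are omitted here.
variants = {"acdeh": 455}
print(count_events(variants))  # 455 cases x 5 steps = 2275
```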
Process-discovery techniques produce process models based on event
logs (such as the one in Figure 2); for
example, the classical α-algorithm
produces model M1 for this log. This
process model is represented as a
Petri net consisting of places and
transitions. The state of a Petri net,
or “marking,” is defined by the distribution of tokens over places. A transition is enabled if each of its input
places contains a token; for example,
a is enabled in the initial marking of
M1, because the only input place of
a contains a token (black dot). Transition e in M1 is enabled only if both
input places contain a token. An enabled transition may fire, thereby
consuming a token from each of its
input places and producing a token
for each of its output places. Firing a
in the initial marking corresponds to
removing one token from start and
producing two tokens, one for each
output place. After firing a, three transitions—b, c, and d—are enabled. Firing b disables c because the token is
removed from the shared input place
(and vice versa). Transition d is concurrent with b and c; that is, it can fire
without disabling another transition.
Transition e becomes enabled after d
and b or c have occurred. By executing
e, three transitions—f, g, and h—
become enabled; these transitions are
competing for the same token, thus
modeling a choice. When g or h is
fired, the process ends with a token
in place end. If f is fired, the process
returns to the state just after executing a. Note that transition d is concurrent with b and c. Process mining
techniques must be able to discover
such advanced process patterns and
should not be restricted to simple sequential processes.
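The firing rule described above can be sketched in Python. The net below is reconstructed from the prose description of M1 only, so it is an assumption rather than a copy of the figure: start and end are the place names used in the text, while p1 to p5 are hypothetical labels for the unnamed intermediate places, and the function names are illustrative.

```python
from collections import Counter

# Transition -> (input places, output places). "start" and "end" are named in
# the text; p1..p5 are hypothetical labels for the unnamed intermediate places.
M1 = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),
    "e": (["p3", "p4"], ["p5"]),
    "f": (["p5"], ["p1", "p2"]),
    "g": (["p5"], ["end"]),
    "h": (["p5"], ["end"]),
}

def enabled(net, marking):
    """Transitions whose input places all hold at least one token."""
    return sorted(t for t, (ins, _) in net.items()
                  if all(marking[p] > 0 for p in ins))

def fire(net, marking, t):
    """Consume one token per input place, produce one per output place."""
    ins, outs = net[t]
    if any(marking[p] == 0 for p in ins):
        raise ValueError(f"transition {t!r} is not enabled")
    new = Counter(marking)
    new.subtract(ins)
    new.update(outs)
    return +new  # unary plus drops places left with zero tokens

def replay(net, trace):
    """Fire the transitions of a trace in order from the initial marking."""
    marking = Counter({"start": 1})
    for t in trace:
        marking = fire(net, marking, t)
    return marking

after_a = replay(M1, "a")
print(enabled(M1, Counter({"start": 1})))    # ['a']
print(enabled(M1, after_a))                  # ['b', 'c', 'd']
print(enabled(M1, fire(M1, after_a, "b")))   # ['d']: firing b disabled c
print(enabled(M1, replay(M1, "acde")))       # ['f', 'g', 'h']
print(replay(M1, "acdef") == after_a)        # True: f returns to the state after a
```

A marking is stored as a multiset of places, which matches the definition of a marking as a distribution of tokens over places.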
Checking that all traces in the event
log can be reproduced by M1 is easy.
The same does not hold for the second
process model in Figure 2, as M2 is able
to reproduce only the most frequent
trace acdeh. The model does not fit the
log well because observed traces (such
as abdeg) are not possible according
to M2. The third model is able to reproduce
the entire event log, but M3 also allows for
traces not seen in the log (such as ah and
adddddddg). M3 is therefore considered
“underfitting”; too much behavior is
allowed because M3 clearly overgeneralizes
the observed behavior. Model M4
is also able to reproduce the event log,
though the model simply encodes the
example traces in the log; we call such a
model “overfitting,” as the model does
not generalize behavior beyond the observed
examples.
Conformance Checking
Process mining is not limited to process discovery; the discovered process
is just the starting point for deeper
analysis. Conformance checking and
enhancement relate model and log, as
in Figure 1. The model may have been
made by hand or discovered through
process discovery. In conformance
checking, the modeled behavior and
the observed behavior, or event log,
are compared. When checking the
conformance of M2 with respect to
the log in Figure 2, only the 455 cases
following acdeh can be replayed from
beginning to end. An attempt to replay
trace acdeg on M2 gets stuck after
executing acde because g is not enabled.
An attempt to replay trace adceh gets
stuck after executing the first step because d is not
(yet) enabled.
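The two stuck replays can be reproduced with a small sketch. It assumes that M2 is a purely sequential net a, c, d, e, h, which is consistent with the statement that M2 reproduces only acdeh; the place names q1 to q4 and the helper replay_prefix are hypothetical.

```python
from collections import Counter

# Assumed structure of M2: a purely sequential net a -> c -> d -> e -> h.
# Place names q1..q4 are hypothetical labels.
M2 = {
    "a": (["start"], ["q1"]),
    "c": (["q1"], ["q2"]),
    "d": (["q2"], ["q3"]),
    "e": (["q3"], ["q4"]),
    "h": (["q4"], ["end"]),
}

def replay_prefix(net, trace):
    """Return (longest replayable prefix, True if the whole trace fits)."""
    marking = Counter({"start": 1})
    for i, t in enumerate(trace):
        ins, outs = net.get(t, (None, None))
        # Stuck if the activity is unknown or one of its input places is empty.
        if ins is None or any(marking[p] == 0 for p in ins):
            return trace[:i], False
        marking.subtract(ins)
        marking.update(outs)
    # A fitting trace must also leave exactly one token in "end".
    return trace, +marking == Counter({"end": 1})

print(replay_prefix(M2, "acdeh"))  # ('acdeh', True)
print(replay_prefix(M2, "acdeg"))  # ('acde', False): g cannot fire
print(replay_prefix(M2, "adceh"))  # ('a', False): d is not yet enabled
```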
Among the approaches to diagnosing and quantifying conformance is
one that looks to find an optimal alignment between each trace in the log
and the most similar behavior in the
model. Consider, for example, process
model M1, a fitting trace σ1 = adceg, a
non-fitting trace σ2 = abefdeg, and the
following three alignments:
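γ1 =
  log:    a  d  c  e  g
  model:  a  d  c  e  g
γ2 =
  log:    a  b  ≫  e  f  d  ≫  e  g
  model:  a  b  d  e  f  d  b  e  g
γ3 =
  log:    a  b  e  f  d  e  g
  model:  a  b  ≫  ≫  d  e  g
(≫ marks a position where the log or the model makes no move.)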
γ1 shows perfect alignment between
σ1 and M1; all moves of the trace in
the event log (top part of alignment)
can be followed by moves of the model
(bottom part of alignment). γ2 shows
an optimal alignment for trace σ2 in
the event log and model M1; the first
two moves of the trace in the event log
can be followed by the model. However, e is not enabled after executing
only a and b. In the third position of
alignment γ2, a d move of the model
is not synchronized with a move in the
event log. This move in just the model
is denoted as (≫,d), signaling a conformance problem. In the next three
moves model and log agree. The seventh position of alignment γ2 involves
a move in the model that is not also in
the log: (≫,b). γ3 shows another optimal alignment for trace σ2. In γ3 there
are two situations where log and model do not move together: (e,≫) and
(f,≫). Alignments γ2 and γ3 are both
optimal if the penalties for “move
in log” and “move in model” are the
same. Both alignments have two ≫
steps, and no alignments are possible
with fewer than two ≫ steps.
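The claim that two is the minimum can be checked with a shortest-path search over pairs of (position in the trace, marking of the model). The sketch below is one possible way to compute the cost of an optimal alignment, not necessarily the technique used by alignment tools. It assumes unit penalties for log-only and model-only moves, zero cost for synchronous moves, and the same reconstruction of M1 from the prose (hypothetical place names p1 to p5).

```python
import heapq
from collections import Counter

# M1 reconstructed from the prose; p1..p5 are hypothetical place names.
M1 = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),
    "e": (["p3", "p4"], ["p5"]),
    "f": (["p5"], ["p1", "p2"]),
    "g": (["p5"], ["end"]),
    "h": (["p5"], ["end"]),
}

def fire(net, marking, t):
    """Successor marking as a sorted tuple of places, or None if t is not enabled."""
    ins, outs = net[t]
    m = Counter(marking)
    if any(m[p] == 0 for p in ins):
        return None
    m.subtract(ins)
    m.update(outs)
    return tuple(sorted((+m).elements()))

def alignment_cost(net, trace, initial=("start",), final=("end",)):
    """Minimal number of log-only plus model-only moves (unit penalties)."""
    queue = [(0, 0, initial)]  # (cost so far, position in trace, marking)
    best = {}
    while queue:
        cost, i, m = heapq.heappop(queue)
        if i == len(trace) and m == final:
            return cost
        if best.get((i, m), float("inf")) <= cost:
            continue  # this state was already expanded at equal or lower cost
        best[(i, m)] = cost
        if i < len(trace):
            # Move in log only: skip the observed event.
            heapq.heappush(queue, (cost + 1, i + 1, m))
            # Synchronous move: log and model take the same step.
            if trace[i] in net:
                nxt = fire(net, m, trace[i])
                if nxt is not None:
                    heapq.heappush(queue, (cost, i + 1, nxt))
        # Move in model only: the model fires without consuming an event.
        for t in net:
            nxt = fire(net, m, t)
            if nxt is not None:
                heapq.heappush(queue, (cost + 1, i, nxt))
    return None  # the final marking is unreachable

print(alignment_cost(M1, "adceg"))    # 0: the trace fits
print(alignment_cost(M1, "abefdeg"))  # 2: matches the two ≫ steps in γ2 and γ3
```

The search terminates because the reconstructed net has finitely many reachable markings and each (position, marking) state is expanded at most once.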
Conformance may be viewed from
two angles: either the model does not
capture real behavior (the model is
wrong) or reality deviates from the desired model (the event log is wrong).
The first viewpoint is taken when the model is
supposed to be descriptive, that is, to capture or predict reality; the second is
taken when the model is normative, that is,
used to influence or control reality.
Various types of conformance
are available, and creating an alignment between log and model is just
the starting point for conformance
checking. 1 For example, various fitness (the ability to replay) metrics are
available for determining the conformance of a business process model;
a model has fitness 1 if all traces can
be replayed from beginning to end, and a
model has fitness 0 if model and event
log “disagree” on all events. In Figure 2, process models M1, M3, and M4