“E-commerce has
redefined crime.
We now see new
breeds of online
crime where
technologically savvy
criminals exploit not
only the weaknesses
of human nature,
but also the systems
originally designed
to protect online
shoppers.”
in the online auction, dividing them
into two groups (“fraudsters” and “
accomplices”). The fraudsters rarely trade
among themselves, and neither do the
accomplices. Then, the bad guy uses
the accomplices to artificially boost
the reputation of the fraudsters. The
accomplices typically act like normal,
honest users who buy and sell items
(usually cheap, to lower operating
costs), but they sometimes sell expensive items to the fraudsters, leaving
glowing comments about how the buyers ( fraudsters) are good guys (“paid on
time”, “easy to communicate with”,
etc.). After the fraudsters have reached
high reputation, they launch deceptive
auctions to sell expensive items (e.g.,
big-screen TVs), usually at bargain
prices, to the victims (honest people).
Those items will never be delivered.
We call the above interaction pattern a “bipartite core” (Figure 1), where
two types of nodes ( fraudsters and
accomplices) only interact with nodes of
different types, but not with their own.
This pattern forms the infrastructure
that criminals set up before they carry
out auction fraud. But to the naked eye,
the associations bet ween the identities
involved in the deceptive bipartite core
pattern might not be apparent.
The NetProbe system [ 1] was developed to dig out these identities in this
pattern, by automatically scanning
connections between buyers and sellers, several layers deep, to look for ar-
tificial feedback, revealing identities
and their associations that match the
bipartite core. The system ran through
over one million transactions and correctly picked out dozens of previously
identified criminals; it also identified
tens of probable fraudsters and apparent accomplices.
Under the hood, NetProbe uses an
inference algorithm called Belief Propagation to infer which nodes in the auction graph are most likely to be
fraudsters and accomplices. The system first
uses heuristics to assign a vector of
three probabilities—called the node’s
belief—to each node: a fraudster probability, an accomplice probability, and
an honest probability. For example,
if an identity has been active for many
years and has not received any negative
comments from other people, then that
identity has a high honest probability;
if an account was recently shut down
right after it received many complaints,
then it has a high fraudster probability.
These three probabilities sum up to 1.
Table 1. Conditional probability table
describing a “bipartite core”; ε is
a small constant close to zero. For
example, entry (F, A), with a value of
1-2ε, describes a very high probability
of a node’s neighbor being an accom-
plice (A) given the node itself being a
fraudster (F).
Fraudster (F)
Accomplice (A)
Honest (H)
F
ε
0.5
ε
A
1-2ε
2ε
(1-ε)/2
H
ε
0.5-2ε
(1-ε)/2
NetProbe’s algorithm then uses the
matrix in Table 1 to transform each
node’s belief into a message (also a
probability vector) that the node will
send to each of its neighbors; the message represents what the node thinks
about its neighbors. The transformation is similar to multiplying the node’s
belief with the matrix. For example, if
a node has high fraudster probability,
then applying the transformation on
it will create a message for each neighbor that says the neighbor is likely an
accomplice. All nodes simultaneously
send out messages to their neighbors.
Each node gathers its incoming messages, multiplies them into one vector
(which also resolves competing messages similar to majority voting), then
sets that vector as the node’s new belief. Finally, the node generates new
messages for its neighbors using its
updated belief. This whole process
continues until all node beliefs do not
change anymore. NetProbe then calls
out the likely fraudsters and accomplices, and warn off potential bidders.
The idea of propagating information across a graph and aggregating
it to produce high-level conclusion is
powerful. It inspired the creation of
the generalized Snare system [ 2] applicable for various kinds of fraud and
anomaly detection tasks. Snare was
used on some general ledger data (a
network of interconnected accounts)
to detect financial fraud, boosting the
detection rates of misstated accounts
by 5. 5 times.
USER-CEN TERED AND AUTOMATIC
PAT TERN DE TEC TION
Sometimes, analysts need to experiment with multiple patterns that,
hopefully, would match the actual incriminating patterns. Creating a separate algorithm for each such pattern is
costly and time-consuming, especially
since most patterns will end up not
being useful. Can we provide one tool
that detects a wide range of patterns
quickly and easily?
The Graphite system [ 3] aims to meet
this challenge. It provides a direct-ma-nipulation user interface for the user to
construct the query pattern by placing
nodes on the screen, assigning types to
them, and connecting the nodes with
edges. For example, the query pattern
in Figure 2 asks for money laundering
rings of alternating businessmen and
bankers. Graphite then locates the pattern’s exact and approximate matches
in a large graph of the user’s choosing.
Graphite advances over existing algorithms that detect only structural patterns without considering the types of
the nodes that compose the patterns;
it enables more specific patterns to be
found. Consider a communication network where each node is a person from
a country (country is the node type).
Our analyst Laura wants to locate four