sociated attributes. Also, credit-card
holders are objects with associated attributes. Pattern-matching fraudulent
activity from other credit cards to this
card can give early warning. Without
this matching to find new identities,
e-commerce would be very challenging
because of the amount of fraud that
would get through.
Homeland security is finding identity.
Another example of identities and
matching comes from looking at patterns of travel, locations, payment
types, and more. It is not unusual for
an analysis of many travelers to reveal
similar behavior by ostensibly different
people. Once it is realized that they share
the same identity, the details known about
the different people can be coalesced
to gain a better understanding of the
risks they may pose.
Laser-sharp vision and blurring details.
This coalescing of identities based
upon common attributes is the basis
for many of the emerging use cases in
data science. One perspective is that
the set of attributes defines the identity
that results from the coalescing. Must
the attributes be a match in all their
full glory? What makes it OK to have
differences? Do we want laser-sharp
exactitude in the attribute matching,
or is it OK to squint a little bit and blur
some details to allow more matches?
Increasingly, the original data (for
example, merchant feeds with product info) is kept and linked to the normalized, matched, and sanitized data.
These operations are intrinsically lossy
as you strive for commonality with other inputs. Considering the aligned and
sanitized common view and comparing it with the individual raw feeds can
offer additional insight.
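As a minimal sketch of this idea, the Python below blurs a few attributes to form a matching key and coalesces records under that key, while keeping every raw record linked to the resulting identity. The field names (`name`, `postcode`), the blurring rule, and the sample feed are assumptions made for illustration; real matching would likely use far richer attributes and rules.

```python
from collections import defaultdict


def normalize(record):
    """Blur details (lossy): keep only lowercase letters and digits."""
    def squash(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())
    return (squash(record["name"]), squash(record["postcode"]))


def coalesce(records):
    """Group raw records under their blurred key, keeping the originals linked."""
    identities = defaultdict(list)
    for record in records:
        identities[normalize(record)].append(record)
    return dict(identities)


# Hypothetical raw feed: the first two rows differ only in punctuation and spacing.
raw_feed = [
    {"name": "J. Smith", "postcode": "98052"},
    {"name": "j smith", "postcode": "98052 "},
    {"name": "Jane Smith", "postcode": "98052"},
]

identities = coalesce(raw_feed)
for key, originals in identities.items():
    print(key, "<-", [r["name"] for r in originals])
```

Squinting harder (for example, matching on the surname alone) would merge more records at the cost of more false matches, which is why keeping the raw records alongside the blurred key can be useful.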
Using Identity for Activities
Activities are long-running work across
time and across computers, and may
run across trust boundaries, departments, and companies. An activity is
usually handled by having an identifier
for the activity and separate identifiers
for each step.
Long-running workflow runs and identifiers.
A long-running workflow runs with
messages across time and typically waits
for external actions to complete. When the
workflow initiates an external event, it must
somehow receive an identifier for that event
when the event completes. To deal with an
external computer, the identifier is usually
tied to outgoing and incoming messages.
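One way to picture this is a workflow that stamps each outgoing message with an activity identifier and a step identifier, then uses the same pair to match the reply. The sketch below is illustrative only: the `Workflow` class, the message fields, and the use of random UUIDs are assumptions, not a description of any particular workflow engine.

```python
import uuid


class Workflow:
    """Toy workflow: one activity identifier, one identifier per step."""

    def __init__(self):
        self.activity_id = str(uuid.uuid4())
        self.pending = {}    # step_id -> request description
        self.completed = {}  # step_id -> (request description, result)

    def send_request(self, description):
        """Build an outgoing message tagged with activity and step identifiers."""
        step_id = str(uuid.uuid4())
        self.pending[step_id] = description
        return {"activity_id": self.activity_id, "step_id": step_id, "body": description}

    def receive_reply(self, message):
        """Match an incoming reply to a pending step; ignore anything unknown."""
        step_id = message["step_id"]
        if message["activity_id"] != self.activity_id or step_id not in self.pending:
            return False
        self.completed[step_id] = (self.pending.pop(step_id), message["result"])
        return True


workflow = Workflow()
outgoing = workflow.send_request("reserve inventory")
# The external system is assumed to echo both identifiers back in its reply.
reply = {"activity_id": outgoing["activity_id"],
         "step_id": outgoing["step_id"],
         "result": "reserved"}
first = workflow.receive_reply(reply)
second = workflow.receive_reply(reply)  # same reply delivered again
```

Here the second delivery of the same reply is ignored because the step is no longer pending.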
Identifiers crossing trust boundaries.
Sometimes an activity crosses trust
boundaries. Sending messages across
companies in a B2B solution opens up
trust concerns, as can sending messages
across departments or even from
a Linux box to a Windows box. Each
of these situations poses challenges.
The work in these cases is invariably
knit together with some form of identity.
That identity must have a scope
in space that covers all the distrusting
participants and a scope in time covering
the duration of the work.
It is not uncommon for one system
to provide an alias for its identifiers.
Messages going out and in are translated
between the two identity systems.
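A minimal sketch of such aliasing, assuming a simple two-way lookup table at the boundary: outgoing messages have the internal identifier swapped for its alias, and incoming messages are translated back. The `AliasTable` class and the identifier values are hypothetical.

```python
class AliasTable:
    """Two-way mapping between internal identifiers and external aliases."""

    def __init__(self):
        self._to_external = {}
        self._to_internal = {}

    def register(self, internal_id, external_id):
        self._to_external[internal_id] = external_id
        self._to_internal[external_id] = internal_id

    def outgoing(self, message):
        """Return a copy of the message carrying the external alias."""
        return {**message, "id": self._to_external[message["id"]]}

    def incoming(self, message):
        """Return a copy of the message carrying the internal identifier."""
        return {**message, "id": self._to_internal[message["id"]]}


aliases = AliasTable()
aliases.register("order-17", "PO-A93")  # hypothetical identifiers

sent = aliases.outgoing({"id": "order-17", "body": "ship"})
received = aliases.incoming({"id": "PO-A93", "body": "shipped"})
```

Neither side needs to learn the other's internal identifiers; only the boundary holds the mapping.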
Bank check numbers and idempotence.
An example of an identifier for
long-running work is the check number
on the printed checks from your
bank. When you make a paper check
out to the electric company or the
grocery store, the check has a unique
identifier. On the bottom of the check
are three series of numbers: the ABA
(American Bankers Association) routing
number, the account number, and the
check number. The ABA routing number
uniquely identifies your bank.
The account number identifies your
account within the bank. Finally, the
check number is unique within your
account.
When your check is handed over to
your grocery store, it is deposited in
the store’s bank, not yours. That bank
then records the deposit along with the
numbers from your check. The grocery
store’s bank then forwards the check
to your bank, which records the debit
and sends money back to the grocery
store’s bank.
Because of the unique identifier on the
check, your bank and the grocery store’s
bank can implement algorithms to ensure
the exactly-once processing of the
debit and credit. This has been going
on for many years, longer than we have
had computers.
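The exactly-once behavior can be sketched as an idempotent debit keyed by the three numbers on the check. This is an illustration under stated assumptions, not how any real bank processes checks: the `CheckId` and `Ledger` names, the in-memory record of processed checks, and the sample numbers are all hypothetical.

```python
from typing import NamedTuple


class CheckId(NamedTuple):
    """The three numbers printed on the check, combined into one identifier."""
    routing_number: str
    account_number: str
    check_number: int


class Ledger:
    """Toy account ledger that applies each check at most once."""

    def __init__(self, balance_cents):
        self.balance_cents = balance_cents
        self.processed = {}  # CheckId -> amount already debited

    def debit(self, check_id, amount_cents):
        """Apply the debit unless this check was already processed."""
        if check_id in self.processed:
            return False  # duplicate presentation: no effect
        self.processed[check_id] = amount_cents
        self.balance_cents -= amount_cents
        return True


ledger = Ledger(balance_cents=10_000)
check = CheckId("000000000", "12345", 101)  # placeholder numbers

applied_first = ledger.debit(check, 2_500)
applied_again = ledger.debit(check, 2_500)  # the same check presented twice
```

Because the identifier is remembered, presenting the same check twice changes the balance only once.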
Identifiers: The glue that binds and
splits. Identifiers are the glue that connects
work. It’s the ability to connect
the work that allows us to split apart
our scaling solutions and to connect
previously disconnected solutions.
REST: URL-ey binding. Representa-