practice
“What’s in a name? That which we call a rose by any
other name would smell as sweet.”
—William Shakespeare (Romeo and Juliet)
AS DISTRIBUTED SYSTEMS scale in size and heterogeneity,
identifiers increasingly connect them. These may be
called IDs, names, keys, numbers, URLs, file names,
references, UPCs (Universal Product Codes), and
many other terms. Frequently, these terms refer to
immutable things. At other times, they refer to stuff
that changes as time goes on. Identifiers are even
used to represent the nature of the computation
working across distrusting systems.
The fascinating thing about identifiers is that while
they identify the same “thing” over time, that referenced
thing may slide around in its meaning. Product
descriptions, reviews, and inventory balance all change,
while the product ID does not. Reservations, orders, and
bookings all have identifiers that do not change, while
the stuff they identify may subtly
change over time.
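A minimal sketch of this idea, in Python, may help. The catalog, the product ID "SKU-1001", and the order record are hypothetical names invented for illustration; they do not come from any particular system. The point is only that the order holds an identifier, and the stuff behind that identifier is free to change.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Product:
    """Mutable product data; everything here may change over time."""
    description: str
    inventory: int
    reviews: List[str] = field(default_factory=list)


# Hypothetical catalog keyed by a product ID that never changes.
catalog: Dict[str, Product] = {
    "SKU-1001": Product("Red rose bouquet", inventory=12),
}


def lookup(product_id: str) -> Product:
    """Resolve an identifier to whatever it means right now."""
    return catalog[product_id]


# The order records only the identifier, not a copy of the product.
order = {"order_id": "ORD-1", "product_id": "SKU-1001", "quantity": 1}

before = lookup(order["product_id"]).description

# The referenced thing slides around in its meaning...
catalog["SKU-1001"].description = "Dozen red roses, long stem"
catalog["SKU-1001"].inventory -= order["quantity"]
catalog["SKU-1001"].reviews.append("Smelled as sweet as promised.")

after = lookup(order["product_id"]).description

# ...while the identifier stored in the order is untouched.
print(order["product_id"], "|", before, "->", after)
```

Storing the identifier rather than a copy is a design choice: the order always sees the current description, at the cost of not remembering what the description said when the order was placed.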
Identity and identifiers provide the
immutable linkage. Both sides of this
linkage may change, but the identifiers provide
a semantic consistency needed by the
business operation. No matter what
you call it, identity is the glue that
makes things stick and lubricates cooperative work.
This article is yet another thought
experiment and rumination about the
complex cacophony of intertwined
systems.
The Need for Identity
For a long time, we worked behind the
façade of a single centralized database.
Attempting to talk to other computers
was considered an “application problem” and not in the purview of the system. Data lived as values in cells in the
relational database. Everything could
be explained in simple abstractions,
and life was good!
Then, we started splitting up centralized systems for scale and manageability. We also tried to get different systems
that had been independently developed
to work together. That created many challenges in understanding each other [4] and
ensuring predictable outcomes, especially for atomic transactions.
As time moved on, a number of usage patterns emerged that address the
challenges of work across both homogeneous and heterogeneous boundaries.
All of those patterns depend on connecting things with notions of identity. The
identities involved frequently remain
firm and intact over long periods of time.
Data on the outside vs. data on the inside. In 2005, I wrote a paper, “Data on
the Outside versus Data on the Inside,” [7]
that explored what it means to have data
not kept in the SQL database but rather
kept in messages, files, documents,
and other representations. It turns out
that information not kept in databases
emerges as immutable messages, files,
values (à la key/values), or other representations. These are typically semi-structured in their representations, but
they always have some form of identifier.
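As a rough illustration of data on the outside, the Python sketch below models a message as an immutable value that carries its own identifier. The Message class, the new_message helper, and the use of a random UUID as the identifier are all assumptions made for this example; real systems choose their identifiers in many different ways.

```python
import uuid
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Message:
    """An immutable message: once created, its fields cannot be reassigned."""
    message_id: str
    body: str


def new_message(body: str) -> Message:
    """Hypothetical helper that assigns a fresh identifier to each message."""
    return Message(message_id=str(uuid.uuid4()), body=body)


original = new_message("ship 1 bouquet")

# Attempting to change a message in place is rejected.
try:
    original.body = "ship 2 bouquets"
    mutated = True
except FrozenInstanceError:
    mutated = False

# A change is expressed as a new message with its own identifier.
revised = new_message("ship 2 bouquets")

# Messages are found by identifier, for example in a simple store.
store = {m.message_id: m for m in (original, revised)}
print(mutated, store[original.message_id].body, store[revised.message_id].body)
```

Because neither message ever changes, anyone holding the first identifier will always see the same body, no matter how many later messages arrive.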
Identity by Any Other Name
DOI: 10.1145/3303870
Article development led by
queue.acm.org
The complex cacophony
of intertwined systems.
BY PAT HELLAND