We need it, we can afford it,
and the time is now.
BY PAT HELLAND
THERE IS AN inexorable trend toward storing and
sending immutable data. We need immutability
to coordinate at a distance, and we can afford
immutability as storage gets cheaper. This article
offers an amuse-bouche of repeated patterns of
computing that leverage immutability. Climbing up
and down the compute stack really does yield a sense
of déjà vu all over again.
It was not that long ago that computation was
expensive, disk storage was expensive, DRAM
(dynamic random access memory) was expensive, but
coordination with latches was cheap. Now all of these
have changed: computation is cheap (with many-core),
commodity disks are cheap, and DRAM and
SSDs (solid-state drives) are cheap, while coordination with
latches has become harder because
latch latency loses lots of instruction
opportunities. Keeping immutable
copies of lots of data is now affordable,
and one payoff is reduced coordination.
Storage is increasing as the cost per
terabyte of disk keeps dropping. This
means a lot of data can be kept for a
long time. Distribution is increasing as more and more data and work
are spread across a great distance.
Data within a data center seems “far
away.” Data within a many-core chip
may seem “far away.” Ambiguity is
increasing when trying to coordinate
with systems that are far away—more
stuff has happened since you have
heard the news. Can you take action
with incomplete knowledge? Can you
wait for enough knowledge?
Turtles all the way down.17 As various technological areas have evolved,
they have responded to these trends
of increasing storage, distribution,
and ambiguity by using immutable
data in some very fun ways. This article explores how apps use immutability in their ongoing work, how
they generate an immutable dataset
for later offline analysis, how SQL
can expose and process immutable
snapshots, and how massively parallel big-data work relies on immutable
datasets. This leads to looking at the
ways in which semantically immutable datasets may be altered while remaining immutable.
Next, the article considers how updatability is layered atop the creation
of new immutable files via techniques
such as LSF (log-structured file system), COW (copy-on-write), and LSM
(log-structured merge-tree). How do
replicated and distributed file systems
depend on immutability to eliminate anomalies? Hardware folks have
joined the party by leveraging these
tricks in SSDs and HDDs (hard-disk
drives). Immutability is a key architectural concept at many layers of the
stack, as shown in Figure 1.
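
The idea of layering updatability atop new immutable files can be sketched in a few lines of Python. This is a minimal illustration loosely in the spirit of a log-structured merge-tree, not an implementation from the article: the `LayeredStore` class, its `flush_threshold` parameter, and the in-memory "segments" standing in for immutable files are all hypothetical names chosen for this example.

```python
from types import MappingProxyType


class LayeredStore:
    """Hypothetical key-value store that looks updatable to callers,
    while every flush produces a new read-only segment."""

    def __init__(self, flush_threshold=2):
        self._memtable = {}        # the only mutable state: recent writes
        self._segments = ()        # read-only segments, oldest first
        self._flush_threshold = flush_threshold

    @property
    def segments(self):
        return self._segments

    def put(self, key, value):
        # An "update" never touches an existing segment.
        self._memtable[key] = value
        if len(self._memtable) >= self._flush_threshold:
            self._flush()

    def _flush(self):
        # Freeze a copy of the recent writes as a new immutable segment.
        frozen = MappingProxyType(dict(self._memtable))
        self._segments = self._segments + (frozen,)
        self._memtable = {}

    def get(self, key, default=None):
        # Newest data wins: check recent writes, then newest segment first.
        if key in self._memtable:
            return self._memtable[key]
        for segment in reversed(self._segments):
            if key in segment:
                return segment[key]
        return default

    def compact(self):
        # Merge all segments into one NEW segment; old ones are not modified.
        merged = {}
        for segment in self._segments:   # oldest first, so newer values win
            merged.update(segment)
        self._segments = (MappingProxyType(merged),)


store = LayeredStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)      # triggers a flush: first immutable segment
store.put("a", 3)
store.put("c", 4)      # triggers a flush: second immutable segment
oldest = store.segments[0]
store.compact()
```

In this sketch, a reader holding a reference to `oldest` still sees the original value of `"a"` after later writes and compaction, while `store.get("a")` returns the newer one. The sketch omits deletes, durability, and concurrency, all of which a real system would need to handle.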
Finally, the article looks at some of
the trade-offs of using immutable data.