Technical
Perspective
Node Replication
Divides to Conquer
By Tim Harris
SHARED-MEMORY CONCURRENT DATA
structures are pervasive. They are used
explicitly in server software such as
in-memory databases and key-value
stores. Even when software is built
with a programing model based
around shared-nothing computation
or side-effect-free functional programming, we find concurrent data
structures in the heart of the implementation in the operating system,
the language runtime system, or the
garbage collector.
Designing these high-performance
concurrent data structures has long
been recognized as a difficult task.
Not only is it challenging to develop
an implementation that is correct,
but the underlying hardware is a
moving target; techniques that work
well on one system may work poorly
on another, and techniques that
work on today’s systems may work
poorly on tomorrow’s.
What better way to simplify this
task than to have an automatic technique to generate a concurrent data
structure from existing sequential
code? That is the goal set by Calciu
et al. in their work on Node Replication (NR). In the following article, they
show that not only can a concurrent
data structure be built automatically,
but that performance is actually competitive with state-of-the-art designs
for a series of important workloads.
NR achieves this good performance
by recognizing that the bottleneck for
many concurrent data structures is
the memory accesses that are made—
particularly when threads on different
sockets are “fighting” over the same
cache lines, or when threads on one
socket are accessing data stored on
a different socket. NR reduces these
costs by maintaining per-socket synchronized replicas of a data structure,
and routing a thread’s requests to its
own local replica.
This is a surprising and inspiring
result, particularly given that building “universal” constructions for concurrent data structures has been an
active research field for over 30 years.
Prior work has made important theoretical contributions, not the least of
which around the importance of instruction set support for atomic operations such as compare-and-swap.
However, NR takes that exploration
further both in reaching the point
where practical implementations
can perform well in some workloads,
and also in illustrating the benefits
of “mechanical sympathy” between
the techniques in the implementation and the physical structure of the
underlying machine.
An exciting implication of this paper
is it provides a division of responsibility: The implementer of a data structure
is responsible for its correctness and
for making it efficient in the absence
of concurrency. The implementer of
NR is responsible for building the replication and routing mechanisms efficiently for a particular machine; improvements to these mechanisms will
help any data structure using them.
For example, if a new machine has
specialized hardware for sending messages between sockets then that could
be used within NR without needing to
change data structures.
In the longer term, it is interesting to
think about broader applications of
the ideas used in NR. One is to support shared data structures on machines without hardware cache coherence—both at the scale of multiple
processors combined in a system-on-chip, and also in a distributed context
between separate machines with a
high-performance interconnect.
Tim Harris, Cambridge, U. K.
Copyright held by author/owner.
To view the accompanying paper,
visit doi.acm.org/10.1145/3282506
DOI: 10.1145/3282504
For further information
or to submit your
manuscript,
visit tsc.acm.org
ACM Transactions
on Social Computing
ACM TSC seeks to publish
work that covers the
full spectrum of social
computing including
theoretical, empirical,
systems, and design
research contributions.
TSC welcomes research
employing a wide range
of methods to advance
the tools, techniques,
understanding, and
practice of social
computing, particularly
research that designs,
implements or studies
systems that mediate
social interactions among
users, or that develops
theory or techniques
for application in those
systems.