DOi: 10.1145/1516046.1516068
Two Hardware-Based
Approaches for Deterministic
Multiprocessor Replay
By Derek R. Hower, Pablo Montesinos, Luis Ceze, Mark D. Hill, and Josep Torrellas
abstract
Many shared-memory multithreaded executions behave
nondeterministically when run on multiprocessor hardware
such as emerging multicore systems. Recording nondeterministic events in such executions can enable deterministic
replay—e.g., for debugging. Most challenging to record are
memory races that can potentially occur on almost all memory references. For this reason, researchers have previously
proposed hardware to record key memory race interactions
among threads.
The two research groups coauthoring this paper independently uncovered a dual approach: focus on recording
how long threads execute without interacting. From this
common insight, the groups developed two significantly
different hardware proposals. Wisconsin Rerun makes few
changes to standard multicore hardware, while Illinois
DeLorean promises much smaller log sizes and higher
replay speeds. By presenting both proposals in one paper,
we seek to illuminate the promise of the joint insight and
inspire future designs.
1. intRODuCtiOn
Modern computer systems are inherently nondeterministic due to a variety of events that occur during an execution,
including I/O, interrupts, and DMA fills. The lack of repeatability that arises from this nondeterminism can make it difficult to develop and maintain correct software. Furthermore, it
is likely that the impact of nondeterminism will only increase
in the coming years, as commodity systems are now shared-memory multiprocessors. Such systems are not only impacted
by the sources of nondeterminism in uniprocessors, but also
by the outcome of memory races among concurrent threads.
In an effort to help ease the pain of developing software
in a nondeterministic environment, researchers have proposed adding deterministic replay capabilities to computer
systems. A system with a deterministic replay capability can
record sufficient information during an execution to enable
a replayer to (later) create an equivalent execution despite
the inherent sources of nondeterminism that exist. With the
ability to replay an execution verbatim, many new applications may be possible:
Debugging: Deterministic replay could be used to provide
the illusion of a time-travel debugger that has the ability
to selectively execute both forward and backward in
time.
security: Deterministic replay could also be used to enhance
the security of software by providing the means for an in-depth analysis of an attack, hopefully leading to rapid
patch deployment and a reduction in the economic
impact of new threats.
Fault tolerance: With the ability to replay an execution, it
may also be possible to develop hot-standby systems for
critical service providers using commodity hardware. A
virtual machine (VM) could, for example, be fed, in real
time, the replay log of a primary server running on a physically separate machine. The standby VM could use the
replay log to mimic the primary’s execution, so that in the
event that the primary fails, the backup can take over
operation with almost zero downtime.
As existing commercial products have already shown,
deterministic replay can be achieved with a software-only
solution when executing in a uniprocessor environment.
18
This is due, in part, to the fact that sources of nondeterminism in a uniprocessor, such as interrupts or I/O, are relatively
rare events that take a long time to complete. However, when
executing in a shared-memory multiprocessor environment,
memory races, which can potentially occur on every memory
access, are another source of nondeterminism. All-software
solutions exist,
4, 8 but results show that they do not perform
well on workloads that interact frequently. Thus, it is likely
that a general solution will require hardware support. To
this end, Bacon and Goldstein2 originally proposed recording all snooping coherence transactions, which, while fast,
produced a serial and voluminous log (see Figure 1).
Xu et al.
16 modernized hardware support for multiprocessor deterministic replay in general and memory race recording in particular. A memory race recorder is responsible for
logging enough information to reconstruct the order of all
fine-grained memory interleavings that occur during an execution. To reduce the amount of information that needs to
be logged (so that longer periods can be recorded for a fixed
hardware cost), the system proposed by Xu et al. implemented
in hardware an enhancement to Netzer’s transitive reduction optimization.
13 The idea is to skip the logging of those
races that can be implied through transitivity, i.e., those races
The original Wisconsin Rerun6 paper as well as the original Illinois DeLorean11 paper were published in the
Proceedings of the 35th Annual International Symposium on
Computer Architecture (June 2008).