APRIL 2017 | VOL. 60 | NO. 4 | COMMUNICATIONS OF THE ACM 79
states d1 or d2. Using log_intact allows us to capture
all possible crash states concisely, including states that
can appear deep inside any procedure that atomic_two_
write might call (e.g., crashes inside log_commit).
3. 4. Recovery execution semantics
Crash conditions and address spaces allow us to specify the
possible states in which the computer might crash in the
middle of a procedure’s execution. We also need a way to
reason about recovery, including crashes during recovery.
For example, we want to argue that a transaction provides all-or-nothing atomicity: if atomic_two_write
crashes prior to invoking log_commit, none of the calls
to log_write will be applied; after log_commit returns,
all of them will be applied; and if atomic_two_write
crashes during log_commit, either all or none of them will
take effect. To achieve this property, the transaction system
must run log_recover after every crash to roll forward
any committed transaction, including after crashes during
log_recover itself.
The specification of the log_recover procedure is
shown in Figure 5. It says that, starting from any state
matching log_intact(last_state, committed_state),
log_recover will either roll back the transaction to last_
state or will roll forward a committed transaction to
committed_state. Furthermore, the fact that log_recover’s
crash condition implies the precondition indicates that
log_recover is idempotent, meaning that it can be safely
restarted after a crash to achieve the same postcondition.
(Strictly speaking, this sense of idempotence differs from
the mathematical notion, but follows conventions established in early work on fault-tolerant storage systems.
14)
To formalize the requirement that log_recover must
run after a crash, CHL provides a recovery execution semantics. The recovery semantics talks about two procedures executing (a normal procedure and a recovery procedure) and
producing either a failure, a completed state (after finishing
the normal procedure), or a recovered state (after finishing the
recovery procedure). This regime models the notion that the
normal procedure tries to execute and reach a completed
state, but if the system crashes, it starts running the recovery
procedure (perhaps multiple times if there are crashes during
recovery), which produces a recovered state.
Figure 6 shows how to extend the atomic_two_write
specification to include recovery execution using the notation. The postcondition indicates that, if atomic_two_
write finishes without crashing, both blocks were updated,
and if one or more crashes occurred, with log_recover
running after each crash, either both blocks were updated
to data within that file; and a directory is a logical address
space from file names to inode numbers. Building on this
observation, CHL generalizes the separation logic for rea-
soning about the disk to similarly reason about higher-level
address spaces like files, directories, or the logical disk con-
tents in a logging system.
As an example of address spaces, consider the specification of atomic_two_write, shown in Figure 4. Rather
than describe how atomic_two_write modifies the
on-disk data structures, the specification introduces new
address spaces, start_state and new_state, which correspond to the contents of the logical disk provided by the
logging system. Logical address spaces allow the developer
of the logging system to state a clean specification, which
provides the abstraction of a simple, synchronous disk to
higher layers in the file system. Developers of higher layers
can then largely ignore the details of the underlying asynchronous disk.
Specifically, in the precondition, a1 vx applies to the
address space representing the starting contents of the
logical disk, and in the postcondition, a1 v1 applies to
the new contents of the logical disk. Like the physical disk,
these address spaces are partial functions from addresses
to values (in this case, mapping 64-bit block numbers to
4 KB block values). Unlike the physical disk, the logical disk
address space provided by the logging system associates
a single value with each block, rather than a set of values,
because the transaction system exports a sound synchronous interface, proven correct on top of the asynchronous
interface below. Note how we use a variable others to stand
for untouched disk addresses in the logical disk, just as we
did for the physical disk in Figure 3.
Crucial to such a specification are explicit connections
between address spaces. In Figure 4, we use a predicate
log_rep, whose definition we elide here, but which captures how to map a higher-level state into a set of permissible lower-level states. For this example of a logging layer,
the predicate maps a “virtual” disk into a “physical” disk
that includes a log. Such predicates may take additional
arguments, as with the NoTxn argument that we use here
to indicate that the logging layer is in a quiescent state,
between transactions. This technique for connecting logical layers generalizes to stacks of several layers, as naturally
appear in a file system.
The crash condition of atomic_two_write, from Figure 4,
describes all of the states in which atomic_two_write
could crash using log_intact(d1, d2), which stands
for all possible log_rep states that recover to transaction
SPEC atomic_two_write(a1, v1, a2, v2)
PRE disk: log_rep(No Txn, start_state)
start_state: a1 ; vx ; a2 ; vy ; others
POST disk: log_rep(No Txn, new_state)
new_state: a1 ; v1 ; a2 ; v2 ; others
CRASH disk: log_intact(start_state, new_state)
Figure 4. Specification for atomic_two_write.
SPEC log_recover()
PRE disk: log_intact(last_state, committed_state)
POST disk: log_rep(No Txn, last_state) ∨
log_rep(No Txn, committed_state)
CRASH disk: log_intact(last_state, committed_state)
Figure 5. Specification of log_recover.