APRIL 2017 | VOL. 60 | NO. 4 | COMMUNICATIONS OF THE ACM 83
of code in the CHL infrastructure and changing over half
of the implementations and proofs for the write-ahead log.
However, layers above the log did not require any changes,
since the log provided the same synchronous disk abstraction in both cases.
Buffer cache. We added a buffer cache to FSCQ after
we had already built the write-ahead log and several layers above it. Since Coq is a pure functional language,
keeping buffer-cache state required passing the current
buffer-cache object to and from all functions. Incorporating
the buffer cache required changing about 300 lines of
code and proof in the log, to pass around the buffer-cache state, to access disk via the buffer cache and to
reason about disk state in terms of buffer-cache invariants.
We also had to make similar straightforward changes to
about 600 lines of code and proof for components above
the log.
Optimizing log layout. The initial design of FSCQ’s write-ahead log used one disk block to store the length of the on-disk log and another block to store a commit bit, indicating
whether log recovery should replay the log contents after a
crash. Once we introduced asynchronous writes, storing
these fields separately necessitated an additional disk sync
between writing the length field and writing the commit bit.
To avoid this sync, we modified the logging protocol slightly:
the length field was now also the commit bit, and the log is
applied on recovery if the length is nonzero. Implementing
this change required modifying about 50 lines of code and
about 100 lines of proof.
5. 5. Evaluation summary
Although FSCQ is not as complete and high-performance as
today’s high-end file systems, our results demonstrate that
this is largely due to FSCQ’s simple design. Furthermore,
the case studies in Section 5. 4 indicate that one can improve
FSCQ incrementally. In future work, we hope to improve
FSCQ’s logging to batch transactions and to log only metadata; we expect this will bring FSCQ’s performance closer
to that of ext4’s logging. The one exception to incremental
improvement is multiprocessor support, which may require
global changes and is an interesting direction for future
research.
6. CONCLUSION
This paper’s contributions are CHL and a case study of
applying CHL to build FSCQ, the first certified crash-safe
file system. CHL allowed us to concisely and precisely
specify the expected behavior of FSCQ. Via this verification
approach, we arrive at a machine-checked proof that FSCQ
avoids bugs that have a long history of causing data loss in
previous file systems. For this kind of critical infrastructure,
the cost of proving seems a reasonable price to pay. We hope
that others will find CHL useful for constructing crash-safe
storage systems.
Acknowledgments
The authors would like to thank Nathan Beckmann,
Butler Lampson, Robert Morris, and the IronClad team
for insightful discussions and feedback. The author would
Multiprocessor bugs (#8) are out of scope for our FSCQ
prototype, because it does not support multithreading.
5. 3. Crash recovery
We proved that FSCQ implements its specification, but
in principle it is possible that the specification is incomplete or that our unproven code (e.g., the FUSE driver) has
bugs. To do an end-to-end check, we ran two experiments.
First, we ran fsstress from the Linux Test Project, which
issues random file-system operations; this did not uncover
any problems. Second, we experimentally induced crashes
and verified that each resulting disk image after recovery
is consistent.
In particular, we created an empty file system using
mkfs, mounted it using FSCQ’s FUSE interface, and then
ran a workload on the file system. The workload creates two
files, writes data to the files, creates a directory and a subdirectory under it, moves a file into each directory, moves
the subdirectory to the root directory, appends more data
to one of the files, and then deletes the other file. During
the workload, we recorded all disk writes and syncs. After
the workload completed, we unmounted the file system and
constructed all possible crash states. We did this by taking
a prefix of the writes up to some sync, combined with every
possible subset of writes from that sync to the next sync. For
the workload described above, this produced 320 distinct
crash states.
For each crash state, we remounted the file system (which
runs the recovery procedure) and then ran a script to examine the state of the file system, looking at directory structure,
file contents, and the number of free blocks and inodes. For
the above workload, this generates just 10 distinct logical
states (i.e., distinct outputs from the examination script).
Based on a manual inspection of each of these states, we
conclude that all of them are consistent with what a POSIX
application should expect. This suggests that FSCQ’s specifications, as well as the unverified components, are likely to
be correct.
5. 4. Development effort
The final question is, how much effort is involved in
developing FSCQ? One metric is the size of the FSCQ code
base, reported in Figure 8; FSCQ consists of about 30,000
lines of code. In comparison, the xv6 file system is about
3000 lines of C code, so FSCQ is about 10× larger, but
this includes a significant amount of CHL infrastructure,
including libraries and proof machinery, which is not
FSCQ-specific.
A more interesting question is how much effort is required
to modify FSCQ, after an initial version has been developed
and certified. Does adding a new feature to FSCQ require
reproving everything, or is the work commensurate with the
scale of the modifications required to support the new feature? To answer this question, the rest of this section presents
several case studies, where we had to add a significant feature
to FSCQ after the initial design was already complete.
Asynchronous disk writes. We initially developed FSCQ to
operate with synchronous disk writes. Implementing asynchronous disk writes required changing about 1000 lines