SUMMARY AND CONCLUSIONS
The 20-year-old five-minute rule for RAM and disks still
holds, but for ever-larger disk pages. Moreover, it should
be augmented by two new five-minute rules: one for
small pages moving between RAM and flash memory and
one for large pages moving between flash memory and
traditional disks. For small pages moving between RAM
and disk, Gray and Putzolu were amazingly accurate in
predicting a five-hour break-even point 20 years into the
future.
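The break-even interval behind these rules can be computed directly. The following minimal sketch applies the Gray and Putzolu formula: pages per MB of RAM divided by accesses per second per device, multiplied by the device price divided by the price per MB of RAM. The function name and all input values are assumptions chosen for illustration, not measurements from this article. For simplicity the access rate is held constant across page sizes, which ignores transfer time for large pages.

```python
def break_even_seconds(page_bytes, accesses_per_second,
                       price_per_device, price_per_mb_ram):
    """Reference interval (seconds) at which keeping a page in RAM
    costs the same as re-reading it from the slower device."""
    pages_per_mb_ram = (1024 * 1024) / page_bytes
    return ((pages_per_mb_ram / accesses_per_second)
            * (price_per_device / price_per_mb_ram))


# Placeholder inputs (assumptions for illustration only).
DEVICE = dict(accesses_per_second=100,
              price_per_device=100.0,
              price_per_mb_ram=0.05)

small_page = break_even_seconds(page_bytes=4 * 1024, **DEVICE)
large_page = break_even_seconds(page_bytes=64 * 1024, **DEVICE)

print(f"4 KB page:  {small_page / 60:.1f} minutes")
print(f"64 KB page: {large_page / 60:.1f} minutes")
```

With the access rate fixed, a page 16 times larger has a break-even interval 16 times shorter, which is consistent with the rule continuing to hold only for ever-larger disk pages.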
Research into flash memory and its place in system
architectures is urgent and important. Within a few years,
flash memory will be used to fill the gap between traditional RAM and traditional disk drives in many operating
systems, file systems, and database systems.
Flash memory can be used to extend RAM or persistent
storage. These models are called “extended buffer pool”
and “extended disk” here. Both models may seem viable
in operating systems, file systems, and database systems.
Because of the characteristics of these systems, however,
they will employ different usage models.
In both models, the contents of RAM and flash will
be governed by LRU-like replacement algorithms that
attempt to keep the most valuable pages in RAM and the
least valuable pages on traditional disks. The linked list or
other data structure implementing the replacement policy
for the flash memory will be maintained in RAM.
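As a rough sketch of such a policy, the class below keeps one LRU list for RAM and one for flash, both held in RAM. A page evicted from RAM is demoted to flash, and a page evicted from flash falls back to disk. The class and method names are hypothetical, each page is assumed to reside in exactly one tier, and page contents and dirty-page handling are omitted.

```python
from collections import OrderedDict


class TwoTierLRU:
    """RAM over flash over disk; both LRU lists live in RAM (sketch)."""

    def __init__(self, ram_capacity, flash_capacity):
        self.ram_capacity = ram_capacity
        self.flash_capacity = flash_capacity
        self.ram = OrderedDict()    # page_id -> None, most recent last
        self.flash = OrderedDict()  # page_id -> None, most recent last

    def access(self, page_id):
        """Return the tier that served the request; promote the page to RAM."""
        if page_id in self.ram:
            self.ram.move_to_end(page_id)
            return "ram"
        if page_id in self.flash:
            del self.flash[page_id]
            source = "flash"
        else:
            source = "disk"  # anything not tracked above is on disk
        self._admit_to_ram(page_id)
        return source

    def _admit_to_ram(self, page_id):
        self.ram[page_id] = None
        if len(self.ram) > self.ram_capacity:
            # Demote the least recently used RAM page to flash.
            victim, _ = self.ram.popitem(last=False)
            self.flash[victim] = None
            if len(self.flash) > self.flash_capacity:
                # The least valuable flash page falls back to disk.
                self.flash.popitem(last=False)


pool = TwoTierLRU(ram_capacity=2, flash_capacity=2)
first_pass = [pool.access(p) for p in (1, 2, 3, 4, 5)]
second_pass = [pool.access(p) for p in (5, 3, 1)]
```

In this run every page of the first pass comes from disk. In the second pass page 5 is still in RAM, page 3 has been demoted to flash, and page 1 has aged out of both tiers.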
Operating systems and file systems will use flash
memory mostly as transient memory (e.g., as a fast
backup store for virtual memory and as a secondary file-system cache). Both of these applications fall into the
extended buffer-pool model. During an orderly system
shutdown, the flash memory contents might be written
to persistent storage. During a system crash, however, the
RAM-based description of flash-memory contents will
be lost and must be reconstructed by a contents analysis
very similar to a traditional file-system check. Alternatively, flash-memory contents can be voided and reloaded
on demand.
Database systems, on the other hand, will employ flash
memory as persistent storage, using the extended-disk
model. The current contents will be described in persistent
data structures (e.g., parent pages in B-tree indexes). Traditional durability mechanisms—in particular, logging and
checkpoints—ensure consistency and efficient recovery
after system crashes. An orderly system shutdown has no
need to write flash-memory contents to disk.
There are two reasons for these different usage models
for flash memory. First, database systems rely on regular
checkpoints during which dirty pages are flushed from
the buffer pool to persistent storage. Moving a dirty page
from RAM to the extended buffer pool in flash memory
creates substantial overhead during the next checkpoint.
A free buffer must be found in RAM, the page contents
must be read from flash memory into RAM, and then
the page must be written to disk. Adding such overhead
to checkpoints is not attractive in database systems with
frequent checkpoints. Operating systems and file systems,
on the other hand, do not rely on checkpoints and thus
can exploit flash memory as an extended buffer pool.
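The checkpoint overhead described above can be made concrete by listing the steps needed to make one dirty page durable under each usage model. This is a simplified sketch: the function name and the model labels are assumptions, and the empty step list for the extended-disk case follows from treating flash memory as persistent storage there.

```python
def checkpoint_steps(model, location):
    """Steps a checkpoint needs to make one dirty page durable.

    model:    "extended_buffer_pool" or "extended_disk" (labels assumed here)
    location: "ram" or "flash", where the dirty page currently resides
    """
    if location not in ("ram", "flash"):
        raise ValueError(f"unknown location: {location}")
    if model == "extended_buffer_pool":
        if location == "ram":
            return ["write page from RAM to disk"]
        # Flash is transient here, so the page must travel back through RAM.
        return ["find free buffer in RAM",
                "read page from flash into RAM",
                "write page from RAM to disk"]
    if model == "extended_disk":
        if location == "ram":
            return ["write page from RAM to persistent storage"]
        # Flash is persistent storage here, so the page is already durable.
        return []
    raise ValueError(f"unknown model: {model}")


for model in ("extended_buffer_pool", "extended_disk"):
    print(model, "->", checkpoint_steps(model, "flash"))
```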
Second, the principal persistent data structures of databases, B-tree indexes, provide precisely the mapping and
location-tracking mechanisms needed to complement frequent page movement and replacement. Thus, tracking a
data page when it moves between disk and flash relies on
the same data structure maintained for efficient database
search. In addition to avoiding buffer descriptors, etc., for
pages in flash memory, avoiding indirection in locating a
page also makes database searches as efficient as possible.
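A minimal sketch of this idea follows: the child pointer in a B-tree parent node records both the device and the address of the child page, so moving a page only requires overwriting that pointer. The type and method names are hypothetical, and logging of the pointer update is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class PageAddress:
    device: str  # "flash" or "disk"
    offset: int  # byte offset on that device


@dataclass
class ParentNode:
    # separator key -> address of the child page covering keys >= separator
    children: dict = field(default_factory=dict)

    def locate(self, key):
        """Return the address of the child page responsible for key."""
        candidates = [sep for sep in self.children if sep <= key]
        if not candidates:
            raise KeyError(key)
        return self.children[max(candidates)]

    def relocate(self, separator, new_address):
        """Record a page move by overwriting the child pointer."""
        if separator not in self.children:
            raise KeyError(separator)
        self.children[separator] = new_address


parent = ParentNode({0: PageAddress("disk", 4096),
                     100: PageAddress("disk", 8192)})
before = parent.locate(150)
parent.relocate(100, PageAddress("flash", 0))  # page moves from disk to flash
after = parent.locate(150)
```

The search path that finds the key also yields the current location of the page, so no separate mapping table or buffer descriptor needs to be consulted.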
Finally, because the ratio of access latency to transfer
bandwidth differs greatly between flash memory and disks,
different B-tree node sizes are optimal. O’Neil’s SB-tree
exploits two node sizes as needed in a multilevel storage
hierarchy. The required inexpensive mechanisms for
moving individual pages are the same as those required
when moving pages between flash memory and disk.
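One way to see why the optimal node sizes differ is to divide the search utility of a node, taken here as log2 of the number of records it holds, by the time needed to fetch it. The sketch below assumes that measure. The device latencies, the bandwidth, and the 20-byte record size are placeholder values, not measurements from this article.

```python
import math


def utility_per_second(page_bytes, latency_s, bandwidth_bytes_per_s,
                       record_bytes=20):
    """Node utility, log2(records per node), per second of access time."""
    utility = math.log2(page_bytes / record_bytes)
    access_time = latency_s + page_bytes / bandwidth_bytes_per_s
    return utility / access_time


def best_node_size(latency_s, bandwidth_bytes_per_s):
    """Node size (power of two, 1 KB to 1 MB) with the highest utility/time."""
    sizes = [2 ** k for k in range(10, 21)]
    return max(sizes, key=lambda size: utility_per_second(
        size, latency_s, bandwidth_bytes_per_s))


# Placeholder device parameters (assumptions for illustration only).
flash_best = best_node_size(latency_s=0.0001, bandwidth_bytes_per_s=50e6)
disk_best = best_node_size(latency_s=0.010, bandwidth_bytes_per_s=50e6)

print(f"flash-like device: {flash_best // 1024} KB nodes")
print(f"disk-like device:  {disk_best // 1024} KB nodes")
```

With equal bandwidth, the device with the lower access latency favors much smaller nodes, because its transfer time starts to dominate the access time at a far smaller page size.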
ACKNOWLEDGMENTS
This article is dedicated to Jim Gray, who suggested this
research and helped me and many others many times
in many ways. Barb Peters, Lily Jow, Harumi Kuno, José
Blakeley, Mehul Shah, and the DaMoN 2007 reviewers
suggested multiple improvements after reading earlier
versions of this article.
REFERENCES
1. Gray, J., Putzolu, G.R. 1987. The 5-minute rule for
trading memory for disk accesses and the 10-byte rule
for trading memory for CPU time. SIGMOD Record
16(3): 395-398.
2. Gray, J., Graefe, G. 1997. The five-minute rule ten
years later, and other computer storage rules of
thumb. SIGMOD Record 26(4): 63-68.
3. Larus, J.R., Rajwar, R. 2007. Transactional Memory.
Synthesis Lectures on Computer Architecture. Morgan
and Claypool.
4. Hamilton, J. 2007. An architecture for modular data
centers. CIDR (Conference on Innovative Data Systems Research).