whether the flash memory extends the buffer pool or the
disk. The central question is, therefore, not what to keep
in cache but how to manage flash memory contents and
its lifetime.
In database systems, flash memory can also be used for
recovery logs, because its short access times permit very
fast transaction commit. Limitations in write bandwidth
discourage such use, however. Perhaps systems with dual
logs can combine low latency and high bandwidth, with one
log on a traditional disk and one log on an array of flash
chips.
OTHER HARDWARE
In all cases, RAM is assumed to be of substantial size,
although probably smaller than flash memory or disk. The
relative sizes should be governed by the five-minute
rule. 11 Note that despite similar transfer bandwidth, the
short access latency of flash memory compared with disk
results in surprising retention times for data in RAM.
Finally, we assume sufficient processing bandwidth
as provided by modern many-core processors. Moreover,
we believe that forthcoming transactional memory (in
hardware and in the software runtime system) permits
highly concurrent maintenance of complex data structures. For example, page replacement heuristics might
use priority queues rather than bitmaps or linked lists.
Similarly, advanced lock management might benefit from
more complex data structures. Nonetheless, we do not
assume or require data structures more complex than
those already in common use for page replacement and
location tracking.
THE FIVE-MINUTE RULE
If flash memory is introduced as an intermediate level in
the memory hierarchy, relative sizing of memory levels
requires renewed consideration. Tuning can be based on
purchasing cost, total cost of ownership, power, mean
time to failure, mean time to data loss, or a combination of metrics. Following Gray and Putzolu, 12 this article
focuses on purchasing cost. Other metrics and appropriate formulas to determine relative sizes can be derived
similarly (e.g., by replacing dollar costs with energy use
for caching and moving data).
Gray and Putzolu introduced the following formula: 13, 14
BreakEvenIntervalinSeconds = (PagesPerMBofRAM /
AccessesPerSecondPerDisk) × (PricePerDiskDrive /
PricePerMBofRAM)
It is derived using formulas for the costs of RAM to hold
a page in the buffer pool and of a (fractional) disk to perform I/O every time a page is needed, equating these two
costs, and solving the equation for the interval between
accesses.
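The derivation described above can be sketched as follows. The symbols C_RAM, C_disk, and T (the interval in seconds between accesses to a page) are introduced here for illustration only; the remaining names are those of the formula.

```latex
\begin{align*}
C_{\mathrm{RAM}} &= \frac{\mathrm{PricePerMBofRAM}}{\mathrm{PagesPerMBofRAM}}
  && \text{(cost of RAM to hold one page)} \\
C_{\mathrm{disk}}(T) &= \frac{\mathrm{PricePerDiskDrive}}{T \cdot \mathrm{AccessesPerSecondPerDisk}}
  && \text{(cost of the disk fraction serving one access every $T$ seconds)} \\
C_{\mathrm{RAM}} = C_{\mathrm{disk}}(T)
  \;\Longrightarrow\;
T &= \frac{\mathrm{PagesPerMBofRAM}}{\mathrm{AccessesPerSecondPerDisk}}
  \times \frac{\mathrm{PricePerDiskDrive}}{\mathrm{PricePerMBofRAM}}
\end{align*}
```

A page accessed more often than once every T seconds is cheaper to keep in RAM; a page accessed less often is cheaper to fetch from disk each time.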
Assuming modern RAM, a disk drive using 4-KB pages,
and the values from tables 1 and 2, this produces:
(256 / 83) × ($80 / $0.047) = 5,248 seconds = 90 minutes =
1½ hours
(The “=” sign indicates rounding in this article.) This
compares with two minutes (for 4-KB pages) 20 years ago.
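As a minimal sketch, the calculation can be reproduced in a few lines of Python. The function and variable names are chosen here for illustration; the inputs are the values quoted in the text.

```python
def break_even_interval_seconds(pages_per_mb, accesses_per_second,
                                price_per_device, price_per_mb):
    """Break-even interval between accesses, per the Gray and Putzolu formula."""
    return (pages_per_mb / accesses_per_second) * (price_per_device / price_per_mb)


# Inputs quoted in the text: 4-KB pages (256 per MB), 83 accesses per second,
# $80 per disk drive, $0.047 per MB of RAM.
ram_disk = break_even_interval_seconds(256, 83, 80.0, 0.047)
print(f"{ram_disk:,.0f} seconds, or {ram_disk / 3600:.2f} hours")
```

With these inputs the result is roughly 5,250 seconds, which the article rounds to 90 minutes.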
If there is a surprise in this change, it is that the
break-even interval has grown by less than two orders
of magnitude. Recall that RAM was estimated in 1987
at about $5,000 per megabyte, whereas the 2007 cost
is about $0.05 per megabyte, a difference of five orders
of magnitude. On the other hand, disk prices have also
tumbled ($15,000 per disk in 1987), and disk latency and
bandwidth have improved considerably (from 15 accesses
per second to about 100 on SATA and about 200 on high-performance SCSI disks).
For RAM and flash disks of 32 GB, the break-even
interval is
(256 / 6,200) × ($999 / $0.047) = 876 seconds = 15 minutes
If the 2007 price for flash disks includes a “novelty premium” and comes down closer to the price of raw flash
memory—say, to $400 (a price also anticipated by Gray
and Fitzgerald15), then the break-even interval is 351 seconds = 6 minutes.
An important consequence is that in systems tuned
using economic considerations, turnover in RAM is
about 15 times faster (90 minutes / 6 minutes) if flash
memory rather than a traditional disk is the next level in
the storage hierarchy. Much less RAM is required, resulting in lower costs for purchase, power, and cooling.
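The two flash price scenarios and the resulting turnover ratio can be checked with a short sketch. The function name, the scenario labels, and the dictionary layout are illustrative assumptions; the numeric inputs come from the text.

```python
def break_even_interval_seconds(pages_per_mb, accesses_per_second,
                                price_per_device, price_per_mb):
    """Break-even interval between accesses, per the Gray and Putzolu formula."""
    return (pages_per_mb / accesses_per_second) * (price_per_device / price_per_mb)


RAM_PRICE_PER_MB = 0.047  # dollars, as quoted in the text

# RAM over a traditional disk: 83 accesses per second, $80 per drive.
ram_disk = break_even_interval_seconds(256, 83, 80.0, RAM_PRICE_PER_MB)

# RAM over a 32-GB flash disk: 6,200 accesses per second, two price scenarios.
flash_prices = {"2007 price": 999.0, "anticipated price": 400.0}
ram_flash = {
    label: break_even_interval_seconds(256, 6200, price, RAM_PRICE_PER_MB)
    for label, price in flash_prices.items()
}

for label, seconds in ram_flash.items():
    print(f"{label}: {seconds:,.0f} seconds, about {seconds / 60:.0f} minutes")

# How much faster RAM turns over with flash, rather than disk, as the next level.
turnover_ratio = ram_disk / ram_flash["anticipated price"]
print(f"turnover ratio: {turnover_ratio:.1f}")
```

Computed from the unrounded intervals, the ratio comes out just under 15, consistent with the rounded 90 minutes / 6 minutes used in the text.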
Perhaps most interesting, applying the same formula
to flash and disk results in the following:
(256 / 83) × ($80 / $0.03) = 8,070 seconds = 2¼ hours
Thus, all active data will remain in RAM and flash
memory.
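The same sketch applies to flash memory and disk as adjacent levels. The function name is again illustrative. Note that with the rounded inputs printed in the text, the product comes to slightly more than the quoted 8,070 seconds; presumably the article used less heavily rounded inputs, but that is an assumption. Either way the interval is about 2¼ hours.

```python
def break_even_interval_seconds(pages_per_mb, accesses_per_second,
                                price_per_device, price_per_mb):
    """Break-even interval between accesses, per the Gray and Putzolu formula."""
    return (pages_per_mb / accesses_per_second) * (price_per_device / price_per_mb)


# Flash over disk: 256 pages per MB, 83 disk accesses per second,
# $80 per disk drive, $0.03 per MB of flash (rounded values from the text).
flash_disk = break_even_interval_seconds(256, 83, 80.0, 0.03)
print(f"{flash_disk:,.0f} seconds, or {flash_disk / 3600:.2f} hours")
```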
Without a doubt, two hours is longer than any common checkpoint interval, which implies that dirty pages
in flash are forced to disk not by page replacement but by
checkpoints. Pages that are updated frequently must be