gest some substantial differences in the management of
B-tree pages and their allocation. Beyond optimization of
page sizes, B-trees can use different units of I/O for flash
memory and disks. Presenting the case for this design is
the third purpose of this article.
ASSUMPTIONS
Forward-looking research always relies on many assumptions. This section lists the assumptions that led to the conclusions put forth in this article. Some of the assumptions are fairly basic, whereas others are more speculative.
One assumption is that file systems and database systems assign flash memory to a level between RAM and the disk drives. Both software systems favor pages with some probability that they will be touched in the future but not with sufficient probability to warrant keeping them in RAM. The estimation and administration of such probabilities follow the usual lines (e.g., LRU, or least recently used).
We assume that the administration of such information employs data structures in RAM, even for pages whose contents have been removed from RAM to flash memory. For example, the LRU chain in a file system’s buffer pool might cover both RAM and flash memory, or there might be two separate LRU chains. A page is loaded into RAM and inserted at the head of the first chain when it is needed by an application. When it reaches the tail of the first chain, the page is moved to flash memory and its descriptor to the head of the second LRU chain. When it reaches the tail of the second chain, the page is moved to disk and removed from the LRU chain. Other replacement algorithms would work mutatis mutandis.
Such fine-grained LRU replacement of individual pages is in contrast to assigning entire files, directories, tables, or databases to different storage units. It seems that page replacement is the appropriate granularity in buffer pools. Moreover, proven methods exist to load and replace buffer-pool contents entirely automatically, without assistance from tuning tools and without directives by users or administrators. An extended buffer pool in flash memory should exploit the same methods as a traditional buffer pool. For truly comparable and competitive performance and administration costs, a similar approach seems advisable when flash memory is used as an extended disk.
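The two-chain LRU scheme described in this section can be illustrated with a minimal Python sketch. It is not taken from any particular file system or database; the class name `TwoTierLRU`, its capacity parameters, and the placeholder page contents are hypothetical. It also assumes that a hit in RAM moves the page back to the head of the first chain and that a page brought back from flash leaves the second chain, two details the text does not specify.

```python
from collections import OrderedDict


class TwoTierLRU:
    """Hypothetical two-chain LRU: RAM chain first, flash chain second, then disk."""

    def __init__(self, ram_capacity, flash_capacity):
        self.ram_capacity = ram_capacity
        self.flash_capacity = flash_capacity
        # Both chains are RAM-resident descriptors; only page contents would
        # move between devices. The last entry is the head (most recent),
        # the first entry is the tail (next to be replaced).
        self.ram = OrderedDict()
        self.flash = OrderedDict()
        self.disk = {}  # stands in for the backing store

    def access(self, page_id):
        """Make a page RAM-resident at the head of the first chain.

        Returns the level where the page was found: "ram", "flash", or "disk".
        """
        if page_id in self.ram:
            self.ram.move_to_end(page_id)  # assumption: a hit refreshes recency
            return "ram"
        if page_id in self.flash:
            data = self.flash.pop(page_id)  # assumption: page leaves the flash chain
            source = "flash"
        else:
            # Placeholder contents for pages never seen before.
            data = self.disk.get(page_id, f"page-{page_id}")
            source = "disk"
        self.ram[page_id] = data
        self._demote()
        return source

    def _demote(self):
        # Tail of the first chain moves to flash, at the head of the second chain.
        while len(self.ram) > self.ram_capacity:
            page_id, data = self.ram.popitem(last=False)
            self.flash[page_id] = data
        # Tail of the second chain moves to disk and leaves the LRU chains.
        while len(self.flash) > self.flash_capacity:
            page_id, data = self.flash.popitem(last=False)
            self.disk[page_id] = data


pool = TwoTierLRU(ram_capacity=2, flash_capacity=2)
for page in [1, 2, 3, 4, 5]:
    pool.access(page)
print(list(pool.ram), list(pool.flash), list(pool.disk))
```

With two slots at each level, touching pages 1 through 5 in order leaves the two most recent pages in RAM, the two before them in flash, and the oldest on disk. A single chain covering both RAM and flash, the other option mentioned above, would behave the same way with one ordered structure and a boundary marker.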
FILE SYSTEMS
The research for this article assumed a fairly traditional
file system. Many file systems differ in one way or
another from this model, but most still generally adhere
to it.
Each file is a large byte stream. Files are often read in
their entirety, their contents manipulated in memory, and
the entire file replaced if it is updated at all. Archiving,
version retention, hierarchical storage management, data
movement using removable media, etc. all seem to follow
this model as well.
Based on this model, space allocation on disk attempts
to use contiguous disk blocks for each file. Metadata is
limited to directories, a few standard tags such as a creation time, and data structures for space management.
Consistency of these on-disk data structures is
achieved by careful write ordering, fairly quick write-back
of updated data blocks, and expensive file-system checks
after any less-than-perfect shutdown or media removal. In
other words, we assume the absence of transactional guarantees and transactional logging, at least for file contents.
If log-based recovery is supported for file contents such as
individual pages or records within pages, then a number
of the arguments presented here need to be revisited.
DATABASE SYSTEMS
We assume fairly traditional database systems with B-tree
indexes as the workhorse storage structure. Similar tree
TABLE 2
Relative Costs for Flash Memory and Disks
                                    Flash memory      Disk
Price and capacity                  $999 for 32 GB    $80 for 250 GB
Price per GB                        $31.20            $0.32
Time to read a 4-KB page            0.16 ms           12.01 ms
4-KB reads per second               6,200             83
Price per 4-KB read per second      $0.16             $0.96
Time to read a 256-KB page          3.98 ms           12.85 ms
256-KB reads per second             250               78
Price per 256-KB read per second    $3.99             $1.03
(Derived metrics from http://www.dramexchange.com, http://www.dvnation.com, http://www.buy.com,
http://www.seagate.com, and http://www.samsung.com; all 4/11/2007)
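The derived rows of the table appear to follow from the price, capacity, and read-time rows by simple arithmetic. The following Python sketch recomputes them under that assumption; the function names and the `devices` dictionary are hypothetical and only restate the table's inputs.

```python
def price_per_gb(price_usd, capacity_gb):
    """Purchase price divided by capacity."""
    return price_usd / capacity_gb


def reads_per_second(read_time_ms):
    """Number of back-to-back reads that fit into one second."""
    return 1000.0 / read_time_ms


def price_per_read_per_second(price_usd, read_time_ms):
    """Purchase price divided by sustained read rate."""
    return price_usd / reads_per_second(read_time_ms)


# Inputs copied from the table; dictionary keys are illustrative names.
devices = {
    "flash": {"price": 999.0, "gb": 32, "ms_4k": 0.16, "ms_256k": 3.98},
    "disk": {"price": 80.0, "gb": 250, "ms_4k": 12.01, "ms_256k": 12.85},
}

for name, d in devices.items():
    print(name)
    print(f"  price per GB:              ${price_per_gb(d['price'], d['gb']):.2f}")
    print(f"  4-KB reads per second:     {reads_per_second(d['ms_4k']):.0f}")
    print(f"  price per 4-KB read/s:     ${price_per_read_per_second(d['price'], d['ms_4k']):.2f}")
    print(f"  256-KB reads per second:   {reads_per_second(d['ms_256k']):.0f}")
    print(f"  price per 256-KB read/s:   ${price_per_read_per_second(d['price'], d['ms_256k']):.2f}")
```

The recomputed values agree with the table to within rounding. The table seems to round the reads-per-second figures before dividing, so a few entries differ in the last digit.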