Multiple secondary indexes for
a single table can be exploited into
index intersection, index joins, among
others. Fast access to individual
pages and records also benefits
those query execution plans. Like
secondary indexes, column stores or
more generally vertical partitioning
also require fetching records from
multiple places to assemble complete
rows. Thus, as seen in the example
of database query processing, using
flash memory in addition to or even as
replacement of traditional disks not
only forces reevaluation of optimal use
of the hardware but also means some
substantial software changes.
figure 5: Graceful degradation with multiple levels in the memory hierarchy.
Run retained in memory
sort output
Runs on disk
Runs on flash
Record and object Caches
Page sizes in database systems have
grown over the years, although not as
fast as disk-transfer bandwidth. On the
other hand, small pages require less
buffer-pool space for each root-to-leaf
search. For example, consider an index
with 20 million entries. With index pages of 128KB and 4,500 records, a root-to-leaf search requires two nodes and
thus 256KB in the buffer pool, although
half of that (the root node) can probably be shared with other transactions.
With 8KB index pages and 280 records
per page, a root-to-leaf search requires
three nodes or 24KB in the buffer pool,
or one order of magnitude less.
In traditional database architecture,
the default page size is a compromise
between efficient index search (using
large B-tree nodes as previously
discussed here and in the original B-tree
papers) and moderate buffer-pool
3
requirements for each index search.
Nonetheless, the previous example
requires 24KB in the buffer pool for
finding a record of perhaps only 20 bytes,
figure 4: Merging and partitioning files.
Partitioning
Merging
and it requires 8KB of the buffer pool for
retaining these 20 bytes in memory. An
alternative design uses large on-disk
pages and a record cache that serves
applications, because record caches
minimize memory needs yet provide
the desired data retention. In-memory
databases represent a specific form of
record caches when used as front ends
for traditional disk-based databases.
The introduction of flash memory
with its fast access latency and its small
optimal page size may render record
caches obsolete. With the large on-disk
pages in flash memory and only small
pages in the in-memory buffer pool, the
desired compromise can be achieved
without the need for two separate data
structures (such as, a transacted B-tree
and a separate record cache).
future Work
Several directions for future research
suggest themselves. First, while the
analyses in this article focused on
purchasing costs, a consideration of
other costs could capture the total cost
of ownership. A focus on energy consumption, for example, could lead to
different break-even points or even entirely different conclusions. Along with
CPU scheduling, algorithms for staging data in the memory hierarchy—
including buffer-pool replacement and
asynchronous I/O—may be the soft-
ware techniques with the highest impact on energy consumption. Note that
traditional database-query processing
relies on asynchronous I/O to reduce
response times; if the primary cost
metric for query processing is energy
consumption, asynchronous I/O has
no advantage over synchronous I/O.
Second, the five-minute rule applies
to permanent data and its management
in a buffer pool. The optimal retention
time for temporary data such as run
files in sorting and overflow files
in hash join and hash aggregation
may be different. For sorting, as for
B-tree searches, the goal should be to
maximize the number of comparisons
per unit of I/O time or per unit of energy
spent on I/O. Our initial research
and algorithm design has focused on
algorithms with graceful degradation
in sorting and for hybrid hash join
(that is, spilling memory contents to
flash only when and as much as truly
required, and similarly spilling flash
contents to disk only when and as
much as truly required). The different
optimal page sizes can be exploited to
achieve very high effective merge fan-in
and partitioning fan-out with relatively
little main memory. Figure 5 shows the
final merge step—very large runs on
disk use large pages that are buffered in
flash memory (shown as vertical boxes),
a few small runs have remained in flash