The Five-minute Rule
20 Years Later
and How Flash Memory Changes the Rules
caches obsolete. With the large on-disk pages in flash
memory and only small pages in the in-memory buffer
pool, the desired compromise can be achieved without
the need for two separate data structures (i.e., a transacted
B-tree and a separate record cache).
In object-oriented applications that assemble complex
objects from many tables and indexes in a relational
database, the problem may be either better or worse,
depending on the B-tree technology. If traditional indexes
are used with a separate B-tree for each record format,
assembling a complex object in memory requires many
root-to-leaf searches and, therefore, many B-tree nodes in
the buffer pool. If records from multiple indexes can be
interleaved within a single B-tree based on their common
search keys and sort order36, 37 (e.g., on object identifier plus appropriate additional keys), very few or even
a single B-tree search may suffice. Moreover, the entire
complex object may be retained in a single page within
the buffer pool.
DIREC TIONS FOR FU TURE WORK
Several directions for future research suggest themselves.
First, the analyses presented in this article are focused on
purchasing costs. Other costs could also be taken into
consideration to capture total cost of ownership. Perhaps
most interestingly, a focus on energy consumption may
lead to different break-even points or even entirely different conclusions. Along with CPU scheduling, algorithms
for staging data in the memory hierarchy, including
buffer-pool replacement and compression, may be the
software techniques with the highest impact on energy
consumption.
Second, the five-minute rule applies to permanent
data and its management in a buffer pool. The optimal
retention time for temporary data such as run files in sorting and overflow files in hash join and hash aggregation
may be different. For sorting, as for B-tree searches, the
goal should be to maximize the number of comparisons
per unit of I/O time or per unit of energy spent on I/O.
Focused research may lead to new insights about query
processing in multilevel memory hierarchies.
Third, Gray and Putzolu offered further rules of
thumb, such as the 10-byte rule for trading memory and
CPU power. These rules also warrant revisiting for both
costs and energy. Compared with 1987, the most fundamental change may be that CPU power should be measured not in instructions but in cache line replacements.
Trading off space and time seems like a new problem in
this environment.
Fourth, what are the best data-movement policies?
One extreme is a database administrator explicitly moving entire files, tables, and indexes between flash memory
and traditional disks. Another extreme is automatic
movement of individual pages, controlled by a replacement policy such as LRU. Intermediate policies may
focus on the roles of individual pages within a database
or on the current query processing activity. For example,
catalog pages may be moved after schema changes to
facilitate fast recompilation of all cached query execution plans, and upper B-tree levels may be prefetched and
cached in RAM or in flash memory during execution of
query plans relying on index-to-index navigation.
Fifth, what are the secondary effects of introducing
flash memory into the memory hierarchy of a database
server? For example, short access times permit a lower
multiprogramming level, because only short I/O operations must be “hidden” by asynchronous I/O and context
switching. A lower multiprogramming level in turn may
reduce contention for memory in sort and hash operations and for locks (concurrency control for database contents) and latches (concurrency control for in-memory
data structures). Should this effect prove significant, the
effort and complexity of using a fine granularity of locking may be reduced.
Sixth, how will flash memory affect in-memory database systems? Will they become more scalable, affordable,
and popular based on memory inexpensively extended
with flash memory rather than RAM? Will they become
less popular because very fast traditional database systems
use flash memory instead of (or in addition to) disks? Can
a traditional code base using flash memory instead of
traditional disks compete with a specialized in-memory
database system in terms of performance, total cost of
ownership, development and maintenance costs, time to
market of features and releases, etc.?
Finally, techniques similar to generational garbage collection may benefit storage hierarchies. Selective reclamation applies not only to unreachable in-memory objects
but also to buffer-pool pages and favored locations on
permanent storage. Such research may also provide guidance for log-structured file systems, wear leveling for flash
memory, and write-optimized B-trees on RAID storage.