The Five-minute Rule
20 Years Later
and How Flash Memory Changes the Rules
any dirty data from the buffer pool. Instead, they rely on
carefully writing modified file-system pages because of the
lack of a recovery log protecting file contents.
replacement (i.e., when a page’s usage suggests placement
on disk rather than in flash memory). Thus, checkpoints
do not incur the cost of moving data from flash memory
to disk.
Checkpoints might even be faster in systems with flash
memory because dirty pages in RAM need to be written
merely to flash memory, not to disk. Given the very fast
random access in flash memory relative to disk drives,
this difference might speed up checkpoints significantly.
To summarize, database systems benefit if the flash
memory is treated as part of the system’s persistent
storage. In contrast, traditional file systems do not have
system-wide checkpoints that flush the recovery log and
TABLE
4Page Utility for B-tree Nodes on Disk
4 KB
16 KB
64 KB
128 KB
256 KB
512 KB
1 MB
140
560
2,240
4,480
8,960
17,920
35,840
7
9
11
12
13
14
15
12.0ms
12.1ms
12.2ms
12.4ms
12.9ms
13.7ms
15.4ms
Page Utility for B-tree Nodes on Flash Memory
1 KB
2 KB
4 KB
8 KB
16 KB
64 KB
35
70
140
280
560
2,240
5
6
7
8
9
11
PAGE SIZES
In addition to tuning based on the five-minute rule,
another optimization based on access performance is sizing of B-tree nodes. The optimal page size minimizes the
time spent on I/O during a root-to-leaf search. It balances
a short access time (i.e., the desire for small pages) with a
high reduction in remaining search space (i.e., the desire
for large pages).
Assuming binary search within each B-tree node, the
reduction in remaining search space is measured by the
logarithm of the number of records within each node.
This measure was called a node’s utility in our earlier
work. 23 This optimization is essentially equivalent to one
described in the original research on B-trees. 24
Table 4 illustrates this optimization for records of 20
bytes—typical with prefix and suffix truncation25—and
nodes filled at about 70 percent. Not surprisingly, the
optimal node size for B-tree
indexes on modern high-
bandwidth disks is much
larger than the page sizes
employed in traditional
database systems. With
those disks, the access time
0.58 dominates for all small
0.75 page sizes, such that addi-
0.90 tional byte transfer and,
0.97 therefore, additional utility
1.01 are almost free.
1.02 B-tree nodes of 256 KB
0.97 are very near optimal. For
those, table 3 indicates a
break-even time for RAM
and disk of 88 seconds.
5For a $400 flash disk and
a traditional rotating hard
disk, table 3 indicates 337
seconds or just over five
minutes. This is the first of
43. 4 the two new five-minute
46. 1 rules presented here.
43. 6 Table 5 illustrates the
same calculations for
36. 2
TABLE
B-trees on flash memory.
26. 3
Because of the lack of
10. 3 mechanical seeking and
0.11ms
0.13ms
0.16ms
0.22ms
0.34ms
1.07ms