The design of data management systems has always
been influenced by the storage hardware landscape.
In the 1980s, database engines used a two-tier storage
hierarchy consisting of dynamic random access memory
(DRAM) and hard disk drives (HDD). Given the disparity
in cost between HDD and DRAM, it was important to
determine when it made economic sense to cache data
in DRAM as opposed to leaving it on the HDD.
In 1987, Jim Gray and Gianfranco Putzolu
established the five-minute rule, which gave a precise
answer to this question: “1KB records referenced every
five minutes should be memory resident.”9 They arrived
at this value by using the then-current price-performance
characteristics of DRAM and HDD, shown in Table 1, to
compute the break-even interval at which the cost of
holding 1KB of data in DRAM matches the cost of the I/O
to fetch it from HDD.
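The break-even computation described above can be sketched in a few lines. This is a minimal illustration, assuming one common formulation of the rule: pages per MB of DRAM divided by random accesses per second per drive, multiplied by the ratio of drive price to DRAM price per MB. The function name, parameter names, and the input values are hypothetical placeholders chosen for illustration; they are not the Table 1 figures.

```python
def break_even_interval_seconds(page_size_bytes, drive_iops,
                                drive_price, dram_price_per_mb):
    """Return the reference interval (seconds) at which caching a page in
    DRAM costs the same as re-reading it from the drive on each access.

    Assumed formulation (not quoted from this article):
        (pages per MB of DRAM / accesses per second per drive)
        * (price per drive / price per MB of DRAM)
    """
    pages_per_mb = (1024 * 1024) / page_size_bytes
    return (pages_per_mb / drive_iops) * (drive_price / dram_price_per_mb)


if __name__ == "__main__":
    # Illustrative placeholder inputs only, NOT values from Table 1:
    # 1KB pages, 15 random I/Os per second, $30,000 per drive, $5,000 per MB.
    interval = break_even_interval_seconds(
        page_size_bytes=1024,
        drive_iops=15,
        drive_price=30_000,
        dram_price_per_mb=5_000,
    )
    print(f"break-even interval: {interval:.1f} s ({interval / 60:.1f} min)")
```

Under this formulation, the interval grows when DRAM gets cheaper relative to the drive and shrinks when the drive delivers more random accesses per second.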
Today, enterprise database engines use a three-tier storage
hierarchy, as depicted in Figure 1. The performance tier, based
on DRAM or NAND flash solid-state devices (SSDs), hosts data
accessed by latency-critical transaction processing and
real-time analytics applications. The HDD-based capacity tier
hosts data accessed by latency-insensitive batch analytics
applications. The archival tier is not used for online query
processing, but for storing data that is accessed only rarely,
during regulatory compliance audits or disaster recovery. This
tier is primarily based on tape and is crucial as a long-term
data repository for several application domains, such as
physics, banking, security, and law enforcement.
In this article, we revisit the five-minute rule three decades
after its inception. We recompute break-even intervals for each
tier of the modern, multi-tiered storage hierarchy and use
guidelines provided by the five-minute rule to identify
impending changes in the design of data management engines for
emerging storage hardware. We summarize our findings here:
• HDD is tape. The gap between DRAM and HDD is increasing: the
five-minute rule that was valid for the DRAM–HDD case in 1987
is now a four-hour rule. This implies the HDD-based capacity
tier is losing relevance not just for performance-sensitive
applications, but for all applications with a non-sequential
data access pattern.
• Non-volatile memory is DRAM. The gap between DRAM and SSD is
shrinking. The original five-minute rule is now valid for the
DRAM–SSD case, and the break-even interval is less than a
minute for newer non-volatile memory (NVM) devices like 3D-
30 Years Later and Its Impact on the Storage
Tracing the evolution of the five-minute rule to help identify imminent changes in the design of data management engines.
BY RAJA APPUSWAMY, GOETZ GRAEFE, RENATA BOROVICA-GAJIC, AND ANASTASIA AILAMAKI