enabled HDDs to increase capacity at
Kryder’s rate (40% per year), outstripping Moore’s Law. However, over the
past few years, HDD vendors have hit
walls in scaling areal density with conventional Perpendicular Magnetic Recording (PMR) techniques, resulting in
an annual areal density improvement of
only around 16% instead of 40%.[19]
HDDs also present another problem when used as the storage medium
of choice for building a capacity tier,
namely, high idle power consumption. Although enterprises gather vast
amounts of data, as one might expect,
not all data is accessed frequently. Recent studies estimate that as much as
80% of enterprise data is “cold,” meaning infrequently accessed, and that
cold data is the fastest-growing segment, with a 60% Compound Annual
Growth Rate (CAGR).[10–12] Unlike tape,
which consumes no power once unmounted, HDDs consume a substantial amount of power even while idle.
Such power consumption translates to
a proportional increase in TCO.
Tape. The areal density of tape has
been increasing steadily at a rate of
33% per year, and roadmaps from the
Linear Tape Open consortium (LTO) [25]
and the Information Storage Industry
Consortium (INSIC) [13] project a continued increase in density for the foreseeable future.
Table 5 shows the price/performance metrics of tape storage both in
1997 and today. The 1997 values are
based on the corresponding five-minute rule paper.[8] The 2018 values are
based on a SpectraLogic T50e tape library [22] using LTO-7 tape cartridges.
With individual tape capacity increasing 200× since 1997, the total capacity stored in tape libraries has expanded from hundreds of gigabytes to
hundreds of petabytes today. Further,
a single LTO-7 cartridge is capable of
matching, or even outperforming, an
HDD with respect to sequential data
access bandwidth, as shown in Table
6. As modern tape libraries use multiple drives, the cumulative bandwidth
achievable using even low-end tape libraries is 1–2GB/s. High-end libraries
can deliver well over 40GB/s. These
benefits have made tape the medium of choice in the archival
tier, both on-premises and in the cloud,
for applications ranging from
natural sciences, like particle physics
and astronomy, to movie archives in
the entertainment industry.[15, 20]
However, the random access latency of tape is
still 1000× higher than that of an HDD (minutes
vs. ms), because tape libraries need to mechanically load and
wind tape cartridges before data can
be accessed.
Break-even interval and implications. Using metrics from Tables 1 and 5
to compute the break-even interval for
the DRAM–tape case results in an interval
of over 300 years for a page size
of 4KB! Jim Gray referred to tape drives
as the “data motel” where data checks
in and never checks out,[7] and this is
certainly true today. Figure 2 shows
the variation in break-even interval
for both HDD and tape for various
page sizes. We see that the interval
asymptotically approaches one minute
in the DRAM–HDD case and 10
minutes in the DRAM–tape case. The
HDD asymptote is reached at a page
size of 100MB and the tape asymptote
is reached at a size of 100GB. This
clearly shows that randomly accessing
data on these devices is extremely
expensive, and data transfer sizes
with these devices should be large to
consumption in main-memory databases
showed that in a server equipped
with 6TB of memory, the idle power
of DRAM would match that of four active
CPUs.[1] Such a difference in power
consumption between SSD and DRAM
directly translates into higher Operational
Expenses (OPEX), and hence,
higher Total Cost of Ownership (TCO),
for DRAM-based database engines.
Given these three factors, the break-even interval from the five-minute rule
seems to suggest an inevitable shift
from DRAM-based data management
engines to NVM-based persistent-memory engines. In fact, this change is already well under way, as state-of-the-art
database engines are being updated to
fully exploit the performance benefits
of PCIe NVMe SSDs.[26] Researchers have
recently highlighted the fact that data
caching systems that trade off performance for price by reducing the amount
of DRAM are gaining market share over
in-memory database engines.[18]
The Capacity Tier
HDD. Traditionally, HDDs have been
the primary storage media used for
provisioning the capacity tier. For several years, areal density improvements
Table 5. Price/performance characteristics of tape.
                                 1997     2018
Tape library cost ($)            10,000   11,000
Number of drives                 1        4
Number of slots                  14       10
Max capacity per tape            35GB     15TB
Transfer rate per drive (MB/s)   5        750
Access latency                   30s      65s
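
The Table 5 rows can be combined into library-level figures with a few lines of arithmetic. The following Python sketch is illustrative only and is not taken from the article. It assumes that every slot holds a cartridge of the maximum listed capacity, that all drives stream concurrently at the listed rate, and that units are decimal (1 TB = 1,000 GB = 1,000,000 MB). The class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeLibrary:
    """One column of Table 5 (names are illustrative, not from the article)."""
    year: int
    cost_usd: float
    drives: int
    slots: int
    tape_capacity_gb: float
    drive_rate_mb_s: float

    @property
    def capacity_gb(self) -> float:
        # Assumption: every slot holds a cartridge of maximum capacity.
        return self.slots * self.tape_capacity_gb

    @property
    def aggregate_rate_mb_s(self) -> float:
        # Assumption: all drives stream in parallel at the listed rate.
        return self.drives * self.drive_rate_mb_s

    @property
    def scan_hours(self) -> float:
        # Time to read the whole library once, sequentially.
        return self.capacity_gb * 1000 / self.aggregate_rate_mb_s / 3600

    @property
    def usd_per_gb(self) -> float:
        return self.cost_usd / self.capacity_gb


LIB_1997 = TapeLibrary(1997, 10_000, drives=1, slots=14,
                       tape_capacity_gb=35, drive_rate_mb_s=5)
LIB_2018 = TapeLibrary(2018, 11_000, drives=4, slots=10,
                       tape_capacity_gb=15_000, drive_rate_mb_s=750)

if __name__ == "__main__":
    for lib in (LIB_1997, LIB_2018):
        print(f"{lib.year}: {lib.capacity_gb:,.0f} GB, "
              f"{lib.aggregate_rate_mb_s:,.0f} MB/s, "
              f"scan {lib.scan_hours:.1f} h, ${lib.usd_per_gb:.4f}/GB")
```

Under these assumptions, the 2018 library scans in just under 14 hours, in line with the scan time listed for tape in Table 6.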
Table 6. Price/performance metrics of DRAM, HDD, and tape.
Metric           DRAM        HDD        Tape
Unit capacity    16GB        2TB        10 × 15TB
Unit cost ($)    80          50         11,000
Latency          100ns       5ms        65s
Bandwidth        100GB/s     200MB/s    4 × 750MB/s
Kaps             9,000,000   200        0.02
Maps             10,000      100        0.02
Scan time        0.16s       3 hours    14 hours
$/Kaps           9e-14       5e-09      8e-03
$/Maps           9e-12       8e-09      8e-03
$/TBscan         8e-06       0.003      0.03
$/TBscan ('97)   0.32        4.23       296
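
The break-even discussion can be approximated from the Table 6 values. The sketch below uses the five-minute-rule formula as it is commonly stated: (pages per MB of DRAM / accesses per second per device) × (device price / DRAM price per MB). It rests on several assumptions that are not from the article: each random read costs one full access latency plus the transfer time, tape drives serve requests in parallel, DRAM price per MB is unit cost divided by unit capacity, and units are decimal. All function and variable names are hypothetical. The output is an order-of-magnitude check, not a reproduction of the article's exact figures.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Table 6 values in decimal units (1 GB = 1000 MB).
# Assumption: DRAM price per MB = unit cost / unit capacity.
DRAM_COST_PER_MB = 80 / 16_000

# Unit cost ($), access latency (s), per-drive bandwidth (MB/s), drive count.
HDD = {"cost": 50, "latency_s": 5e-3, "bandwidth_mb_s": 200, "drives": 1}
TAPE = {"cost": 11_000, "latency_s": 65, "bandwidth_mb_s": 750, "drives": 4}


def accesses_per_second(page_mb, latency_s, bandwidth_mb_s, drives):
    """Random page reads per second.

    Assumption: each read pays the full access latency plus the transfer
    time for one page, and all drives serve requests in parallel.
    """
    return drives / (latency_s + page_mb / bandwidth_mb_s)


def break_even_seconds(page_mb, device, dram_cost_per_mb=DRAM_COST_PER_MB):
    """Five-minute-rule break-even interval for one page size."""
    pages_per_mb_of_dram = 1 / page_mb
    aps = accesses_per_second(page_mb, device["latency_s"],
                              device["bandwidth_mb_s"], device["drives"])
    return (pages_per_mb_of_dram / aps) * (device["cost"] / dram_cost_per_mb)


if __name__ == "__main__":
    for name, device in (("HDD", HDD), ("tape", TAPE)):
        for page_mb in (0.004, 1, 100, 100_000, 10_000_000):
            seconds = break_even_seconds(page_mb, device)
            print(f"{name:4s} page={page_mb:>12,.3f} MB  "
                  f"interval={seconds:,.0f} s "
                  f"({seconds / SECONDS_PER_YEAR:.2f} years)")
```

With these assumptions, the DRAM–tape interval for a 4KB page is a few hundred years. For very large pages, the intervals level off near 50 seconds for HDD and roughly 12 minutes for tape, the same order as the one-minute and 10-minute asymptotes described in the text.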