answered in order for the cold storage
tier to be feasible in practice.
Over the past few years, several other systems have been built to reduce the cost of storing cold data using alternative storage media. For instance, DT-Store [15] uses an LTFS tape archive to reduce the TCO of online multimedia streaming services by storing cold data on tape drives. ROS [28] is a PB-sized, rack-scale cold storage library built from thousands of optical discs packed in a single 42U rack. Today, it is unclear how these alternative storage options fare with respect to HDD-based CSD as the storage media of choice for cold data. Furthermore, for the cold storage tier to be realized in practice, an ideal cold storage medium needs to support batch analytics workloads. CSD, tape, and optical media are all primarily used today for archival storage, where data is rarely read. Further research is required to understand the reliability implications of using these storage devices under batch analytics workloads.
Finally, with widespread adoption of cloud computing, the modern enterprise storage hierarchy spans not only several storage devices but also different geographic locations, ranging from direct-attached low-latency devices, through network-attached storage servers, to cloud-hosted storage services. The
price-performance characteristics of
these storage configurations vary dramatically depending not only on the
storage media used, but also on other
factors like the total capacity of data
stored, the frequency and granularity of I/O operations used to access the
data, the read–write ratio, the duration
of data storage, and the cloud service
provider used, to name a few. Given the
multitude of factors, determining the
break-even interval for cloud storage is
a complicated problem that we did not
consider in this work. Thus, another
interesting avenue of future work is extending the five-minute rule to such a
distributed cloud storage setting.
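One way to make this trade-off concrete is to compare a simplified monthly bill for the two options. The sketch below is illustrative only: the prices, the amortization period, and the two cost models are assumptions for exposition, not figures from this article, and real cloud bills add egress, request-granularity, and provider-specific terms like those listed above.

```python
def monthly_cost_cloud(data_gb, usd_per_gb_month, reads_per_month, usd_per_request):
    """Assumed simplified cloud bill: capacity charge plus per-request charge."""
    return data_gb * usd_per_gb_month + reads_per_month * usd_per_request

def monthly_cost_local(data_gb, device_usd_per_gb, amortization_months):
    """Local device purchase price amortized over its service life
    (I/O assumed free once the device is owned)."""
    return data_gb * device_usd_per_gb / amortization_months

# Illustrative (assumed) numbers: 10TB of cold data; object storage at
# $0.004/GB-month plus $4e-6 per read, vs. an HDD at $0.02/GB over 60 months.
data_gb = 10_000
cloud = monthly_cost_cloud(data_gb, 0.004, reads_per_month=1_000_000,
                           usd_per_request=4e-6)
local = monthly_cost_local(data_gb, 0.02, amortization_months=60)
# Which side wins flips with read frequency, access granularity, and storage
# duration, which is why a single break-even interval is hard to state.
```

Even this toy model shows how the answer pivots on access frequency and storage duration rather than on capacity price alone.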
1. Appuswamy, R., Olma, M., and Ailamaki, A. Scaling the memory power wall with DRAM-aware data management. In Proceedings of DaMoN, 2015.
2. Balakrishnan, S. et al. Pelican: A building block for exascale cold data storage. In Proceedings of OSDI, 2014.
3. Borovica-Gajic, R., Appuswamy, R., and Ailamaki, A. Cheap data analytics using cold storage devices. In Proceedings of VLDB 9, 12 (2016).
4. Colarelli, D. and Grunwald, D. Massive arrays of idle disks for storage archives. In Proceedings of the 2002 Conference on Supercomputing.
5. Coughlin, T. Flash memory areal densities exceed
those of hard drives; http://bit.ly/2NbDh5T.
6. Graefe, G. The five-minute rule 20 years later (and
how flash memory changes the rules). Commun. ACM
52, 7 (July 2009).
7. Gray, J. The five-minute rule; research.microsoft.com/
8. Gray, J. and Graefe, G. The five-minute rule ten years
later, and other computer storage rules of thumb.
SIGMOD Rec. 26, 4 (1997).
9. Gray, J. and Putzolu, F. The 5-minute rule for trading memory for disc accesses and the 10-byte rule for trading memory for CPU time. In Proceedings of SIGMOD, 1987.
10. Horison Information Strategies Report. Tiered storage
takes center stage, IDC. Technology assessment:
Cold storage is hot again — finding the frost point;
11. Intel. Cold Storage in the Cloud: Trends, Challenges,
and Solutions, 2013; https://intel.ly/2ZG74F6.
12. INSIC (Information Storage Industry Consortium). International magnetic tape storage roadmap; http://www.insic.org/
13. Kathpal, A. and Yasa, G.A.N. Nakshatra: Towards running batch analytics on an archive. In Proceedings of MASCOTS, 2014.
14. Lantz, M. Why the future of data storage is (still)
magnetic tape; http://bit.ly/2XChrMO
15. Lee, J., Ahn, J., Park, C., and Kim, J. DTStorage: Dynamic tape-based storage for cost-effective and highly-available streaming service. In Proceedings of
16. Lim, K., Chang, J., Mudge, T., Ranganathan, P.,
Reinhardt, S.K., and Wenisch, T.F. Disaggregated
memory for expansion and sharing in blade servers. In
Proceedings of ISCA, 2009.
17. Lomet, D. Cost/performance in modern data stores: How data caching systems succeed. In Proceedings of DaMoN, 2018.
18. Moore, F. Storage outlook 2016; http://bit.ly/2KBLgao.
19. Perlmutter, M. The lost picture show: Hollywood
archivists cannot outpace obsolescence, 2017; http://
20. Spectra. ArcticBlue deep storage disk. Product; https://
21. SpectraLogic. SpectraLogic T50e; http://bit.ly/2Ych8pl.
22. StorageReview. Intel Optane memory review; http://
23. TPC-C. Dell-Microsoft SQL Server TPC-C executive summary, 2014; http://www.tpc.org/tpcc/results/
24. Ultrium. LTO Ultrium roadmap;
26. Umamageswaran, K. and Goindi, G. Exadata:
Delivering memory performance with shared flash;
27. Yan, M. Open compute project: Cold storage hardware
v0.5, 2013; http://bit.ly/2X6H2Ot.
28. Yan, W., Yao, J., Cao, Q., Xie, C., and Jiang, H. ROS: A rack-based optical storage system with inline accessibility for long-term data preservation. In Proceedings of EuroSys, 2017.
Raja Appuswamy ( email@example.com) is an
assistant professor in the Data Science Department at
EURECOM, Biot, Provence-Alpes-Côte d’Azur, France.
Goetz Graefe ( firstname.lastname@example.org), Google, Inc.,
Madison, WI, USA.
Renata Borovica-Gajic ( email@example.com.au) is an assistant professor in the School of Computing and Information Systems at the University of Melbourne, Australia.
Anastasia Ailamaki ( firstname.lastname@example.org) is a
professor at EPFL, Lausanne, Switzerland, and director of
its Data-Intensive Applications and Systems (DIAS) lab.
© 2019 ACM 0001-0782/19/11
query execution over CSD [3]. Skipper even shows that, for long-running batch queries, using CSD increases query execution time by only 35% compared to a traditional HDD, despite the long disk spin-up latency. With such
frameworks, it should be possible for
installations to switch from the traditional three-tier hierarchy to a two-tier
hierarchy consisting of just a performance tier with DRAM and SSDs, and
a cold storage tier with CSDs.
Conclusion and Future Work
Modern database engines use a three-tier storage hierarchy across four
primary storage media (DRAM, SSD,
HDD, and tape) with widely varying
price-performance characteristics. In
this article, we revisited the five-minute
rule in the context of this modern storage hierarchy and used it to highlight
impending changes based on recent
trends in the hardware landscape.
In the performance tier, NAND flash is inching its way closer to the CPU, resulting in dramatic improvements in
both access latency and bandwidth. For
state-of-the-art PCIe SSDs, the break-even interval predicted by the five-minute rule is one minute for 4KB pages.
Going forward, further improvements
in NAND flash and the introduction of
new NVM technologies will likely result
in this interval dropping further. As the
data reuse window shrinks, it will soon
be economically more valuable to store
most, if not all, data on solid-state storage devices instead of DRAM. This will
invariably necessitate revisiting several
techniques pioneered by traditional
HDD-based database engines, but eschewed by in-memory engines, like
buffer caching, on-disk storage layout,
and index persistence, to name a few,
for these new low-latency, high-bandwidth storage devices.
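The break-even arithmetic behind these predictions can be sketched directly from the rule's original form, which trades the cost of the DRAM needed to cache a page against the cost of the I/O capacity needed to re-read it. The device price, IOPS, and DRAM price below are illustrative assumptions, not measurements from this article:

```python
def break_even_interval_s(page_size_bytes, device_price_usd, device_iops,
                          ram_price_usd_per_mb):
    """Five-minute-rule break-even: caching a page in DRAM pays off if the
    page is re-read more often than once per this many seconds."""
    pages_per_mb = (1024 * 1024) / page_size_bytes
    price_per_iops = device_price_usd / device_iops          # $ per access/sec
    price_per_page_of_ram = ram_price_usd_per_mb / pages_per_mb
    return price_per_iops / price_per_page_of_ram

# Illustrative (assumed) numbers: a $400 PCIe SSD delivering 500K IOPS,
# DRAM at $0.005/MB, and 4KB pages.
interval = break_even_interval_s(4096, 400.0, 500_000, 0.005)  # ~41 seconds
```

Under these assumed prices the interval comes out under a minute, consistent in spirit with the roughly one-minute figure cited above; cheaper or faster flash pushes it lower still.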
Traditionally, HDDs have been
used for implementing the capacity tier. However, our analysis showed
that the difference between HDD and
tape is shrinking when $/TBScan is
used as the metric. Given the latency-insensitive nature of batch analytics
workloads, it is economically beneficial to merge the HDD-based capacity
tier and the tape-based archival tier
into a single cold storage tier, as demonstrated by recent research [3]. However, several open questions still need to