RAID-5 today. It is again time to create
a new RAID level to accommodate the
realities of disk reliability, capacity,
and throughput merely to maintain
that same level of data protection.
Triple-Parity RAID
With RAID-6 increasingly unable to
meet reliability requirements, there
is an impending but not yet urgent
need for triple-parity RAID. The addition of another level of parity mitigates increasing RAID rebuild times
and occurrences of latent data errors.
As shown in Figure 7, triple-parity
RAID will address the shortcomings
of RAID-6 for years (see the accompanying sidebar “A Classification for
Triple-Parity RAID”). The reliability
is largely independent of the specific
implementation of triple-parity RAID;
a general Reed-Solomon method suffices for our analysis.
A recurring theme in computer science is that algorithms can be specialized for small fixed values, but are then
generalized to scale to an arbitrary
value. A common belief in the computer industry had been that double-parity RAID was effectively that generalization, that it provided all the data
reliability that would ever be needed.
RAID-6 is inadequate, leading to the
need for triple-parity RAID, but that,
too, if current trends persist, will become insufficient. Not only is there a
need for triple-parity RAID, but there’s
also a need for efficient algorithms
that truly address the general case of
RAID with an arbitrary number of parity devices.
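To make the idea of an arbitrary number of parity devices concrete, the following is a minimal sketch in Python, not the implementation of any particular product. It assumes Reed-Solomon-style parity over GF(2^8) with the reducing polynomial 0x11d, and generators 1, 2, and 4 for the first three parity blocks; those choices, and the names gf_mul, coeff, compute_parity, and recover, are illustrative assumptions. The sketch rebuilds only lost data blocks using the first parity blocks; it does not handle lost parity blocks, and this simple coefficient scheme should not be assumed solvable for every mix of lost data and parity blocks as the parity count grows.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the assumed polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero byte: a**254, because a**255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def coeff(j, i):
    """Coefficient of data block i in parity block j: (2**j)**i."""
    return gf_pow(gf_pow(2, j), i)


def compute_parity(data, num_parity):
    """Return num_parity parity blocks for a list of equal-length data blocks."""
    length = len(data[0])
    parity = [bytearray(length) for _ in range(num_parity)]
    for j in range(num_parity):
        for i, block in enumerate(data):
            c = coeff(j, i)
            for k, byte in enumerate(block):
                parity[j][k] ^= gf_mul(c, byte)
    return [bytes(p) for p in parity]


def recover(data, parity, lost):
    """Rebuild the data blocks at the indices in `lost` (those entries are ignored).

    Uses the first len(lost) parity blocks and Gauss-Jordan elimination.
    """
    n = len(lost)
    length = len(parity[0])
    matrix = [[coeff(j, x) for x in lost] for j in range(n)]
    rhs = [bytearray(parity[j]) for j in range(n)]
    # Subtract (XOR) the contribution of every surviving data block.
    for j in range(n):
        for i, block in enumerate(data):
            if i in lost:
                continue
            c = coeff(j, i)
            for k in range(length):
                rhs[j][k] ^= gf_mul(c, block[k])
    # Reduce the coefficient matrix to the identity.
    for col in range(n):
        pivot = next(r for r in range(col, n) if matrix[r][col] != 0)
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        inv = gf_inv(matrix[col][col])
        matrix[col] = [gf_mul(inv, v) for v in matrix[col]]
        rhs[col] = bytearray(gf_mul(inv, v) for v in rhs[col])
        for r in range(n):
            if r != col and matrix[r][col] != 0:
                f = matrix[r][col]
                matrix[r] = [a ^ gf_mul(f, b) for a, b in zip(matrix[r], matrix[col])]
                rhs[r] = bytearray(a ^ gf_mul(f, b) for a, b in zip(rhs[r], rhs[col]))
    return {x: bytes(rhs[idx]) for idx, x in enumerate(lost)}


if __name__ == "__main__":
    blocks = [b"alpha---", b"bravo---", b"charlie-", b"delta---", b"echo----"]
    p = compute_parity(blocks, 3)
    damaged = [None, blocks[1], None, blocks[3], None]  # three data blocks lost
    print(recover(damaged, p, [0, 2, 4]))
```

The same two functions work for one, two, or three parity blocks by changing num_parity, which is the sense in which a general method subsumes the specialized RAID-5 and RAID-6 cases; a production implementation would use table-driven or vectorized field arithmetic rather than the bitwise loop shown here.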
Beyond RAID-5 and -6, what are the
implications for RAID-1, simple two-way mirroring? RAID-1 can be viewed
as a degenerate form of RAID-5, so
even if bit error rates improve at the
same rate as hard-drive capacities,
the time to repair for RAID-1 could become debilitating. How secure would
an administrator be running without
redundancy for a week-long scrub?
For the same reasons that make triple-parity RAID necessary where RAID-6
had sufficed, three-way mirroring will
displace two-way mirroring for applications that require both high performance and strong data reliability. Indeed, four-way mirroring may not be
far off, since even three-way mirroring
is effectively a degenerate, but more
reliable, form of RAID-6, and will be
susceptible to the same failings.
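The concern about time to repair comes down to simple arithmetic: a replacement drive cannot be repopulated faster than its capacity divided by its sustained throughput. The sketch below, in Python, computes that lower bound. The function name, the rebuild_share parameter, and the drive figures are hypothetical values chosen for illustration, not measurements from this article.

```python
def min_rebuild_hours(capacity_bytes, throughput_bytes_per_sec, rebuild_share=1.0):
    """Lower bound on repair time in hours.

    rebuild_share is the assumed fraction of drive throughput left for the
    rebuild once the regular workload has been served (1.0 means an idle system).
    """
    if not 0.0 < rebuild_share <= 1.0:
        raise ValueError("rebuild_share must be in (0, 1]")
    seconds = capacity_bytes / (throughput_bytes_per_sec * rebuild_share)
    return seconds / 3600.0


# Hypothetical drive: 2 TB capacity, 100 MB/s sustained throughput.
idle_hours = min_rebuild_hours(2e12, 100e6)
busy_hours = min_rebuild_hours(2e12, 100e6, rebuild_share=0.1)
```

Because capacity has been growing faster than throughput, the numerator of this ratio outpaces the denominator, and a busy system that leaves only a small share of throughput for the rebuild stretches the window proportionally.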
Implications for RAID
While triple-parity RAID will be necessary, the steady penetration of flash
solid-state storage could have a significant effect on the fate of disk drives.
At one extreme, some have predicted
the relegation of disk to a tape-like
backup role as flash becomes cheap
and reliable enough to act as a replacement for disk.6 In that scenario,
RAID is still necessary as even solid-state devices suffer catastrophic
and partial failures, but the specific
capacities, error rates, and throughputs for such devices could mean that
triple-parity RAID is not required. Unfortunately, too little is known about
the properties of devices that might
flourish, and that scenario is too far
in the future to obviate the need for
triple-parity RAID.
At another extreme, the integration of flash into the storage hierarchy8 could address high-performance
needs through solid-state caching and
buffering, thus decoupling system
performance from that of the component hard drives. This could hasten
current trends as hard-drive manufacturers would be able to increase capacity even more quickly, unhindered
by performance requirements, while
likely slowing the rate of throughput
increases. Further, divorced from performance, RAID stripes could grow
very wide to optimize for absolute
capacity; this would reduce the reliability further with the same amount
of parity protecting more data. In this
scenario, the need for triple-parity
RAID would be made all the more urgent by accelerating current trends.
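The effect of stripe width can be illustrated with a rough calculation. The Python sketch below estimates the chance of hitting at least one unrecoverable read error while reading every surviving drive in full during a reconstruction, which is the case that matters once no redundancy remains. It assumes independent bit errors at a constant rate; the function name, the error rate of 1 in 10^14 bits, and the 1 TB capacity are assumptions for illustration only.

```python
import math


def p_read_error_during_rebuild(surviving_disks, capacity_bytes, bit_error_rate):
    """Probability of at least one unrecoverable read error when every
    surviving disk is read in full, assuming independent bit errors."""
    bits_read = surviving_disks * capacity_bytes * 8
    # 1 - (1 - ber)**bits_read, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-bit_error_rate))


# Hypothetical comparison: the same drives in a narrower and a wider stripe.
narrow = p_read_error_during_rebuild(8, 1e12, 1e-14)
wide = p_read_error_during_rebuild(16, 1e12, 1e-14)
```

Under these assumptions the wider stripe always yields the higher probability, because the same amount of parity must cover more bits read; additional parity devices are what allow such an error to be corrected rather than lost.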
If Kryder’s Law continues to hold,
the burden of correctness will increasingly shift from the hard-drive
manufacturers to the RAID systems
that integrate them. Today, RAID reconstruction times factor more into
reliability calculations than ever before, and their contribution will increasingly dominate. Triple-parity
RAID will soon be critical to provide
sufficient reliability even in the face
of exponential growth.
Acknowledgments
Many thanks to Dominic Kay for gathering
the historical hard-drive data,
and to Matt Ahrens, Daniel Leventhal,
and Beverly Hodgson for their helpful
reviews.
Related articles
on queue.acm.org
Flash Storage Today
Adam Leventhal
http://queue.acm.org/detail.cfm?id=1413262
Hard Disk Drives: The Good,
the Bad and the Ugly
Jon Elerath
http://queue.acm.org/detail.cfm?id=1317403
You Don’t Know Jack about Disks
Dave Anderson
http://queue.acm.org/detail.cfm?id=864058
References
1. Berriman, E., Feresten, P., and Kung, S. NetApp
RAID-DP: Dual-parity RAID-6 protection without
compromise; http://www.mochadata.com/download/netapp-raid-dp.pdf (2006).
2. Blaum, M., Brady, J., Bruck, J., and Menon, J.
EVENODD: An optimal scheme for tolerating double
disk failures in RAID architectures. In Proceedings
of the International Symposium on Computer
Architecture (1994), 245–254;
http://portal.acm.org/citation.cfm?id=191995.192033.
3. Chen, P., Lee, E., Patterson, D., Gibson, G., and Katz, R.
RAID: High-performance, reliable secondary storage.
Technical Report CSD 93-778 (1993);
http://portal.acm.org/citation.cfm?id=893811.
4. Corbett, P., English, B., Goel, A., Grcanac, T., Kleiman,
S., Leong, J., and Sankar, S. Row-diagonal parity
for double disk failure correction. In Proceedings
of the 3rd Usenix Conference on File and Storage
Technologies (2004), 1–14;
http://portal.acm.org/citation.cfm?id=1096673.1096677.
5. Elerath, J. Hard-disk drives: The good, the
bad, and the ugly. Commun. ACM 52, 6 (June
2009), 38–45;
http://portal.acm.org/citation.cfm?id=1516046.1516059.
6. Gray, J. and Fitzgerald, B. Flash disk opportunity
for server-applications. Microsoft Research;
http://research.microsoft.com/en-us/um/people/gray/papers/FlashdiskPublic.doc (2007).
7. Hitz, D. Why “Double Protecting RAID”
(RAID-DP) doesn’t waste extra disk space;
http://blogs.netapp.com/dave/2006/05/why_double_prot.html
(2006).
8. Leventhal, A. Flash storage memory. Commun.
ACM 51, 7 (July 2008), 47–51;
http://portal.acm.org/citation.cfm?id=1364782.
9. Patterson, D., Gibson, G., and Katz, R. A case for
redundant arrays of inexpensive disks (RAID).
In Proceedings of ACM SIGMOD International
Conference on Management of Data (1988), 109–116;
http://portal.acm.org/citation.cfm?id=50214.
10. Plank, J. A tutorial on Reed-Solomon coding for
fault-tolerance in RAID-like systems. Technical
Report UT-CS-96-332;
http://portal.acm.org/citation.cfm?id=898928 (1996).
11. Walter, C. Kryder’s law. Scientific American (Aug.
2005);
http://www.scientificamerican.com/article.cfm?id=kryders-law.
Adam Leventhal is a senior staff engineer and flash
architect for Sun’s Fishworks advanced product
development team responsible for the Sun Storage 7000
series. He is one of the three authors of DTrace, for which
he and his colleagues were named one of InfoWorld’s
Innovators of 2005 and won top honors from the 2006
Wall Street Journal’s Innovation Awards.