DOI: 10.1145/1516046.1516059
Article development led by queue.acm.org

Hard-Disk Drives: The Good, the Bad, and the Ugly

New drive technologies and increased capacities create new categories of failure modes that will influence system designs.

Illustration by Superbrothers

Hard-disk drives (HDDs) are like the bread in a peanut butter and jelly sandwich: seemingly unexciting pieces of hardware necessary to hold the software. They are simply a means to an end. HDD reliability, however, has always been a significant weak link, perhaps the weak link, in data storage. In the late 1980s people recognized that HDD reliability was inadequate for large data storage systems, so redundancy was added at the system level with some brilliant software algorithms, and RAID (redundant array of independent disks) became a reality. RAID moved the reliability requirements from the HDD itself to the system of data disks. Commercial implementations of RAID include n+1 configurations such as mirroring, RAID-4 and RAID-5, and the n+2 configuration, RAID-6, which increases storage system reliability using two redundant disks (dual parity). Additionally, reliability at the RAID group level has been favorably enhanced because HDD reliability has been improving as well.

Several manufacturers produce one-terabyte HDDs and higher capacities are being designed. With higher areal densities (also known as bit densities), lower fly-heights (the distance between the head and the disk media), and perpendicular magnetic recording technology, can HDD reliability continue to improve? The new technology required to achieve these capacities is not without concern. Are the failure mechanisms or the probability of failure any different from predecessors? Not only are there new issues to address stemming from the new technologies, but also failure mechanisms and modes vary by manufacturer, capacity, interface, and production lot.

How will these new failure modes affect system designs? Understanding failure causes and modes for HDDs using technology of the current era and the near future will highlight the need for design alternatives and trade-offs that are critical to future storage systems. Software developers and RAID architects can not only better understand the effects of their decisions, but also know which HDD failures are outside their control and which they can manage, albeit with possible adverse performance or availability consequences. Based on technology and design, where must the developers and architects place the efforts for resiliency?

This article identifies significant HDD failure modes and mechanisms, their effects and causes, and relates them to system operation. Many failure mechanisms for new HDDs remain unchanged from the past, but the insidious undiscovered data corruptions (latent defects) that have plagued all HDD designs to one degree or another will continue to worsen in the near future as areal densities increase.