DOI: 10.1145/1839676.1839692
Article development led by queue.acm.org
As storage systems grow larger and larger,
protecting their data for long-term storage
is becoming ever more challenging.
By David S.H. Rosenthal
Keeping Bits Safe: How Hard Can It Be?
These days, we are all data pack rats. Storage is
cheap, so if there is a chance the data could possibly
be useful, we keep it. We know that storage isn’t
completely reliable, so we keep backup copies as
well. But the more data we keep, and the longer we
keep it, the greater the chance that some of it will be
unrecoverable when we need it.
There is an obvious question we should
be asking: how many copies in storage
systems with what reliability do we
need to get a given probability that the
data will be recovered when we need
it? This may be an obvious question
to ask, but it is a surprisingly difficult
question to answer. Let’s look at the
reasons why.
To be specific, let’s suppose we need
to keep a petabyte for a century and
have a 50% chance that every bit will
survive undamaged. This may sound
like a lot of data and a long time, but
there are already data collections bigger
than a petabyte that are important
to keep forever. The Internet Archive is
already multiple petabytes.
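
To get a feel for how demanding this target is, here is a minimal back-of-the-envelope sketch. It rests on assumptions the text above does not state: that each bit fails independently, that failures follow simple exponential decay with a fixed half-life, and that a petabyte means 10^15 bytes. The function name and constants are illustrative only.

```python
import math

def required_bit_half_life_years(num_bits, years, survival_probability):
    """Half-life each bit would need so that ALL bits survive `years`
    with probability `survival_probability`.

    Assumption (not from the article text above): bits decay independently
    and exponentially, so
        P(all survive) = 2 ** (-num_bits * years / half_life)
    Solving for half_life gives the expression returned here.
    """
    return num_bits * years / -math.log2(survival_probability)

# Assumption: 1 petabyte = 10**15 bytes = 8 * 10**15 bits.
PETABYTE_BITS = 8 * 10**15

half_life = required_bit_half_life_years(PETABYTE_BITS, 100, 0.5)
print(f"{half_life:.1e} years")  # prints "8.0e+17 years"
```

Under these simplifying assumptions, each bit would need a half-life of roughly 8 × 10^17 years. Real storage systems do not fail this way, so treat the figure as an indication of scale rather than a prediction.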