and store digital photos through Flickr,
employ Apple’s Time Capsule for regular personal computer backup, and use
LexisNexis for online legal services.
The commercialization of data
storage and services contributes an
important component of the data
CI environment needed to harness
the potential of our information-rich
world. However, private-sector storage and services are not the solution to
all digital data needs. For some digital
data considered to be “in the public
interest” (such as census data, official
records, critical scientific data collections, and a variety of irreplaceable
data), a greater level of trust, monitoring, replication, and accountability is
required to minimize the likelihood
of loss or damage and ensure the data
will be there for a very long time. For
such community data sets, stewardship by a trusted entity (such as libraries, archives, museums, universities,
and institutional repositories), whose
mission is the public good rather than
profit, is generally required.
There is no one-size-fits-all solution
for data stewardship and preservation.
The “free rider” solution of “Let someone else do it”—whether that someone
else is the government, a library, a museum, an archive, Google, Microsoft,
the data creator, or the data user—is
unrealistic and pushes responsibility to a single company, institution,
or sector when what is needed are
cross-sector economic partnerships.
Sustainable economic models for
digital data in the public interest are
the focus of an international Blue Ribbon Task Force for Sustainable Digital Preservation and Access (brtf.sdsc.edu), whose goal is to examine digital
preservation as an economic activity
and explore cost frameworks for various institutional scenarios. The Task
Force’s final report, due at the end of
2009, will focus on economic models,
components, and actionable recommendations for sustainable digital
preservation and access, though it is
already clear that a diverse set of economic approaches is necessary.
In aggregate, these four trends
point to the need to take a comprehensive and coordinated approach to
data CI and treat the problem of sustainability holistically, creating strategies that make sense from a technical,
policy, regulatory, economic, security,
and community perspective.
Value and Sustainability
In developing effective models for data
CI, perhaps the greatest challenge is
economic sustainability. A key question is: Who is responsible for supporting the preservation of valued digital
data? Critical to answering it is the recognition that “value” means different
things to different people. There is
general agreement that official digital
government records (such as presidential email and videos of congressional
hearings in the U.S.) are preservation-worthy and of great political and historical value to society, but the video
of your niece’s voice recital is likely to
be of value to a much smaller family
group (unless, of course, your niece is,
say, Tina Turner).
Sustainability solutions for digital
data are inextricably related to who
values it and who is willing to support
its preservation. Governments worldwide are willing to support the preservation of digital content of national
value, a substantial undertaking that
involves hosting multiple copies of the
same data, migration of the data from
one generation of storage media to the
next to ensure it lives in perpetuity, and
protection of its integrity and authenticity. Your niece’s voice recital may
live on the hard drives of one or more
family members, but there is rarely an
explicit plan for how such a treasured
family artifact will be preserved for the
next decade and beyond.
How might we distinguish among
all the data use, stewardship, and
preservation scenarios to create and
identify the data CI solutions needed
to support them? One way is to borrow
from the world of computation and
adapt the Branscomb Pyramid model
to today’s data-use and data-stewardship scenarios. In 1993, the NSF asked
Lewis Branscomb to lead a distinguished committee to consider the future of high-performance computing
in the U.S. The final report included a
useful framework, now known as the
Branscomb Pyramid, where the base of
the Pyramid associated the least-powerful computational platforms with
users needing computation for “everyday” applications, the middle associated more powerful computational plat-