have been leaders in data curation,
access, and preservation issues. DigitalPreservationEurope
(www.digitalpreservationeurope.eu) in the E.U., the
National Library of Australia (www.nla.gov.au/policy/digpres.html),
Koninklijke Bibliotheek (the National Library of
the Netherlands, www.kb.nl/index-en.html),
and others around the world are
contributing to an increasing body of
knowledge and infrastructure to support data preservation and access for
efforts enabled by technology.
The digital data generated by research, industry, and governments
over the next decade will be subject
to increased regulation and evolving
community formats, standards, and
policies. This means the CI developed
to host and preserve it will need to
incorporate mechanisms to enforce
community policies and procedures
like auditing, authentication, monitoring, and association of affiliated
metadata. (As an unconventional example, think of the tagging of Facebook and
Flickr photos.) Emerging data CI and
management environments and systems, including iRODS (www.irods.org/),
LOCKSS (www.lockss.org/lockss/Home),
the Fedora Commons (www.fedora-commons.org/),
and DSpace (www.dspace.org/),
are beginning to
develop and incorporate mechanisms
that implement relevant policies and
procedures. Over the next decade, the
ability to automatically address the
requirements of policy and regulation
will be needed to ensure that our data
CI empowers rather than limits us.
[Figure 2: The Data Pyramid. Tiers of digital data collections: reference, nationally and internationally important, irreplaceable data collections; key research and community data collections; personal data collections. Side labels: increasing constituency, increasing value, increasing trust.]
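As one illustration of what automated "auditing" might look like in practice, the sketch below checks stored files against previously recorded checksums. It is a minimal example under stated assumptions, not a description of how iRODS, LOCKSS, Fedora Commons, or DSpace implement their policies; the function names `sha256_of` and `audit`, and the idea of a manifest mapping relative paths to SHA-256 digests, are hypothetical choices made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare each file under root against its recorded digest.

    manifest maps a relative path to the digest recorded at ingest
    (a hypothetical format chosen for this sketch).
    Returns a mapping from relative path to "ok", "missing", or "corrupt".
    """
    report = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            report[relative_path] = "missing"
        elif sha256_of(target) != expected:
            report[relative_path] = "corrupt"
        else:
            report[relative_path] = "ok"
    return report
```

Run on a schedule, a check of this kind gives a repository a simple, automatable way to detect lost or silently altered files; a production system would also log who ran the audit and when.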
Trend 3. Storage costs for digital data
are decreasing (but that’s not the whole
story). One of the most important
trends affecting digital data is the decrease in price over time for a terabyte
(10^12 bytes) of data storage. According to
IDC [11], a terabyte of “enterprise” storage was priced at roughly $440,000 in
1997. A decade later, the price for a
terabyte of enterprise storage averaged
around $5,400. In 2008, terabyte drives
cost approximately $200 (OEM cost).
In addition, holographic memory and
other new technologies promise even
better performance per price unit.
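To put the enterprise figures in perspective, the short calculation below converts the two quoted price points (roughly $440,000 in 1997 and around $5,400 a decade later) into an overall price ratio and a compound annual rate of change. It is a back-of-the-envelope sketch: the function name `annual_change` is hypothetical, the ten-year span is taken from the text, and the 2008 OEM drive price is left out because it describes a different class of storage.

```python
def annual_change(start_price: float, end_price: float, years: float) -> float:
    """Compound annual rate of change; a negative value means the price fell."""
    return (end_price / start_price) ** (1.0 / years) - 1.0


# Enterprise storage price per terabyte, as quoted in the text.
PRICE_1997 = 440_000.0
PRICE_2007 = 5_400.0

fold = PRICE_1997 / PRICE_2007                     # overall price ratio
rate = annual_change(PRICE_1997, PRICE_2007, 10)   # per-year change

print(f"Price fell by a factor of about {fold:.0f}")
print(f"Compound annual change: {rate:.1%}")
```

Under these assumptions the price fell by a factor of about 81, which works out to a decline of roughly 36 percent per year.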
With storage so affordable, one
would expect the “data bill” of institutions and enterprises to be equivalently affordable. However, as storage
costs decrease, critical components of
the data bill (such as power, curation/
annotation, and professional expertise) are not decreasing. Today’s companies and institutions are investing
in enterprise data centers in locations
selected to minimize the power bill.
Google, Microsoft, and other technology companies spend billions on such
data centers—the heart of their businesses—and the cost savings rendered
through strategic placement can be
immense. Storage costs may be going
down, but the number of data centers
and the cost of powering them are taking a bigger and bigger bite out of current and projected data budgets. (See
Moore et al. [8] for a 2007 assessment of
the San Diego Supercomputer Center’s total cost of providing storage infrastructure.)
[Figure 2 labels: societal value; community value; individual value.]
In addition, most data centers employ a knowledgeable, professional
work force to ensure appropriate curation and annotation of digital data for
the smooth running of the data center
infrastructure and to plan ahead for future institutional and enterprise data
needs. A capable data work force is important for all sectors and will likely increase as a percentage of the overall IT
work force, along with the increasing
need for a well-managed, sustainable
digital data CI.
Finally, data centers must also factor in the cost of compliance with current and future regulations (possibly
requiring additional physical and/or
cyberinfrastructure for power backup
and monitoring) and the need for replication of valuable data sets. (Data with
long-term or substantive value is commonly stored with at least three copies, some hosted off-site.) We should
expect the overall costs of data centers
to continue to be substantial for the
foreseeable future.
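The replication practice mentioned above (at least three copies, some hosted off-site) is the kind of rule that can be checked automatically. The sketch below is one hypothetical way to express it; the `Replica` record, the site labels, and the default thresholds are assumptions made for illustration, not part of any standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str       # hypothetical site label, e.g. "primary-dc"
    off_site: bool  # True if the copy is held away from the primary facility


def meets_replication_policy(replicas, min_copies=3, min_off_site=1):
    """Return True if there are enough copies and enough of them are off-site."""
    off_site_count = sum(1 for replica in replicas if replica.off_site)
    return len(replicas) >= min_copies and off_site_count >= min_off_site


copies = [
    Replica("primary-dc", off_site=False),
    Replica("primary-dc", off_site=False),
    Replica("partner-archive", off_site=True),
]
print(meets_replication_policy(copies))      # three copies, one off-site
print(meets_replication_policy(copies[:2]))  # only two copies, none off-site
```

Each additional copy multiplies storage, power, and management costs, which is part of why the overall data bill stays substantial even as the price per terabyte falls.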
Trend 4. Increasing commercialization of digital data storage and services.
The 2006 introduction of Amazon Simple Storage Service
(www.amazon.com/gp/browse.html?node=16427261)
was a high-profile example of the trend
toward commercialization of data storage and data services. Today, there is
considerable activity in the private sector around data storage and services for
the consumer; for example, we share
[Figure 2 labels: decreasing risk of loss or damage; increasing responsibility; increasing stability; increasing infrastructure. Repositories/facilities: national- and international-scale repositories, libraries, archives; “regional”-scale libraries and targeted data archives and centers.]