fice of Science and Technology Policy
released a Request for Information in November 2011, soliciting comments on public access to digital data resulting from federally funded research.
Also, some of the research funded under NSF’s Sustainable Digital Data Preservation and Access Network Partners (DataNet) program is beginning to bear significant fruit. Although Spengler says the DataNet projects were, and still are, intended to be exemplars and, because of limited funding, fairly restrictive prototypes, her NSF colleague Rob Pennington says DataNet awardees are working with other researchers eager to find ways to share data across domains and disciplines.
One standout example of this is iRODS (integrated Rule-Oriented Data System), developed by the Data Intensive Cyber Environments (DICE) research group at the University of North Carolina (UNC) and the University of California, San Diego.
Institutions in disciplines from climatology to the social sciences are adopting the data grid tool. In September 2011, the NSF awarded iRODS developers $8 million to build a policy-driven national data management infrastructure, motivated by the distinct data management requirements of NSF’s Ocean Observatories Initiative, NSF’s Consortium of Universities for the Advancement of Hydrologic Science, engineering projects in education, CAD/CAM/CAE archives, the genomic databases of the iPlant Collaborative, the H.W. Odum Institute for Research in Social Science at UNC, and NSF’s Science of Learning Centers.
iRODS has also been adopted by scientific data centers worldwide, including astronomical observatories in Canada and France, climate centers in the U.S., and the genomics databases of the Sanger Institute in the U.K. It is also in use in the U.S. National Archives and Records Administration’s Transcontinental Persistent Archives Prototype.
iRODS is the successor to the pioneering Storage Resource Broker (SRB) architecture. DICE Director Reagan Moore says iRODS’s rule-engine-based architecture makes distributed management of a data grid much simpler than the hard-coded SRB architecture did. He says it can also serve as the sort of reporting tool that demonstrates researchers are meeting the outlines of their mandated data management plans, and that it is the sort of policy-based system that melds data management and data preservation for whatever duration is required.
“We make the assertion that any data-management application really consists of the policies you’re applying in order to validate assertions about what you’ve done,” Moore says. “So if I build a preservation environment, my policies are related to authenticity, integrity, chain of custody, and original arrangement.”
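Moore’s framing, in which a policy is a check that validates an assertion about what has been done to the data, can be illustrated with a short sketch. The Python below is not iRODS code and uses no iRODS API; `DataObject`, the policy functions, and `audit` are hypothetical names chosen for illustration. It covers integrity, chain of custody, and original arrangement; authenticity, which typically involves signatures or trusted provenance records, is left out to keep the example short.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataObject:
    """Hypothetical stand-in for a preserved file and its provenance metadata."""
    content: bytes
    recorded_sha256: str   # checksum captured at ingest
    original_path: str     # where the depositor placed the file
    current_path: str      # where the object lives now
    custody_log: List[Tuple[str, str]] = field(default_factory=list)  # (actor, action)


# A policy is a named check that validates one assertion about an object.
Policy = Callable[[DataObject], bool]


def integrity(obj: DataObject) -> bool:
    # Assertion: the bits have not changed since ingest.
    return hashlib.sha256(obj.content).hexdigest() == obj.recorded_sha256


def chain_of_custody(obj: DataObject) -> bool:
    # Assertion: the history starts with ingest and every event names an actor.
    return (
        bool(obj.custody_log)
        and obj.custody_log[0][1] == "ingest"
        and all(actor for actor, _ in obj.custody_log)
    )


def original_arrangement(obj: DataObject) -> bool:
    # Assertion: the object still sits where the depositor arranged it.
    return obj.current_path == obj.original_path


PRESERVATION_POLICIES: Dict[str, Policy] = {
    "integrity": integrity,
    "chain_of_custody": chain_of_custody,
    "original_arrangement": original_arrangement,
}


def audit(obj: DataObject, policies: Dict[str, Policy]) -> Dict[str, bool]:
    """Evaluate every policy and report which assertions currently hold."""
    return {name: check(obj) for name, check in policies.items()}


data = b"station,temp\n42,17.5\n"
obj = DataObject(
    content=data,
    recorded_sha256=hashlib.sha256(data).hexdigest(),
    original_path="/archive/ocean/run1.csv",
    current_path="/archive/ocean/run1.csv",
    custody_log=[("alice", "ingest"), ("grid-admin", "replicate")],
)
print(audit(obj, PRESERVATION_POLICIES))
# {'integrity': True, 'chain_of_custody': True, 'original_arrangement': True}
```

Each entry in the report records whether one assertion holds, which is the sense in which a policy-based system can double as a reporting tool.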
Moore says the rule-based iRODS architecture lets users tailor which policies apply to a given action without rewriting any code, whereas server-side commands in SRB were hard coded. These rules are applied in a platform-agnostic manner through any of 254 microservices that a user community selects as pertinent.
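The difference between hard-coded server commands and rules can be sketched in a few lines. Suppose each community’s policy is a table mapping an action to an ordered list of microservices; changing what happens on ingest then means editing the table rather than the server. The sketch assumes that simplified model and is written in Python, not the iRODS rule language; the action name `on_put`, the microservice names, and `run_rules` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

# A microservice is a small, reusable operation on a shared context dict.
Microservice = Callable[[dict], None]

# Registry of available microservices, keyed by (hypothetical) name.
MICROSERVICES: Dict[str, Microservice] = {}


def microservice(name: str):
    """Decorator that registers a function under a microservice name."""
    def register(fn: Microservice) -> Microservice:
        MICROSERVICES[name] = fn
        return fn
    return register


@microservice("compute_checksum")
def compute_checksum(ctx: dict) -> None:
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


@microservice("replicate")
def queue_replication(ctx: dict) -> None:
    # Only records the target resources; a real grid would copy the bytes.
    ctx.setdefault("replicas", []).extend(ctx.get("targets", []))


@microservice("log_event")
def log_event(ctx: dict) -> None:
    ctx.setdefault("log", []).append(ctx["action"])


def run_rules(rules: Dict[str, List[str]], action: str, ctx: dict) -> dict:
    """Run, in order, every microservice the rule table attaches to `action`."""
    ctx["action"] = action
    for name in rules.get(action, []):
        MICROSERVICES[name](ctx)
    return ctx


# Two communities share the same server code but choose different policies.
genomics_rules = {"on_put": ["compute_checksum", "replicate", "log_event"]}
social_science_rules = {"on_put": ["compute_checksum", "log_event"]}

ctx = run_rules(genomics_rules, "on_put", {"data": b"ACGT", "targets": ["site-a", "site-b"]})
print(ctx["replicas"])  # ['site-a', 'site-b']
```

The design point being illustrated is that the rule table is data: a community can add, drop, or reorder microservices without touching `run_rules` or the microservices themselves.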
At least one project has already
successfully replicated a recognized
sample archive, the Harvard IQSS-developed Dataverse, using iRODS.
Researchers at UNC’s Odum Institute performed a Dataverse-to-iRODS transfer using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and the compatible Data Documentation Initiative (DDI) metadata standard, plus XML.
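OAI-PMH harvesting works by issuing HTTP requests that carry a `verb` parameter, such as `ListRecords`, and by paging through large result sets with a `resumptionToken`. The sketch below builds request URLs and parses one response page using only the Python standard library; it does not reproduce the Odum Institute’s code. The base URL, the `oai_ddi` metadata prefix, and the sample response are assumptions for illustration, and a live harvester would fetch each URL and loop until no token is returned.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken is an exclusive argument: follow-up requests
    carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return record identifiers from one response page plus the next-page token."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing token means this is the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response page, trimmed to the elements parsed above.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.edu:study-1</identifier></header></record>
    <record><header><identifier>oai:example.edu:study-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://dataverse.example.edu/oai", "oai_ddi"))
ids, token = parse_list_records(SAMPLE_PAGE)
print(ids, token)  # ['oai:example.edu:study-1', 'oai:example.edu:study-2'] page-2
```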
“The result is an accurate copy of a Dataverse archive inside iRODS,” according to the UNC authors, “which data grid administrators can preserve over the long term by, for example, replicating the information to many geographically distributed storage resources.”
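Replication of the kind the UNC authors describe can be sketched as placing copies on storage resources in different regions and later checking each copy against the checksum recorded at ingest. The Python below is a hypothetical in-memory model, not the iRODS replication mechanism; `StorageResource`, `replicate`, `verify_replicas`, and the site and region names are illustrative assumptions.

```python
import hashlib
from typing import Dict, List


class StorageResource:
    """Hypothetical in-memory stand-in for one storage site in a data grid."""

    def __init__(self, name: str, region: str):
        self.name = name
        self.region = region
        self.objects: Dict[str, bytes] = {}


def replicate(obj_id: str, data: bytes, resources: List[StorageResource],
              copies: int) -> List[str]:
    """Place `copies` replicas, preferring resources in distinct regions."""
    chosen: List[StorageResource] = []
    seen_regions = set()
    # First pass: at most one copy per region, so one regional outage
    # cannot take out every replica.
    for res in resources:
        if len(chosen) < copies and res.region not in seen_regions:
            chosen.append(res)
            seen_regions.add(res.region)
    # Second pass: with fewer regions than copies, fill remaining slots anywhere.
    for res in resources:
        if len(chosen) < copies and res not in chosen:
            chosen.append(res)
    if len(chosen) < copies:
        raise ValueError("not enough storage resources for the requested copies")
    for res in chosen:
        res.objects[obj_id] = data
    return [res.name for res in chosen]


def verify_replicas(obj_id: str, expected_sha256: str,
                    resources: List[StorageResource]) -> Dict[str, bool]:
    """Report, per resource holding the object, whether its copy matches the checksum."""
    return {
        res.name: hashlib.sha256(res.objects[obj_id]).hexdigest() == expected_sha256
        for res in resources
        if obj_id in res.objects
    }


sites = [
    StorageResource("east-1", "us-east"),
    StorageResource("east-2", "us-east"),
    StorageResource("west-1", "us-west"),
    StorageResource("uk-1", "uk"),
]
placed = replicate("dataset-1", b"archive bytes", sites, copies=3)
print(placed)  # ['east-1', 'west-1', 'uk-1']
```

Preferring distinct regions before doubling up is one simple placement heuristic; a real deployment might weigh cost, capacity, or institutional policy instead.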
Phil Butcher, head of information technology at the Sanger Institute, says the organization’s iRODS installation has run smoothly. He believes funding agencies should make themselves aware of the details of such groundbreaking technologies, even if comprehensive national and international management and preservation policies are not possible.
Further Reading
Beagrie, N., Lavoie, B., and Woollard, M.
Keeping Research Data Safe 2, JISC,
Bristol, U.K., April 2010.
Chiang, G., Clapham, P., Qi, G.,
Sale, K., and Coates, G.
Implementing a genomic data management
system using iRODS in the Wellcome Trust
Sanger Institute, BMC Bioinformatics 12,
361, Sept. 9, 2011.
Neuroth, H., Strathmann, S., and Vlaeminck, S.
Digital preservation needs of scientific
communities: The example of Göttingen
University, Proceedings of the 12th European
Conference on Research and Advanced
Technology for Digital Libraries, Aarhus,
Denmark, Sept. 14–19, 2008.
Rothenberg, J. and Hoorens, S.
Enabling Long-term Access to Scientific,
Technical and Medical Data Collections,
RAND Europe, Cambridge, U.K., 2010.
Ward, J.H., de Torcy, A., Chua, M.,
and Crabtree, J.
Extracting and ingesting DDI metadata
and digital objects from a data archive into
the iRODS extension of the NARA TPAP
using the OAI-PMH, 5th IEEE International
Conference on e-Science, Oxford, U.K., Dec.
9–11, 2009.
Gregory Goth is an Oakville, CT-based writer who
specializes in science and technology.
© 2012 ACM 0001-0782/12/04 $10.00