Result of seven-trillion-electronvolt collisions (March 30, 2010) in the ATLAS particle detector on the Large Hadron Collider at CERN, hunting for dark matter, new forces, new dimensions, the Higgs boson, and ultimately a grand theory to explain all physical phenomena.
Persistent common requirements
for scientific data management include:
• Automation of data and metadata processing;
• Parallel data processing;
• Online processing;
• Integration of multifarious data and metadata; and
• Efficient manipulation of data/metadata residing in files.
Image: CERN-EX-1003060 02 © CERN
The lack of complete solutions based on commercial DBMSs has led scientists in all fields to develop or adopt application-specific solutions, though some have been built on top of commercial DBMSs; for example, the Sloan Digital Sky Survey (SDSS-1 and SDSS-2; http://www.sdss.org/) uses SQL Server as its backend. Moreover, the resulting software is typically tightly bound to the application and difficult to adapt to changes in the scientific landscape. Szalay and Blakeley [9] wrote, “Scientists and scientific institutions need a template and best practices that lead to balanced hardware architectures and corresponding software to deal with these volumes of data.”
Despite the challenges, the data-management research community continues to envision a general-purpose scientific data-management system adapting current innovations: parallelism in data querying, sophisticated tools for data definition and analysis (such as clustering and SDSS-1), optimization of data organization, data caching, and replication techniques. Promising results involve automated data organization, provenance, annotation, online processing of streaming data, embedded complex data types, support for declarative data and process definition, and incorporation of files into DBMSs.
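
One way to picture the incorporation of files into a DBMS is a catalog table that records each data file's location and metadata so the files can be found through declarative queries. The following is a minimal sketch under stated assumptions, not a description of any system named in this article: it uses Python's built-in sqlite3 module as a stand-in DBMS, and the table name file_catalog, the column names, the instrument labels, and the file names are all hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def register_file(conn, path, instrument):
    """Record a data file's location, size, and checksum in the catalog."""
    data = Path(path).read_bytes()
    conn.execute(
        "INSERT INTO file_catalog (path, instrument, size_bytes, sha256) "
        "VALUES (?, ?, ?, ?)",
        (str(path), instrument, len(data), hashlib.sha256(data).hexdigest()),
    )


def build_catalog(paths_by_instrument):
    """Create an in-memory catalog (hypothetical schema) and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_catalog ("
        "path TEXT PRIMARY KEY, instrument TEXT, "
        "size_bytes INTEGER, sha256 TEXT)"
    )
    for instrument, paths in paths_by_instrument.items():
        for path in paths:
            register_file(conn, path, instrument)
    conn.commit()
    return conn


def demo():
    """Write two small placeholder data files and query them by metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "run_001.dat"
        second = Path(tmp) / "run_002.dat"
        first.write_bytes(b"12345")
        second.write_bytes(b"1234567890")
        conn = build_catalog({"detector_a": [first], "detector_b": [second]})
        # The data stays in the files; only the metadata is queried here.
        rows = conn.execute(
            "SELECT instrument, size_bytes FROM file_catalog "
            "ORDER BY size_bytes"
        ).fetchall()
        conn.close()
        return rows
```

Calling demo() returns one (instrument, size_bytes) row per registered file, ordered by size. The sketch keeps the raw data in files and stores only metadata in the database; a fuller design would also have to handle files that change or move after registration.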