abytes of data and execution of complex tasks. For software designers, it
may sometimes be beneficial to model
the simulations and data processing
as event generators, using streaming
and complex event-processing techniques to summarize data operations
with little overhead or controllable accuracy guarantees.
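As a rough illustration of the idea above, the sketch below treats a simulation as an event generator and summarizes its output in a single pass with constant memory. It uses Welford's online algorithm for the running mean and variance, chosen here only as one familiar streaming summary. The names `RunningStats`, `simulation_events`, and `summarize` are hypothetical, and the Gaussian "simulation" is a stand-in for a real workload.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Constant-memory summary of a stream (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one event into the summary without storing the event."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; reported as 0.0 for fewer than two events."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def simulation_events(steps: int, seed: int = 0) -> Iterator[float]:
    """Hypothetical simulation emitting one measurement per time step."""
    rng = random.Random(seed)
    for _ in range(steps):
        yield rng.gauss(10.0, 2.0)


def summarize(events: Iterable[float]) -> RunningStats:
    """Consume the event stream once and return its summary."""
    stats = RunningStats()
    for value in events:
        stats.update(value)
    return stats
```

Calling `summarize(simulation_events(1_000_000))` never holds more than one event in memory. Summaries with explicit accuracy bounds, such as sketches or sampling, would follow the same single-pass pattern.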
Data and Process Integration
Large-scale experiments organized
by scientists collect and process huge
amounts of raw data. Even when the original data is reorganized and filtered
to keep only the parts of interest for
processing, those parts remain large. The reorganized
data is augmented with large volumes
of metadata, and the augmented, reorganized data must be stored and analyzed.
Scientists must collaborate with
computer engineers to develop custom solutions supporting data storage and analysis for each experiment.
In spite of the effort involved in such
collaborations, the experience and
knowledge gained this way are not
generally disseminated to the wider
scientific community and do not benefit next-generation experimental setups.
Computer engineers must therefore
develop generic solutions for storage
and analysis of scientific data that can
be extended and customized to reduce
the computing overhead of time-consuming collaborations. Developing
generic solutions is feasible, since
experimental data shares many low-level
commonalities in how it is represented and analyzed.
Management of generic physical
models. Experimental data tends to
have common low-level features not
only across experiments of the same
science, but across all sciences. For
example, reorganized raw data enhanced with metadata usually involves
complex structures that fit the object-oriented model. Scientific data representation benefits from inheritance
and encapsulation, two fundamental
innovations of the object-oriented
data model.
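To make the fit concrete, here is a minimal sketch of how raw data enhanced with metadata might be modeled with inheritance and encapsulation. The classes `Measurement` and `CalibratedMeasurement`, their fields, and the calibration offset are all hypothetical, invented for illustration rather than taken from any particular experiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Measurement:
    """Base class: reorganized raw values plus descriptive metadata."""
    values: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)

    def summary(self) -> float:
        """Mean of the stored values; callers need not know the layout."""
        return sum(self.values) / len(self.values)


@dataclass
class CalibratedMeasurement(Measurement):
    """Specialization that inherits storage and adds a calibration offset."""
    offset: float = 0.0

    def summary(self) -> float:
        # Encapsulation: the correction is applied behind the same interface.
        return super().summary() - self.offset


def report(measurements: List[Measurement]) -> List[float]:
    """Works on any subclass because it relies only on summary()."""
    return [m.summary() for m in measurements]
```

Code such as `report` can process data from different instruments uniformly, which is the kind of reuse across experiments the text describes.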
Beyond its complexity in terms of
representation, scientific data is characterized
by complex interdependencies,
leading to complex queries during
data processing and analysis. Even
though the object-oriented model is
suitable for the representation of scientific
data, it cannot efficiently optimize
and support complex queries.