be transformed, making them provenance-aware, so the data’s provenance may be retrieved, analyzed, and reasoned over.
The Oxford English Dictionary defines provenance as: “(i) the fact of coming from some particular source or quarter; origin, derivation; (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.” Hence, we can regard provenance as the derivation from a particular source to a specific state of an item. The description of such a derivation may take different forms or emphasize different properties according to a user’s personal interest. For instance, for a work of art, provenance usually identifies its chain of ownership; alternatively, the actual state of a painting may be understood better by studying the various restorations it has endured.
The dictionary definition also identifies two distinct ways to view provenance: as the source (or derivation) of an object, and as the record of that derivation. A computer-based representation of provenance is crucial for users who want to analyze and reason over electronic data and decide whether or not to trust it.
Here, we introduce the provenance life cycle, summarizing key principles underpinning existing provenance systems. We then examine an open data model for describing how applications are executed; in this context, provenance is seen as a user query over such descriptions.
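To make the idea of provenance as a user query over descriptions of execution concrete, here is a minimal Python sketch. It is not the open data model discussed in this article: the record format (`Invocation`), the store (`ProvenanceStore`), and its `record` and `provenance` methods are hypothetical names chosen for illustration. An application records which service consumed which inputs and produced which outputs; the provenance of an item is then computed by walking backward from that item through every recorded step that contributed to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Invocation:
    """Hypothetical record documenting one step of an execution."""
    service: str
    inputs: tuple
    outputs: tuple


@dataclass
class ProvenanceStore:
    """Hypothetical store that keeps documentation of past executions."""
    records: list = field(default_factory=list)

    def record(self, service, inputs, outputs):
        # Called by a provenance-aware application as each step runs.
        self.records.append(Invocation(service, tuple(inputs), tuple(outputs)))

    def provenance(self, item):
        """Return every recorded step that contributed, directly or transitively, to item."""
        # Index the documentation by the items each step produced.
        producers = {}
        for inv in self.records:
            for out in inv.outputs:
                producers.setdefault(out, []).append(inv)
        # Walk backward from the item through the producers of its inputs.
        seen, stack, result = set(), [item], []
        while stack:
            current = stack.pop()
            for inv in producers.get(current, []):
                if inv not in seen:
                    seen.add(inv)
                    result.append(inv)
                    stack.extend(inv.inputs)
        return result


# Usage: document three steps, then query the provenance of model.bin.
store = ProvenanceStore()
store.record("normalize", ["raw.csv"], ["clean.csv"])
store.record("fit-model", ["clean.csv", "params.json"], ["model.bin"])
store.record("plot", ["other.csv"], ["figure.png"])
print(sorted(inv.service for inv in store.provenance("model.bin")))
# prints ['fit-model', 'normalize']
```

The query returns the normalization and model-fitting steps but excludes the unrelated plotting step, and a raw source such as `raw.csv` has no recorded derivation. Keeping the documentation in a separate store, rather than embedding history inside each data file, is one possible design choice; it lets the same records answer many different user queries later.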
We illustrate the vision of provenance-aware applications through a concrete example in health-care management, contrasting it with existing systems.
The scientific and business communities [6] both embrace a service-oriented architecture (SOA) that allows the dynamic discovery and composition of services. SOA-based applications are increasingly dynamic and open but must satisfy new requirements in both e-science and business. In an ideal world, e-science end users would be able to reproduce their results by replaying previous computations, understand why two seemingly identical runs with the same inputs produce different results, and determine which data sets, algorithms, or services were involved in their derivation.
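One of these requirements, understanding why two runs with the same inputs produce different results, can be framed as a comparison of the documentation of the two runs. The sketch below assumes, hypothetically, that each run is documented as a list of named steps recording the service invoked, its version, and its inputs; `Step`, `explain_difference`, and the service names are illustrative, not an existing API.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """Hypothetical documentation of one step in a run."""
    name: str
    service: str
    version: str
    inputs: frozenset


def explain_difference(run_a, run_b):
    """List (step name, what changed) for every step documented differently in the two runs."""
    a = {step.name: step for step in run_a}
    b = {step.name: step for step in run_b}
    report = []
    for name in sorted(a.keys() | b.keys()):
        step_a, step_b = a.get(name), b.get(name)
        if step_a is None or step_b is None:
            report.append((name, "present in only one run"))
        elif step_a != step_b:
            # Name the documented fields that differ between the two steps.
            changed = [f for f in Step._fields if getattr(step_a, f) != getattr(step_b, f)]
            report.append((name, ", ".join(changed)))
    return report


# Usage: identical inputs, but the alignment service was upgraded between runs.
run1 = [Step("align", "aligner", "1.0", frozenset({"seq.fa"})),
        Step("tree", "tree-builder", "3.1", frozenset({"align.out"}))]
run2 = [Step("align", "aligner", "1.1", frozenset({"seq.fa"})),
        Step("tree", "tree-builder", "3.1", frozenset({"align.out"}))]
print(explain_difference(run1, run2))
# prints [('align', 'version')]
```

Such a comparison can only explain differences that were actually recorded, so details such as service versions have to be captured at execution time for this kind of query to succeed.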
In e-science and business, some users, reviewers, auditors, and even regulators must verify that the process that led to a result complies with specific regulations or methodologies. They must also prove that the results were derived independently of services or databases with given license restrictions, and establish that the data was captured at the source by instruments with specific technical characteristics.
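Auditing tasks like these can likewise be phrased as queries over documented derivations. The following minimal Python sketch checks, for a single result, that no step in its derivation used a license-restricted service and that every raw source was captured by an approved instrument. The input dictionaries, the `audit` function, and all service and instrument names are hypothetical assumptions made for illustration.

```python
def audit(result, derivations, restricted_services, source_instruments, approved_instruments):
    """Return a list of violations found in the derivation of result (empty if compliant).

    derivations: maps each derived item to (service, inputs); hypothetical format.
    source_instruments: maps each raw source item to the instrument that captured it.
    """
    violations, seen, stack = [], set(), [result]
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        if item in derivations:
            # Derived item: check the service that produced it, then audit its inputs.
            service, inputs = derivations[item]
            if service in restricted_services:
                violations.append(f"{item}: derived using restricted service {service}")
            stack.extend(inputs)
        else:
            # Raw source: check which instrument captured it.
            instrument = source_instruments.get(item)
            if instrument not in approved_instruments:
                violations.append(f"{item}: captured by unapproved instrument {instrument}")
    return violations


# Usage: a report derived from two scans, one of them via a restricted service.
derivations = {
    "report": ("statistics", ["scan-1", "annotated"]),
    "annotated": ("licensed-lookup", ["scan-2"]),
}
source_instruments = {"scan-1": "scanner-A", "scan-2": "scanner-B"}
for problem in audit("report", derivations, {"licensed-lookup"},
                     source_instruments, {"scanner-A"}):
    print(problem)
```

In this example the report fails both checks: an intermediate result was produced by the restricted `licensed-lookup` service, and one scan came from an instrument outside the approved set. The answer is only as reliable as the recorded documentation of how each item was derived and captured.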
Although some users must perform such tasks today, they cannot do so, or can do so only imperfectly, because the underpinning principles have not been investigated and systems have not been designed to support such requirements. A key observation is that electronic data does not typically contain the historical information that would help end users, reviewers, or regulators make the necessary verifications. Hence,
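[Figure: provenance life cycle. Labels: Provenance-Aware Application; Provenance Store; Electronic data; References; Archives; Record documentation of execution; Query and reason over provenance of data; Administer store and its contents.]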