was obtained by applying function f1 to input M1.)
With the two kinds of p-assertions—interaction and relationship—process documentation as a whole is greater than the sum of its individual parts. Indeed, while p-assertions are simple pieces of documentation produced by services autonomously, interaction and relationship p-assertions together capture an explicit description of the flow of data in a process. Interaction p-assertions denote data flows between services, whereas relationship p-assertions denote data flows within services. These flows capture the causal and functional data dependencies in execution and, in the most general case, constitute a directed acyclic graph (DAG) (see Figure 3). For a specific data item, the data-flow DAG indicates how it is produced and used and is thus a core element of provenance representation, though not the only one.
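The way the two kinds of p-assertions combine into a data-flow graph can be illustrated with a minimal Python sketch. It is not the model's actual schema: the class names, the `(service, data)` node encoding, and the services `client`, `s1`, and `s2` are assumptions made for illustration, as is the item `M2`. Only `f1` and `M1` are borrowed from the example above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionPAssertion:
    """A message carrying `data` from `sender` to `receiver` (flow between services)."""
    sender: str
    receiver: str
    data: str


@dataclass(frozen=True)
class RelationshipPAssertion:
    """`service` obtained `output` by applying `function` to `inputs` (flow within a service)."""
    service: str
    function: str
    inputs: tuple
    output: str


def build_data_flow(interactions, relationships):
    """Map each (service, data) node to the set of nodes it directly depends on."""
    causes = defaultdict(set)
    for i in interactions:
        causes[(i.receiver, i.data)].add((i.sender, i.data))
    for r in relationships:
        for item in r.inputs:
            causes[(r.service, r.output)].add((r.service, item))
    return causes


def ancestors(causes, node):
    """Return every node reachable by following dependencies backward from `node`."""
    seen, stack = set(), [node]
    while stack:
        for cause in causes.get(stack.pop(), ()):
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


# Hypothetical run: s1 receives M1, applies f1 to obtain M2, and sends M2 to s2.
interactions = [
    InteractionPAssertion("client", "s1", "M1"),
    InteractionPAssertion("s1", "s2", "M2"),
]
relationships = [RelationshipPAssertion("s1", "f1", ("M1",), "M2")]
graph = build_data_flow(interactions, relationships)
```

Calling `ancestors(graph, ("s2", "M2"))` walks back across both kinds of edges, which is why neither kind of p-assertion alone would be enough to recover how `M2` came to exist at `s2`.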
Beyond the flow of data in a process, internal service states may be needed to understand nonfunctional characteristics of execution (such as the performance or accuracy of services) and therefore the nature of the results they compute. Hence, a service-state p-assertion is documentation provided by a service about its internal state in the context of a specific interaction. Service-state p-assertions are varied; they may include the amount of disk and CPU time used by a service in a computation, the local time when an action occurred, the floating-point precision of the results it produced, or application-specific state descriptions.
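As a rough illustration, a service-state p-assertion can be thought of as a record tied to one interaction. The Python sketch below assumes a hypothetical record layout; the field names, the interaction identifiers, and the state values are all invented for the example and are not part of the documented model.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceStatePAssertion:
    """State reported by `asserter` in the context of one interaction (hypothetical layout)."""
    asserter: str
    interaction_id: str
    state: dict = field(default_factory=dict)


def states_for(assertions, interaction_id):
    """Collect the state each service reported for a single interaction."""
    return {
        a.asserter: a.state
        for a in assertions
        if a.interaction_id == interaction_id
    }


# Illustrative values only: CPU time, local time, and floating-point precision.
assertions = [
    ServiceStatePAssertion("s1", "i1", {"cpu_seconds": 2.5, "float_precision_bits": 64}),
    ServiceStatePAssertion("s2", "i1", {"local_time": "10:15:00"}),
    ServiceStatePAssertion("s1", "i2", {"cpu_seconds": 0.4}),
]
```

Keying each record by interaction keeps the state description attached to the specific exchange it qualifies, so it is not treated as a global property of the service.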
In order for provenance-aware applications to be interoperable, it is critical that the process documentation they respectively produce be structured according to a shared data model. Therefore, the novelty of our approach is the openness of the proposed model of documentation [7] conceived as independent of application technologies [8]. These characteristics together allow process documentation to be produced autonomously by application services and expressed in an open format over which provenance queries may be expressed.
QUERYING THE PROVENANCE OF ELECTRONIC DATA
Provenance queries are user-tailored queries over process documentation aimed at obtaining the provenance of electronic data. In this context, the data item of interest to the user must first be characterized. Since data is mutable, its provenance, or history, can vary according to the point in execution from which a user wishes to find it. A provenance query must therefore be able to identify a data item with respect to a given documented event (such as sending or receiving a message).
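The need to anchor a query to a documented event can be sketched in Python as follows. This is an illustrative simplification, not the query interface of any actual provenance store: the `DocumentedEvent` record, its `caused_by` field, and the event and data names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentedEvent:
    """A documented send or receive of a data item (hypothetical record)."""
    event_id: str
    kind: str              # "send" or "receive"
    data_item: str
    caused_by: tuple = ()  # ids of earlier events this one depends on


def provenance(events, data_item, event_id):
    """Return ids of all events that led to `data_item` as seen at `event_id`."""
    by_id = {e.event_id: e for e in events}
    start = by_id[event_id]
    if start.data_item != data_item:
        raise ValueError(f"{data_item!r} is not documented at event {event_id!r}")
    seen, stack = set(), list(start.caused_by)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(by_id[current].caused_by)
    return seen


# The same item, "report", is sent twice; a correction arrives in between.
events = [
    DocumentedEvent("e1", "receive", "draft"),
    DocumentedEvent("e2", "send", "report", caused_by=("e1",)),
    DocumentedEvent("e3", "receive", "correction"),
    DocumentedEvent("e4", "send", "report", caused_by=("e2", "e3")),
]
```

Here `provenance(events, "report", "e2")` and `provenance(events, "report", "e4")` return different histories for the same named item, which is why the query takes the documented event as part of its input.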
The full detail of everything that ultimately caused a data item to be what it is could be quite large; for example, the full provenance of an experiment’s results potentially includes a description of the process that produced the materials used in the experiment, along with the provenance of any materials used in producing these materials and the devices and software (and their settings) used in the experiment. Should documentation be available, the full provenance would ultimately include details of processes leading back to the beginning of time or at least to the epoch of provenance awareness.
Users must be able to express the scope of their