interest in a process through a provenance query, essentially performing a reverse graph traversal over the data flow DAG and terminating according to the query-specified scope; the query output is a DAG subset. Scoping can be based on types of relationships, intermediary results, services, or subprocesses [7].
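The traversal described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the architecture's query interface: the graph is assumed to be a plain dictionary whose edges already point from an effect to its causes, and the query scope is modeled as a predicate `in_scope` over each edge. The function name `provenance_query` and this representation are hypothetical.

```python
from collections import deque


def provenance_query(edges, start, in_scope):
    """Collect the part of a provenance DAG reachable backwards from `start`.

    Hypothetical sketch. edges: dict mapping each data item to the
    (relation, cause) pairs it depends on, i.e. edges point effect -> cause.
    in_scope: predicate (relation, cause) -> bool standing in for the
    query-specified scope; edges it rejects are not followed.
    Returns the retained (effect, relation, cause) triples in BFS order.
    """
    visited = {start}
    subgraph = []
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for relation, cause in edges.get(item, []):
            if not in_scope(relation, cause):
                continue  # outside the scope: prune this branch
            subgraph.append((item, relation, cause))
            if cause not in visited:  # shared ancestors are expanded once
                visited.add(cause)
                queue.append(cause)
    return subgraph


edges = {
    "result": [("derived from", "intermediate")],
    "intermediate": [("derived from", "raw"), ("produced by", "service")],
}
print(provenance_query(edges, "result", lambda rel, cause: rel == "derived from"))
# [('result', 'derived from', 'intermediate'), ('intermediate', 'derived from', 'raw')]
```

Passing a predicate keeps the walk itself generic: scoping by relationship type, or stopping at particular intermediary results, only changes the predicate.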
IN HEALTH CARE MANAGEMENT
To illustrate our approach, we explore a health care management application. The Organ Transplant Management (OTM) system under development by the Catalan Transplant Organization, Catalonia, Spain, manages all the activities pertaining to organ transplants across multiple Catalan hospitals and their regulatory authority, the government of Catalonia, Spain [1]. OTM consists of a complex process involving the surgery itself, along with such activities as data collection and patient organ analysis that must comply with a set of regulatory rules. OTM is supported by an IT infrastructure that maintains records allowing medical personnel to view (and edit) a given patient’s local file within a given institution or laboratory. However, the system does not yet connect records, capture the dependencies among them, or allow external auditors or patients’ families to analyze and understand how decisions are made.
Making OTM provenance-aware enables powerful queries that would be impossible without provenance-awareness functionality, such as “find all doctors involved in a decision,” “find all blood-test results involved in a donation decision,” and “find all data that led to a decision.” Such functionality can be made available not only to the medical profession but also to regulators and families.
Here, we limit ourselves to a simplified subset of the OTM workflow: the process leading to the decision of whether or not to donate an organ. As a hospitalized patient’s health declines, and in anticipation of a potential organ donation, an attending doctor requests the patient’s full health record and sends a blood sample for analysis. Through a context-sensitive, menu-driven user interface (UI), the attending doctor submits these requests, which are then passed to a software component (the donor data collector) responsible for collecting all expected results. If brain death is observed and logged into the system, and if all requested data and analysis results have been obtained, the system asks the doctor to decide whether to donate the organ. The decision, which is the outcome of the doctor’s medical judgment based on the collected data, is explained in a report that the doctor submits as the decision’s justification.
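The gating condition in the scenario above is simple enough to state as code. The following Python sketch is hypothetical: the class `DonorDataCollector`, its methods, and the request names are illustrative and do not describe OTM's actual implementation. It checks only the two conditions the text names before the doctor is asked to decide.

```python
from dataclasses import dataclass, field


@dataclass
class DonorDataCollector:
    """Hypothetical model of the donor data collector's gating rule."""
    requested: set = field(default_factory=set)   # names of submitted requests
    received: dict = field(default_factory=dict)  # request name -> result
    brain_death_logged: bool = False

    def request(self, name):
        self.requested.add(name)

    def receive(self, name, result):
        if name not in self.requested:
            raise ValueError(f"unexpected result: {name}")
        self.received[name] = result

    def log_brain_death(self):
        self.brain_death_logged = True

    def ready_for_decision(self):
        # Ask the doctor to decide only when brain death has been logged
        # and every requested record or analysis result has arrived.
        return self.brain_death_logged and self.requested.issubset(self.received)


collector = DonorDataCollector()
collector.request("health_record")
collector.request("blood_analysis")
collector.receive("health_record", {"patient": "P1"})
collector.log_brain_death()
print(collector.ready_for_decision())  # False: blood analysis still missing
collector.receive("blood_analysis", {"status": "complete"})
print(collector.ready_for_decision())  # True
```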
Figure 3 (top) outlines the components involved in this scenario and their interactions. The UI sends requests (I1, I2, I3) to the donor data collector service, which gets data from the patient records database (I4, I5), along with analysis results from the laboratory (I6, I7), and finally requests a decision (I8, I9).
To make OTM provenance-aware, designers are augmenting OTM with the ability to produce an explicit representation of the process taking place, including p-assertions for all interactions (I1–I9), relationship p-assertions capturing dependencies between data items, and state p-assertions. Figure 3 (bottom) outlines the DAG representing a donation decision’s provenance, which consists of relationship p-assertions produced by provenance-aware OTM. DAG nodes denote data items, whereas DAG edges (in blue) represent relationships (such as data dependencies, like “is based on” and “is justified by,” and causal relationships, like “in response to” and “is caused by”). Each data item is annotated with the interaction in which it occurs. Further, for each of its interactions, the UI asserts a service-state p-assertion recording the users logged into the system.
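To make the three kinds of p-assertion concrete, here is a hypothetical Python data model. The class and field names, the `item@interaction` naming convention for annotated data items, and the sample edges are assumptions for illustration; they are not the open specification's schema and not a transcription of Figure 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionPAssertion:
    """Hypothetical: documents one message, e.g. interaction "I4"."""
    interaction: str
    sender: str
    receiver: str
    content: str


@dataclass(frozen=True)
class RelationshipPAssertion:
    """Hypothetical: one DAG edge, `effect` relates to `cause` via `relation`.

    Data items are written "item@interaction" so that each node carries
    the interaction in which it occurs.
    """
    effect: str
    relation: str  # e.g. "is based on", "is justified by", "in response to"
    cause: str


@dataclass(frozen=True)
class ServiceStatePAssertion:
    """Hypothetical: state a service asserts about itself in an interaction."""
    interaction: str
    asserter: str
    state: str  # e.g. which users are logged in


def to_dag(relationships):
    """Group relationship p-assertions into an effect -> [(relation, cause)] map."""
    dag = {}
    for r in relationships:
        dag.setdefault(r.effect, []).append((r.relation, r.cause))
    return dag


# Illustrative edges only.
assertions = [
    RelationshipPAssertion("decision@I9", "is justified by", "report@I9"),
    RelationshipPAssertion("decision@I9", "is based on", "blood_analysis@I7"),
    RelationshipPAssertion("decision@I9", "is based on", "health_record@I5"),
]
print(to_dag(assertions))
```

Freezing the dataclasses reflects that an assertion, once recorded, is not meant to change.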
Authorized users can then issue provenance queries that navigate the provenance graph, pruning it according to the querier’s needs; for example, from the graph, we can derive that users X and Y both caused the donation decision to be reached. Figure 3 includes only a limited number of components, but in real-life settings involving vast amounts of documentation, users (doctors, patients, or regulatory authorities) benefit from a powerful and accurate provenance-query facility.
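A query such as the one just described can be sketched as a backward walk that collects the users recorded by the UI's service-state p-assertions. In this hypothetical Python sketch, the function `users_behind`, the triple format, and the sample graph (including the mapping of users X and Y to interactions) are illustrative assumptions; a real query would run against the stored p-assertions rather than an in-memory list.

```python
from collections import deque


def users_behind(decision, relationships, logged_in):
    """Find users whose UI interactions lie upstream of `decision`.

    Hypothetical sketch. relationships: (effect, relation, cause) triples,
    with data items named "item@interaction". logged_in: interaction id ->
    user, as recorded by the UI's service-state p-assertions.
    """
    upstream = {}
    for effect, _relation, cause in relationships:
        upstream.setdefault(effect, []).append(cause)

    users, visited, queue = set(), {decision}, deque([decision])
    while queue:
        item = queue.popleft()
        interaction = item.rsplit("@", 1)[1]
        if interaction in logged_in:
            users.add(logged_in[interaction])
        for cause in upstream.get(item, []):
            if cause not in visited:
                visited.add(cause)
                queue.append(cause)
    return users


# Hypothetical graph fragment; item names and users are illustrative.
relationships = [
    ("decision@I9", "is justified by", "report@I9"),
    ("decision@I9", "in response to", "decision_request@I8"),
    ("decision_request@I8", "is caused by", "record_request@I1"),
    ("decision_request@I8", "is caused by", "blood_test_request@I2"),
]
print(sorted(users_behind("decision@I9", relationships, {"I1": "X", "I2": "Y"})))
# ['X', 'Y']
```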
EXISTING SYSTEMS
The approach we’ve explored here is derived from an extensive requirements analysis [8] that resulted in a complete architectural specification [7], which was used as the basis for writing an open specification of data models and interfaces. The open approach allows the documentation of complex distributed applications, possibly involving multiple technologies (such as Web services, command-line executables, and monolithic executables). It also allows the expression of complex provenance queries that identify data and scope processes independently of the technologies being used.
The Virtual Data System [4] and myGrid [10] are execution environments for scientific workflows that provide support for provenance. They focus on producing documentation from a workflow enactor’s viewpoint, using data models compatible with p-assertions. They assume their respective workflow lan-
References: