guage, allowing them to obtain compact process documentation. By adopting an open data model for process documentation, like the one we’ve advocated here, such systems could be integrated into heterogeneous applications that seamlessly execute provenance queries.
The database community has also investigated provenance [2, 5] but adopted different assumptions; for instance, it assumes the existence of a query language for which queries may be reversed to identify the origin of results. As in our approach, different kinds of provenance (such as why and where [2]) are viewed as being of value as specific instances of provenance queries.
The Provenance Aware Storage System developed at Harvard University [9] is designed to automatically produce documentation of execution by capturing file system events in an operating system. As with all other approaches, capturing fine-grained documentation raises scalability and performance challenges, so deriving information at a suitable level of abstraction for the user is often difficult.
CONCLUSION
The IT landscape, which once exclusively involved closed monolithic applications, today involves applications that are open and composed dynamically while being able to discover results and services on the fly. Users must know whether they can have confidence in their applications’ electronic data; that data must therefore be accompanied by provenance describing the process that led to its production.
To achieve this vision, we’ve proposed an open approach through which applications, irrespective of technology, document their execution in an open data model that can then be used to run provenance queries tailored to user needs. In the same way scholars can appreciate works of art by studying their documented history, users would be able to gain confidence in electronic data thanks to provenance queries.
REFERENCES
1. Alvarez, S., Vazquez-Salceda, J., Kifor, J., Varga, L., and Willmott, S. Applying provenance in distributed organ transplant management. In Proceedings of the International Provenance and Annotation Workshop Vol. 4145 of Lecture Notes in Computer Science (Chicago, May 3–5). Springer, Heidelberg, 2006, 28–36.
2. Buneman, P., Khanna, S., and Tan, W.-C. Why and where: A characterization of data provenance. In Proceedings of Eighth International Conference on Database Theory Vol. 1973 of Lecture Notes in Computer Science (London, Jan. 4–6). Springer, Heidelberg, 2001, 316–330.
3. Burbeck, S. The Tao of E-business Services. Technical Report. IBM Software Group, Oct. 2000; www.ibm.com/developerworks/webservices/library/ws-tao/.
4. Clifford, B., Foster, I., Voeckler, J.-S., Wilde, M., and Zhao, Y. Tracking provenance in a virtual data grid. Concurrency and Computation: Practice and Experience (2007); dx.doi.org/10.1002/cpe.1256.
5. Cui, Y., Widom, J., and Wiener, J. Tracing the lineage of view data in a warehousing environment. ACM Transactions on Database Systems 25, 2 (June 2000), 179–227.
6. Foster, I., Kesselman, C., Nick, J., and Tuecke, S. Grid computing: Making the global Infrastructure a reality. In The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Wiley Series in Communications Networking and Distributed Systems. John Wiley & Sons, Chichester, England, 2003, 217–249.
7. Groth, P., Jiang, S., Miles, S., Munroe, S., Tan, V., Tsasakou, S., and Moreau, L. D3.1.1: An Architecture for Provenance Systems. Technical Report. University of Southampton, Southampton, U.K., Feb. 2006; eprints.ecs.soton.ac.uk/12023/.
8. Miles, S., Groth, P., Branco, M., and Moreau, L. The requirements of recording and using provenance in e-science experiments. Journal of Grid Computing 5, 1 (Mar. 2007), 1–25.
9. Seltzer, M., Holland, D., Braun, U., and Muniswamy-Reddy, K.-K. Passing the provenance challenge. Concurrency and Computation: Practice and Experience (2007); dx.doi.org/10.1002/cpe.1233.
10. Zhao, J., Goble, C., Stevens, R., and Turi, D. Mining Taverna’s semantic web of provenance. Concurrency and Computation: Practice and Experience (2007); dx.doi.org/10.1002/cpe.1231.
LUC MOREAU (L.Moreau@ecs.soton.ac.uk) is a professor of computer science in the School of Electronics and Computer Science at the University of Southampton, Southampton, U.K. PAUL GROTH (pgroth@isi.edu) is a post-doctoral researcher in the Information Science Institute at the University of Southern California, Marina del Rey, CA. SIMON MILES (simon.miles@kcl.ac.uk) is a lecturer in the Department of Computer Science at King’s College London, London, U.K. JAVIER VAZQUEZ-SALCEDA (jvazquez@lsi.upc.edu) is a post-doctoral researcher in the Computer Science Department at the Universitat Politecnica de Catalunya, Barcelona, Spain. JOHN IBBOTSON (john_ibbotson@uk.ibm.com) is a senior software engineer at IBM U.K.’s Hursley Development Laboratory, Winchester, U.K. SHENG JIANG (sj@ecs.soton.ac.uk) is a post-doctoral researcher in the School of Electronics and Computer Science at the University of Southampton, Southampton, U.K. STEVE MUNROE (sj.munroe@uk.ibm.com) is an IT consultant/technical team lead at IBM United Kingdom, Ltd., Global Business Services, Winchester, U.K. OMER RANA (o.f.rana@cs.cardiff.ac.uk) is a reader in computer science at Cardiff University and the Deputy Director of the Welsh eScience Center, Cardiff, Wales, U.K. ANDREAS SCHREIBER (Andreas.Schreiber@dlr.de) is a research scientist and head of the Distributed Systems and Component Software Department at the German Aerospace Center, Cologne, Germany. VICTOR TAN (vhkt@ecs.soton.ac.uk) is a post-doctoral researcher in the School of Electronics and Computer Science at the University of Southampton, Southampton, U.K. LASZLO ZSOLT VARGA (laszlo.varga@sztaki.hu) is a senior scientific associate and head of the System Development Department at the Hungarian Academy of Sciences, Budapest, Hungary.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
© 2008 ACM 0001-0782/08/0400 $5.00