Overview
Provenance (also called lineage or trace) of digital scientific data is critical to broader sharing and reuse of that data. Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular data set. Provenance collection is often tightly coupled to a cyberinfrastructure system, but it is better served by a standalone tool. Karma is such a tool: it can be added to existing cyberinfrastructure to collect and represent provenance data. Its modular architecture supports multiple instrumentation plugins, which makes it usable in different architectural settings.
Karma originated in the Linked Environments for Atmospheric Discovery project and has since been generalized as a standalone tool. The tool is being applied to the Life Science Grid, an open-source biochemical discovery cyberinfrastructure promoted by Eli Lilly Corp., and in other settings. The current release is Karma 2.1.
Download
The current version of Karma (v2.1) supports provenance activities published from services, workflows, and nested workflows. Provenance data is stored efficiently in a relational database, and the tool supports the emerging Open Provenance Model (OPM) standard as its interface. Karma supports both data-oriented and process-oriented views of the provenance. Karma v2.1 accepts provenance activities either synchronously, through a web-services API, or in a scalable asynchronous mode using WS-Eventing notifications. Provenance clients can use the Notifier library to generate provenance activities. Synchronous recording is suggested when recording provenance on the scale of hundreds of workflows, or when setting up a notification broker is to be avoided. WS-Messenger is the suggested WS-Eventing implementation. The Karma service requires MySQL v5.0 or later with a database assigned to it (preferably named 'karma2').
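The difference between the data-oriented and process-oriented views can be illustrated with a minimal in-memory sketch. It borrows only the two OPM edge types `used` and `wasGeneratedBy`; the class, method, and artifact names are hypothetical and are not Karma's API or storage schema.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance store (hypothetical; not Karma's API or schema)."""

    def __init__(self):
        self.used = defaultdict(set)  # process -> artifacts it consumed (OPM "used")
        self.generated_by = {}        # artifact -> producing process (OPM "wasGeneratedBy")

    def record_used(self, process, artifact):
        self.used[process].add(artifact)

    def record_generated(self, artifact, process):
        self.generated_by[artifact] = process

    def process_view(self, process):
        """Process-oriented view: what one process consumed and produced."""
        outputs = sorted(a for a, p in self.generated_by.items() if p == process)
        inputs = sorted(self.used.get(process, ()))
        return {"process": process, "inputs": inputs, "outputs": outputs}

    def data_view(self, artifact):
        """Data-oriented view: which process made an artifact and which used it."""
        consumers = sorted(p for p, arts in self.used.items() if artifact in arts)
        return {
            "artifact": artifact,
            "generated_by": self.generated_by.get(artifact),
            "used_by": consumers,
        }


# Illustrative two-step workflow with made-up process and file names.
graph = ProvenanceGraph()
graph.record_used("regrid", "raw.nc")
graph.record_generated("grid.nc", "regrid")
graph.record_used("forecast", "grid.nc")
graph.record_generated("forecast.nc", "forecast")

print(graph.process_view("regrid"))
print(graph.data_view("grid.nc"))
```

Both views read the same recorded edges; they differ only in whether the starting point of the question is a process or a data product.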
- Karma v2.1.0: the Karma distribution files required to run the provenance service and GUI [source] [binary]
- Notifier v1.0: the Notifier (i.e., Provenance Tracking) library required to publish provenance activities [source] [binary]
- WS-Messenger: a WS-Eventing-based notification broker used to publish provenance activities asynchronously [More]
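The trade-off between the synchronous and asynchronous submission modes described in this section can be sketched as follows. This is only an illustration under stated assumptions: a thread-backed queue stands in for the notification broker, `InMemoryStore` stands in for the provenance service, and none of the names correspond to the Notifier library or to WS-Eventing itself.

```python
import queue
import threading


class InMemoryStore:
    """Stand-in for the provenance service (hypothetical)."""

    def __init__(self):
        self.activities = []
        self._lock = threading.Lock()

    def record(self, activity):
        with self._lock:
            self.activities.append(activity)


class SyncPublisher:
    """The caller blocks until the activity has been recorded."""

    def __init__(self, store):
        self.store = store

    def publish(self, activity):
        self.store.record(activity)


class AsyncPublisher:
    """The caller only enqueues; a background worker delivers to the store."""

    def __init__(self, store):
        self.store = store
        self._queue = queue.Queue()  # plays the role of the broker
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def publish(self, activity):
        self._queue.put(activity)  # returns without waiting for delivery

    def _drain(self):
        while True:
            activity = self._queue.get()
            try:
                self.store.record(activity)
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued activity has been delivered."""
        self._queue.join()


store = InMemoryStore()
publisher = AsyncPublisher(store)
for step in range(3):
    publisher.publish({"workflow": "wf-1", "step": step})
publisher.flush()
print(len(store.activities))
```

The synchronous publisher needs no extra moving parts, which matches the suggestion to use it when a broker is to be avoided; the asynchronous one decouples the instrumented application from the speed of the store at the cost of running that extra component.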
Documentation
- Download offline Javadocs for Notifier v1.0
- XML Schema description of Provenance activities
- WSDL description & XML Schema Types of Karma web-service
Upcoming release: Karma v3.0 (Expected release: January 30, 2010)
Karma v3.0 will support instrumentation through Axis2 handlers in addition to Java applications. It will also include asynchronous communication using WS-Messenger. Future releases will support additional pub/sub options, more instrumentation approaches, and query capabilities, including recursive queries over space (through different levels of the workflow) and time (forward and backward in the dataflow).
| Future (late spring) | Karma v3.0 (early spring) | Karma v2.1 |
| --- | --- | --- |
| Karma server (query and publish web service) | Karma server (query and publish web service) | Karma server (query and publish web service) |
| v3.0 plus: asynchronous communication with other pub/sub systems; additional instrumentation | v2.1 plus: instrumentation using Axis2 handlers; WS-Messenger asynchronous calls | Java Notifier library (synchronous calls to Karma service) |
| v3.0 plus: preservation client, visualization, and richer access clients | v2.1 plus: simple access client | OPM RDF and XML results |
| Dependencies: v2.1 plus others as needed | Dependencies: v2.1 plus WS-Messenger (latter optional) | Dependencies: MySQL |
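The recursive queries planned for future releases, over time (forward and backward in the dataflow) and over space (through nested workflow levels), amount to transitive traversals of a graph. The sketch below shows one way such traversals could work; the edge lists, workflow names, and function names are hypothetical examples, not Karma's query interface.

```python
from collections import deque

# Hypothetical dataflow edges: (source artifact, derived artifact).
DERIVED = [
    ("raw.nc", "grid.nc"),
    ("grid.nc", "forecast.nc"),
    ("forecast.nc", "plot.png"),
]

# Hypothetical nesting: workflow -> directly contained sub-workflows.
NESTED = {
    "wf": ["wf.prep", "wf.model"],
    "wf.model": ["wf.model.step1"],
}


def reachable(start, edges):
    """Return every node reachable from start, in breadth-first order."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    found, visited, pending = [], {start}, deque([start])
    while pending:
        node = pending.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                found.append(nxt)
                pending.append(nxt)
    return found


def forward(artifact):
    """Time, forward: everything derived from the artifact."""
    return reachable(artifact, DERIVED)


def backward(artifact):
    """Time, backward: everything the artifact was derived from."""
    return reachable(artifact, [(dst, src) for src, dst in DERIVED])


def descend(workflow):
    """Space: every sub-workflow nested below the given workflow."""
    edges = [(parent, child) for parent, kids in NESTED.items() for child in kids]
    return reachable(workflow, edges)


print(forward("grid.nc"))
print(backward("forecast.nc"))
print(descend("wf"))
```

Backward traversal reuses the forward routine on reversed edges, so a single traversal function covers both directions in time as well as descent through nesting levels.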
Publications
For a list of publications related to Karma please click here