Karma

Karma Tool for Provenance Collection and Storage

Overview

Provenance (also called lineage or trace) of digital scientific data is critical to broader sharing and reuse of that data. Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular data set. Provenance collection is often tightly coupled to a cyberinfrastructure system, but is better served by a standalone tool. Karma is such a tool: it can be added to existing cyberinfrastructure to collect and represent provenance data. Its modular architecture supports multiple instrumentation plugins, which makes it usable in different architectural settings.

Karma originates in the Linked Environments for Atmospheric Discovery project and has since been generalized as a standalone tool. The tool is being applied to the Life Science Grid, an open source biochem discovery cyberinfrastructure promoted by Eli Lilly Corp., and in other settings. The current release is Karma 2.1.

Download

The current version of Karma (v2.1) supports provenance activities published from services, workflows and nested workflows. Provenance data is stored efficiently in a relational database, and the tool supports the emerging Open Provenance Model (OPM) standard for interfacing with it. Karma supports both data-oriented and process-oriented views of the provenance.

Karma v2.1 accepts provenance activities in one of two modes: synchronous submission through a web-services API, or a scalable asynchronous mode using WS-Eventing notifications. Provenance clients can use the Notifier library to generate provenance activities. Synchronous recording is suggested when recording provenance on the scale of hundreds of workflows, or when setting up a notification broker is to be avoided. For the asynchronous mode, the WS-Messenger notification broker is the suggested WS-Eventing implementation.

The Karma service requires MySQL v5.0 or later, with a database assigned to it (preferably named 'karma2').
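The difference between the two submission modes can be illustrated with a minimal sketch. It does not use the Karma or Notifier API: `InMemoryStore`, `submit_sync` and `AsyncPublisher` are hypothetical names, the store stands in for the provenance database, and an in-process queue stands in for a notification broker such as WS-Messenger.

```python
import queue
import threading


class InMemoryStore:
    """Hypothetical stand-in for the provenance service's database."""

    def __init__(self):
        self._lock = threading.Lock()
        self.activities = []

    def record(self, activity):
        with self._lock:
            self.activities.append(activity)


def submit_sync(store, activity):
    """Synchronous mode: the caller blocks until the activity is stored."""
    store.record(activity)


class AsyncPublisher:
    """Asynchronous mode: a queue plays the role of the notification broker."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, store):
        self._store = store
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, activity):
        # Returns immediately; delivery happens later on the worker thread.
        self._queue.put(activity)

    def _drain(self):
        while True:
            activity = self._queue.get()
            if activity is self._STOP:
                break
            self._store.record(activity)

    def close(self):
        # Everything published before close() is delivered before it returns.
        self._queue.put(self._STOP)
        self._worker.join()
```

In the synchronous case the client pays the storage latency on every call but needs no extra infrastructure. In the asynchronous case the client only enqueues, which is why that mode scales better but needs a broker to be set up and running.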

Karma v2.1.0: the Karma distribution files required to run the provenance service and GUI [source] [binary]

Notifier v1.0: the Notifier (i.e., Provenance Tracking) library required to publish provenance activities [source] [binary]

WS-Messenger: a WS-Eventing-based notification broker used to publish provenance activities asynchronously.

Documentation

Upcoming release: Karma v3.0 (expected January 30, 2010)

Karma v3.0 will support instrumentation through Axis 2 handlers in addition to Java applications. It will also include asynchronous communication using WS-Messenger. Future releases will support additional pub/sub options, more instrumentation approaches, and query capabilities, including the ability to query provenance recursively over space (across different levels of the workflow) and time (forward and backward in the dataflow).
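To make the "forward and backward in the dataflow" query concrete, here is a minimal sketch of a transitive lineage query over a small provenance graph. It is not Karma's query interface: `ProvenanceGraph` and its methods are hypothetical, and the graph only loosely mirrors the OPM notions of a process that used input artifacts and generated output artifacts. The spatial dimension (nested workflow levels) is left out.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical in-memory graph of processes and the artifacts they touch."""

    def __init__(self):
        self._inputs = defaultdict(set)     # process  -> artifacts it used
        self._outputs = defaultdict(set)    # process  -> artifacts it generated
        self._producer = {}                 # artifact -> process that generated it
        self._consumers = defaultdict(set)  # artifact -> processes that used it

    def add_process(self, process, inputs, outputs):
        for artifact in inputs:
            self._inputs[process].add(artifact)
            self._consumers[artifact].add(process)
        for artifact in outputs:
            self._outputs[process].add(artifact)
            self._producer[artifact] = process

    def ancestors(self, artifact):
        """Backward in the dataflow: all artifacts this one was derived from."""
        seen, stack = set(), [artifact]
        while stack:
            producer = self._producer.get(stack.pop())
            for parent in self._inputs.get(producer, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def descendants(self, artifact):
        """Forward in the dataflow: all artifacts derived from this one."""
        seen, stack = set(), [artifact]
        while stack:
            for consumer in self._consumers.get(stack.pop(), ()):
                for child in self._outputs.get(consumer, ()):
                    if child not in seen:
                        seen.add(child)
                        stack.append(child)
        return seen


# Illustrative two-step workflow: raw -> (filter) -> clean -> (model) -> forecast
graph = ProvenanceGraph()
graph.add_process("filter", inputs=["raw"], outputs=["clean"])
graph.add_process("model", inputs=["clean", "params"], outputs=["forecast"])

print(sorted(graph.ancestors("forecast")))  # ['clean', 'params', 'raw']
print(sorted(graph.descendants("raw")))     # ['clean', 'forecast']
```

The traversal uses an explicit stack and a `seen` set, so it terminates even if the recorded graph contains a cycle. A database-backed implementation would more likely push this recursion into the query layer.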

Future (late Spring):
    Karma server (query and publish web service)
    V3.0 plus: asynchronous communication with other pub/sub systems; additional instrumentation
    V3.0 plus: preservation client, visualization and richer access clients
    Dependencies: v2.1 plus others as needed

Karma v3.0 (early Spring):
    Karma server (query and publish web service)
    V2.1 plus: instrumentation using Axis-2 handler; WS-Messenger asynchronous calls
    V2.1 plus: simple access client
    Dependencies: v2.1 plus WS-Messenger (latter optional)

Karma v2.1:
    Karma server (query and publish web service)
    Java notifier library (synchronous calls to Karma service)
    OPM RDF and XML results
    Dependencies: MySQL

Publications

For a list of publications related to Karma please click here

Contact