Published 07/15/2003
The Global Grid Forum (GGF) recently held its eighth conference, GGF-8, in Seattle, Washington, with the theme "Building Grids - Obstacles & Opportunities." Reagan Moore, co-director of SDSC's Data and Knowledge Systems (DAKS) program, gave three presentations at the forum.
For the GGF Data Transport Research Group, Moore gave a talk on "Remote Storage Repository Operations." In today's data-intensive computing, scientists work with growing data sets that are often distributed across multiple sites, so improved methods of access and management are vital. Drawing on experience with data grids, digital libraries, and persistent archives, Moore described data-handling systems that perform data manipulation operations directly at remote storage systems before the data is moved. These operations fall into roughly four categories: byte-level access, as used in Unix file system operations; bulk operations, which manage latency when manipulating a large number of remote files; object-oriented storage, which involves executing application operations directly at the remote repository; and specialized operations that arise when accessing heterogeneous systems. Several Global Grid Forum projects are defining remote operations, including the UK e-Science Database Access and Integration Services Working Group, the Grid File System birds-of-a-feather (BOF) group at GGF-8, and the UK e-Science Data Format Description Language research group.
In the GGF Persistent Archive Research Group, Moore and Andre Merzky of the Konrad-Zuse-Zentrum für Informationstechnik in Berlin, Germany, presented the final draft of a document on "Persistent Archive Concepts." The document describes how preservation systems that manage technology evolution can be developed from data grids. A GGF working group on this topic is being proposed. It would create a GGF Recommendation document that identifies significant implementations of persistent archive technology whose successful operational experience demonstrates its usefulness.
Moore also gave a presentation on "Consistency Constraints" to the GGF Grid Protocol Architecture Research Group, exploring concepts related to the management of state information generated by grid services. A simple example is a replica service, which creates a copy of a file. The state information that describes the location of the copy must then be managed so that the copy can be discovered and accessed. Consistency constraints are required to specify, among other things, how the access controls on the copy are defined relative to those on the original and how the copies are synchronized when the original file changes. Moore examined multiple types of consistency constraints, ranging from temporal constraints on the update of the state information, to logical constraints on how the name spaces associated with the services are interpreted, to internal consistency parameters that define when copies are out of sync.
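The replica example can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the SDSC implementation or any GGF specification: the names FileRecord, ReplicaCatalog, and max_lag are hypothetical, access controls are reduced to a set of reader names, and "out of sync" is modeled as a version counter that trails the original.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """State information for one physical copy of a file (hypothetical model)."""
    logical_name: str
    location: str
    version: int = 0
    readers: set = field(default_factory=set)


@dataclass
class ReplicaCatalog:
    """Tracks where copies live so that they can be discovered and accessed."""
    originals: dict = field(default_factory=dict)
    replicas: dict = field(default_factory=dict)  # logical name -> list of copies

    def register(self, logical_name, location, readers):
        record = FileRecord(logical_name, location, 0, set(readers))
        self.originals[logical_name] = record
        self.replicas[logical_name] = []

    def replicate(self, logical_name, new_location):
        original = self.originals[logical_name]
        # Assumed constraint: a copy inherits the original's access controls.
        copy = FileRecord(logical_name, new_location,
                          original.version, set(original.readers))
        self.replicas[logical_name].append(copy)
        return copy

    def update_original(self, logical_name):
        # A change to the original bumps its version, leaving copies behind.
        self.originals[logical_name].version += 1

    def stale_replicas(self, logical_name, max_lag=0):
        # Assumed constraint: a copy is out of sync when it trails the
        # original by more than max_lag versions.
        current = self.originals[logical_name].version
        return [c for c in self.replicas[logical_name]
                if current - c.version > max_lag]

    def synchronize(self, logical_name):
        original = self.originals[logical_name]
        for copy in self.stale_replicas(logical_name):
            copy.version = original.version
            copy.readers = set(original.readers)


catalog = ReplicaCatalog()
catalog.register("survey.dat", "site-a:/archive/survey.dat", {"alice"})
catalog.replicate("survey.dat", "site-b:/cache/survey.dat")
catalog.update_original("survey.dat")
print(len(catalog.stale_replicas("survey.dat")))  # 1
catalog.synchronize("survey.dat")
print(len(catalog.stale_replicas("survey.dat")))  # 0
```

Raising max_lag loosens the internal consistency parameter, so a copy may trail the original by a few updates before it is reported as out of sync.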
In these presentations, Moore based the discussions on implementations of SDSC data and knowledge management technologies for multiple agencies, from NASA and DOE projects to the NSF National Virtual Observatory (NVO), the NIH Biomedical Informatics Research Network (BIRN) project, the Library of Congress, the NSF National Science Digital Library, and the National Archives and Records Administration (NARA) persistent archive project.
The Global Grid Forum is a community-initiated forum of researchers working on distributed computing, or "grid," technologies. GGF's primary objective is to support the development, deployment, and implementation of grid technologies and applications through the creation and documentation of "best practices," including technical specifications, user experiences, and implementation guidelines. -Paul Tooby.