RCSB EXPERIENCE
"The RCSB proposal was evaluated using a standard merit review procedure that included a site visit and advisory panel," said Gerald Selzer, program director in the NSF Division of Biological Infrastructure. "Experts in x-ray crystallography, other areas of biology, computer science, and database technology participated in the evaluation of the proposal. Reviewers and agency staff alike were impressed with the plans for operating the database, the management scheme across the three sites, and the expertise that the RCSB brings to this important task."
The expertise of the RCSB members in structure data processing and analysis covers data validation, data modeling, database development, query languages and visualization tool development. The group has developed and currently maintains 11 publicly available structural biology databases.
Much of the framework for the RCSB's PDB is based on the Nucleic Acid Database (NDB), established at Rutgers University in 1990 and directed by Berman. The NDB provides tools and information for studying nucleic acids, including nucleic acids bound to proteins. The Rutgers team has developed automated tools for depositing and validating entries in the database as well as querying and generating reports.
The NDB tools have been used to launch related macromolecular structure databases at Rutgers. Proteins Plus mirrors the PDB and adds the query and report generation capabilities of the NDB. Databases have also been created for proteins that bind to DNA, for nucleic acid structures determined by NMR, and for macromolecule components and ligands to which they bind.
Over the past several years, Phil Bourne, SDSC senior staff scientist, has led a team of computational biologists at SDSC in developing database models for bioinformatics. The result of these efforts, the Property Object Model (POM), is a non-proprietary database model designed to store biological structure data and enable fast queries.
"Speed is important for searching for patterns of structural properties within fast-growing biological databases," Bourne said. "Being non-proprietary is important for making databases and query tools available to the community at large for both public and unpublished data."
SDSC has built and now maintains four POM-based resources. The Protein Kinase Resource is a repository of information about a diverse family of enzymes that play a major role in communication between cells. SDSC also maintains the PDB Obsolete Structures Database, to provide a chronology of versions for PDB structures, the WPDB software, for searching PDB data on PC platforms, and MOOSE, which queries the native and derived features of macromolecular structure from PDB data.
Through CARB, NIST maintains several bioinformatics databases, including the Biological Macromolecule Crystallization Database (BMCD). Led by Gary Gilliand, CARB's Biotechnology Division chief and an adjunct professor at the University of Maryland Biotechnology Institute, the BMCD was established to help crystallographers develop strategies to find suitable crystals for determining 3-D structure. The BMCD contains crystal data and the crystallization conditions, compiled from the literature, from which researchers have obtained diffraction-quality crystals.
|