Published August 4, 2015
The National Science Foundation (NSF) is funding a $5 million, five-year award for UC San Diego and UC Berkeley to establish the Pacific Research Platform (PRP), a high-capacity, data-centric “freeway system” that will eventually give participating universities and other research institutions the ability to move data about 1,000 times faster than speeds on today’s inter-campus shared Internet.
“To accelerate the rate of scientific discovery, researchers must get the data they need, where they need it, and when they need it,” said Larry Smarr, a UC San Diego Computer Science and Engineering Professor, director of the California Institute for Telecommunications and Information Technology (Calit2) at the university, and principal investigator of the PRP initiative. “This requires a high-performance data freeway system in which we use optical light paths to connect data generators and users of that data.”
“PRP will enable researchers to use standard tools to move data to and from their labs and their collaborators’ sites, supercomputer centers and data repositories distant from their campus IT infrastructure, at speeds comparable to accessing local disks,” said co-PI Thomas A. DeFanti, a research scientist in Calit2’s Qualcomm Institute at UC San Diego.
DeFanti and co-PI Phil Papadopoulos, Division Director of Cloud and Cluster Software Development at the San Diego Supercomputer Center (SDSC), an Organized Research Unit of UC San Diego, will coordinate the efforts of the large group of network engineers, network providers, and measurement programmers at the PRP institutions.
PRP links most of the research universities on the West Coast (the 10 University of California campuses, San Diego State University, Caltech, USC, Stanford, University of Washington) via the Corporation for Education Network Initiatives in California (CENIC)/Pacific Wave’s 100G infrastructure.
The PRP’s data sharing architecture, with end-to-end 10-100 gigabits per second (Gb/s) connections, will enable region-wide virtual co-location of data with computing resources and enhanced security options. To demonstrate extensibility, PRP also connects the University of Hawaii System, Montana State University, the University of Illinois at Chicago, Northwestern, and the University of Amsterdam.
Other research institutions in the PRP include Lawrence Berkeley National Laboratory (LBNL) and four national supercomputer centers: SDSC, NERSC-LBNL, NAS-NASA Ames, and NCAR. In addition, the PRP will interconnect with the NSF-funded Chameleon NSFCloud research testbed and the Chicago StarLight/MREN community.
Fifteen existing multi-campus data-intensive application teams act as drivers of the PRP, providing feedback over the five years to the technical design staff. These application areas include accelerator particle physics, astronomical telescope survey data, gravitational wave detector data analysis, galaxy formation and evolution, cancer genomics, human and microbiome ‘omics integration, biomolecular structure modeling, natural disaster, climate, CO2 sequestration simulations, as well as scalable visualization, virtual reality, and ultra-resolution video.
The PRP will be extensible both to other data-rich research domains and to other national and international networks, potentially leading to a national and eventually global data-intensive research cyberinfrastructure.
The leadership team includes faculty from two of the multi-campus Gray Davis Institutes of Science and Innovation created by the State of California in the year 2000: Calit2, and the Center for Information Technology Research in the Interest of Society (CITRIS), led by UC Berkeley.
“The Pacific Research Platform is an ideal vehicle for collaboration between CITRIS and Calit2 given the growing importance of universities working together for the benefit of society,” said CITRIS Deputy Director Camille Crittenden, co-PI on the PRP award. Crittenden will manage the science engagement team and the enabling relationships with CIOs on participating campuses and labs.
In addition to DeFanti, Papadopoulos, and Crittenden, Frank Würthwein, Distributed High-Throughput Computing Lead at SDSC and a UC San Diego physicist, is also a PRP co-PI. Würthwein will lead technical development of the application groups and monitor progress from the scientists’ perspective.
“The PRP is not a build-it-and-they-will-come exercise,” said Würthwein, who is also executive director of the Open Science Grid (OSG), which facilitates access to distributed high-throughput computing for research in the U.S. “The cyberinfrastructure is responsive to the existing and expected needs of data-intensive applications, so we are building a very science-focused platform that will put these universities above and beyond what other regions already have.”
Würthwein is closely involved in the global Large Hadron Collider (LHC) community, which accounted for roughly two-thirds of the OSG’s 800 million computational hours in 2014. Other disciplines consuming OSG resources include social sciences (notably economics), engineering and medicine. The PRP-wide LHC cyberinfrastructure is a direct outgrowth of the SDSC/LHC UC-wide initiative started in October 2014.
From Hours to Minutes
The PRP project emerged from earlier NSF grants awarded to the PRP investigators (OptIPuter, GreenLight, StarLight, Quartzite, and Prism), which led to brainstorming at the CENIC 2014 annual retreat. A subsequent one-day workshop in December 2014, hosted at Stanford, led to a decision to publicly demonstrate the feasibility of the PRP. To do so, the partners engaged network engineers from a number of PRP member campuses to work intensively for the first 10 weeks of 2015 on a proof-of-principle demonstration of high-performance data transfers between Science DMZs over existing elements of the proposed infrastructure. This required extensive collaboration among the PRP partner campuses, led by CENIC’s John Hess. The result involved disk-to-disk data transfers from within one campus Science DMZ to another. These transfers, using 100GbE infrastructure, are made possible because of the NSF’s CHERuB award that upgraded UC San Diego to the CENIC bandwidth.
After iteration and tuning, tests demonstrated data transfer speeds of 9.6 Gb/s out of a possible 10 Gb/s from UC Berkeley, UC Irvine, UC Davis, and UC Santa Cruz to UC San Diego, with two transfers at 36 Gb/s out of a possible 40 Gb/s from UCLA and Caltech to UC San Diego. During the demonstration at CENIC 2015, one PRP-optimized test moved 1.6 terabytes in four minutes; by contrast, using the default campus Internet, it took three hours to transfer 0.1 terabytes, a 720x difference in throughput.
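The 720x figure follows directly from the two transfers quoted above. The short Python sketch below reproduces the arithmetic. The function names are illustrative only, and the conversion to gigabits per second assumes decimal terabytes (1 TB = 10^12 bytes), which the article does not specify.

```python
def tb_per_minute(terabytes: float, minutes: float) -> float:
    """Average throughput in terabytes per minute."""
    return terabytes / minutes


def gbps(terabytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second.

    Assumption (not stated in the article): 1 TB = 10**12 bytes.
    """
    bits = terabytes * 1e12 * 8
    return bits / seconds / 1e9


# PRP-optimized test: 1.6 TB in four minutes.
prp_rate = tb_per_minute(1.6, 4)

# Default campus Internet: 0.1 TB in three hours (180 minutes).
default_rate = tb_per_minute(0.1, 3 * 60)

# Ratio of the two average rates.
speedup = prp_rate / default_rate

print(f"speedup: about {round(speedup)}x")
print(f"PRP test average: about {gbps(1.6, 4 * 60):.1f} Gb/s")
print(f"default path average: about {gbps(0.1, 3 * 3600) * 1000:.0f} Mb/s")
```

Under the decimal-terabyte assumption, the PRP-optimized test averages a little over 53 Gb/s, while the default path averages roughly 74 Mb/s.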
Separately, NSF has awarded funds to hold a PRP design workshop at UC San Diego, now scheduled for October 2015, entitled “Building an Interoperable Regional Science DMZ.” This workshop will bring together the PRP application driver researchers with the distributed computer architects, the network engineers, and the multi-institutional IT/Telecom administrators to further refine the PRP implementation.
About SDSC
As an Organized Research Unit of UC San Diego, SDSC is considered a leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community, including industry and academia. Cyberinfrastructure refers to an accessible, integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from earth sciences and biology to astrophysics, bioinformatics, and health IT. SDSC’s Comet joins the Center’s data-intensive Gordon cluster, and both are part of the National Science Foundation’s XSEDE (eXtreme Science and Engineering Discovery Environment) program, the most advanced collection of integrated digital resources and services in the world.