
Threading the Needle for Data-Driven Science and Data Access

San Diego Supercomputer Center part of $5.6M effort to diversify data access for science and society

Published October 5, 2021

The figure shows the sites involved in the proposed NSDF pilot testbed, including the five main development sites and three Minority Serving Institutions (MSIs), with the computing environments at each campus, the Texas Advanced Computing Center (TACC) and the Massachusetts Green High Performance Computing Center (MGHPCC). Data sources include the Cornell High Energy Synchrotron Source (CHESS), the IceCube facility and the XENON dark matter experiment. The sites are connected by a high-speed network backbone provided by Internet2 and interoperate with OSG StashCaches and other resources. Image courtesy of Valerio Pascucci, University of Utah.

The San Diego Supercomputer Center (SDSC) at UC San Diego is helping to democratize access to data. SDSC is part of a national team of researchers that has been awarded $5.6M by the National Science Foundation (NSF) to build the critical technology needed to sew up the national computational infrastructure and to draw a more diverse pool of Americans into the data-driven sciences.

Through a pilot project called the National Science Data Fabric (NSDF), led by the University of Utah's Valerio Pascucci, the team will deploy the first infrastructure capable of bridging the gap between massive scientific data sources, Internet2 network connectivity and an extensive range of high-performance computing facilities and commercial cloud resources around the nation.

Assisting Pascucci with this mission are several co-principal investigators (co-PIs): SDSC Interim Director Frank Würthwein, Michela Taufer at the University of Tennessee, Knoxville, Alex Szalay at the Johns Hopkins University and John Allison at the University of Michigan, Ann Arbor. On the project, the team will partner with NSF-funded efforts such as Fabric and the Open Science Grid (OSG), as well as with industry partners such as IBM Cloud and Google Cloud.

“The National Science Data Fabric is an effort that aims to transform the end-to-end data management lifecycle with advances in storage and network connections; deployment of scalable tools for data processing, analytics and visualization; and support for security and trustworthiness,” said Würthwein, who also serves as executive director of the OSG, the premier national cyberinfrastructure for distributed high-throughput computing.

According to Pascucci, the science and technology sector is on the cusp of tremendous discoveries and technological innovations that will benefit society. Sustaining that fast-paced progress requires a cyberinfrastructure that enables high-speed access to the data generated by large experimental facilities worldwide, which he notes remains a key unmet challenge.

“The massive IceCube neutrino detector had to be built at the South Pole, but its data is best processed at large US computing facilities. The XENONnT experiment, the world's largest and most sensitive dark matter detector, has been built below the Gran Sasso mountain in Italy. Still, its data has to be quickly analyzed by scientists around the world. The Cornell High Energy Synchrotron Source (CHESS) routinely allows imaging the interior of new materials that scientists can test, analyze and perfect before turning them into technological innovations. These are just a few examples of advances in data-driven sciences that are in need of the NSDF,” said Pascucci, who also serves as the director of the Center for Extreme Data Management, Analysis and Visualization (CEDMAV).

Szalay, a distinguished professor of physics, astronomy and computer science at the Johns Hopkins University, said that scientific computing is constantly evolving. “Today, even mid-scale scientific instruments, like high-throughput electron microscopes or arrays of small telescopes scanning the sky, can generate many petabytes of data. These projects cannot afford to build their own vertical computing infrastructure but need a shared, high-throughput national data fabric to move their data from their instruments to where the compute-intensive analyses can be carried out,” he said. “Our project will make it much easier to participate in this new model of scientific computing.”

According to Taufer, the NSDF can benefit society as a whole in critical fields ranging from environmental, health and national security concerns to renewable energy production, as well as in understanding the evolution of galaxies and the nature of dark matter.

“By placing critically located data resources at minority-serving institutions and providing training and education for a diverse workforce, the NSDF pilot will enable diverse communities to participate in cutting-edge research,” said Taufer. “NSDF is partnering with the Minority Serving Cyberinfrastructure Consortium to implement a rich set of webinars, training modules, and other educational opportunities for underserved communities.”

According to Allison, director of the Center for PRedictive Integrated Structural Materials Science (PRISMS) at Michigan, “Harnessing materials data in all its forms is the next big scientific opportunity for the rapid discovery and design of the new materials needed to keep pace with societal needs. We are excited to be part of the NSDF team and have our U-M Materials Commons materials data platform as an integral part of this truly transformational data infrastructure. We are particularly energized by the plans to work with minority-serving institutions, like the University of Texas at El Paso, to demonstrate the potential for seamless re-use of materials data in teaching and research projects.”