Published July 15, 2022
By Cynthia Dillon, SDSC External Relations
Scientific research is trending rapidly toward more open and accessible practices that support rapid-response discovery. At the same time, researchers across the U.S. are collaborating to address complex challenges such as COVID-19 and supply chain disruptions.
Around the world, there are robust responses to the need for a unified open research commons (ORC), which is an interoperable, user-friendly and broadly accessible collection of data and compute resources spanning the public and private sectors. But while other nations are gearing up for future competitiveness in this way, the U.S. is lagging behind, according to a group of researchers from several universities and institutes across the U.S. and in the Netherlands.
The problem, according to researchers from institutions such as MIT, Johns Hopkins and Argonne National Laboratory, is that the U.S. lacks a committed effort to make research computing and data infrastructure accessible and connected.
According to the researchers, including the San Diego Supercomputer Center’s (SDSC) Christine Kirkpatrick, this lag compromises U.S. competitiveness and leadership and limits beneficial U.S. contributions to global science.
“The U.S. has critical mass in experts, forward-thinking program officers and no end to the societal challenges and science use cases that call for a unified research commons, yet it calls for organization at a level higher than these initiatives are usually funded. Immediate and sustained leadership and support in the U.S. are needed to chart the course, starting with policymakers and research funders,” said Kirkpatrick, division director of Research Data Services at SDSC, who leads SDSC’s FAIR (findable, accessible, interoperable and reusable) efforts via the U.S. GO FAIR Office located at SDSC.
In an article published in Science, the researchers affirm the value of broad cooperation around technology and data. For example, they point to the shared governance, infrastructure and standard agreements that allow a shared system such as the North American electrical grid to direct electricity to where it is needed. They also cite the CIRRUS banking network, which can deliver funds from an individual’s bank account to most places around the world. Similar coordination in the research enterprise, the researchers note, could pay enormous dividends.
“We now have vast amounts of publicly available research data, but to fully leverage the potential power of these data beyond individual and often heroic efforts, these data need to be identified, made interoperable and aligned so that they can be broadly used by the scientific community,” said Philip Bourne, first author of the paper. Bourne is now with the University of Virginia’s School of Data Science; he was formerly associate director of the RCSB Protein Data Bank at SDSC and a professor of pharmacology and of bioinformatics and systems biology at UC San Diego.
According to the researchers, data on disparate topics, such as a county’s homelessness rates, average income, neighborhood food and health resources, air pollution, flood risk, predicted water resources and predicted average temperature, are often scattered across many websites, infrastructures and management regimes.
“If these data were integrated—brought together based on common data elements in each dataset—we could use these data for powerful analyses, like identifying locations with high homeless populations that are also likely to be hit hardest by floods, droughts or heat waves, or places with poor cardiac health that also have high or increasing particulate matter 2.5 (PM2.5) pollution, which could lead to more heart attacks,” noted Bourne.
Support from the policymakers and funders who drive the development of research infrastructure can facilitate such work, fostering the kind of urgent cooperation scientists have shown in dire times of need, such as the COVID-19 pandemic, the threat of war and the disruption of the global economy.
The researchers hold that the U.S. has a vibrant research ecosystem with no lack of computation and data resources. The struggle, however, lies in cultural and institutional obstacles that will take policy leadership and sustained commitment to overcome.
What is needed, the researchers argue, is a coherent national strategy that includes mutually beneficial partnerships with U.S. industry, formal executive representation in international ORC-focused initiatives, AI-ready data, long-term data preservation for reproducibility, professional data stewardship and, ultimately, a federal commitment to charting the future and establishing a national ORC. According to the researchers, incentives for creating a unified system are paramount.
“Scientists are not yet presented with the adequate incentives. Mandates from funders—such as data-sharing policies—help, but there are not enough definitions of requirements and rewards for complying or, indeed, a unification of what is expected of researchers regardless of the source of their research funding,” explained Kirkpatrick, who also serves as secretary general for the International Science Council's (ISC) Committee on Data (CODATA).
Kirkpatrick pointed to some of the efforts that SDSC, a leader in high-performance computing (HPC), data-intensive computing and cyberinfrastructure, has made toward supporting accessibility and connectedness:
Reflecting on the collaborative Science article, Bourne said, “It was wonderful to engage with Christine on this important policy forum and to reengage with SDSC where I spent many happy years. Collectively, we have made an important statement for the future of research computing, and I look forward to helping turn words into action.”