Published May 11, 2021
By Kimberly Mann Bruch, SDSC Communications
UC Irvine scientists recently used National Science Foundation (NSF) Extreme Science and Engineering Discovery Environment (XSEDE) allocations on Comet at the San Diego Supercomputer Center, located at UC San Diego, and Bridges at the Pittsburgh Supercomputing Center to better understand contributions from maternal and paternal lineages in genome sequences.
“While sequencing genomes has become a fundamental goal and tool in science, the problem has been that many genomes have been difficult to fully resolve by sequencing because they contain distinct contributions from maternal and paternal lineages,” said Brandon Gaut, an ecology and evolutionary biology professor at UC Irvine. “Our work used computer science optimization methods to help resolve the accuracy of separating maternal and paternal DNA in genomes.”
This research, detailed in a January 2021 article in the journal BMC Bioinformatics, not only improves genome completeness but also helps scientists better understand the genetic relationships between individuals, populations, and species. In turn, that may lead to improvements in medicine and food production for diverse populations.
“Comet and Bridges were powerful enough to run our new genome sequence haplotype separation and optimization method called HapSolo,” said Edwin Solares, an NSF Graduate Research Fellow and first author of the journal article, who is also funded by the UC President’s Pre-Professoriate Fellowship. “With the help of the XSEDE allocations on supercomputers, we were able to illustrate the performance of HapSolo on genome data from three species: the Chardonnay grape, a mosquito and the thorny skate.”
Solares explained that Comet and Bridges ran calculations for the genomes of the Chardonnay grape (Vitis vinifera; 490 Mb), a mosquito (Anopheles funestus; 200 Mb) and the thorny skate (Amblyraja radiata; 2,650 Mb). “The use of supercomputers for these analyses cut our run time in half for several of our samples,” he said. “Being able to use XSEDE resources allowed us to focus on the science, instead of computational issues that are certain to arise without the use of supercomputers like Comet and Bridges.”
Solares is supported by an NSF Graduate Research Fellowship Program grant (DGE-1321846), which funded his time to design and carry out the study. Additional support came from the NSF (grant no. 1741627), NIH (grant nos. R01OD010974 and R01GM115562) and XSEDE awards (ACI-1548562, ACI-1445606 and TG-MCB180035).
About SDSC
The San Diego Supercomputer Center (SDSC) is a leader and pioneer in high-performance and data-intensive computing, providing cyberinfrastructure resources, services, and expertise to the national research community, academia, and industry. Located on the UC San Diego campus, SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from astrophysics and earth sciences to disease research and drug discovery. In December 2020, SDSC’s newest National Science Foundation-funded supercomputer, Expanse, entered production. At over twice the performance of Comet, Expanse supports SDSC’s theme of ‘Computing without Boundaries’ with a data-centric architecture, public cloud integration, and state-of-the-art GPUs for incorporating experimental facilities and edge computing.
About PSC
The Pittsburgh Supercomputing Center (PSC) is a joint computational research center of Carnegie Mellon University and the University of Pittsburgh. Established in 1986, PSC is supported by several federal agencies, the Commonwealth of Pennsylvania and private industry, and is a leading partner in XSEDE, the National Science Foundation cyberinfrastructure program. PSC provides university, government and industrial researchers with access to several of the most powerful systems for high-performance computing, communications and data storage available to scientists and engineers nationwide for unclassified research. PSC advances the state of the art in high-performance computing, communications and data analytics and offers a flexible environment for solving the largest and most challenging problems in computational science.