Opportunities, benefits and challenges of sharing memory between CPUs and GPUs

In recent years, most performance gains in large-scale computing have come from GPU technologies. However, many scientific applications cannot be fully ported to GPUs, so they require frequent data exchanges between their CPU and GPU components. Traditional discrete GPU systems have separate memories for the CPU and the GPUs, which forces explicit data movement over a system bus that is both expensive and tedious to program. This has effectively prevented a large fraction of scientific codes from making good use of such systems.

Recently, both NVIDIA and AMD have begun to offer datacenter-class systems that provide a unified memory address space shared by the CPU and GPU cores. From an application's point of view, this promises to make using both CPU and GPU resources within a single application far more effective and much easier to program.

With several such systems now available to the scientific community, this is an ideal time to learn about them and to discuss their benefits, challenges, and drawbacks compared to discrete GPU systems. The architectural approaches of NVIDIA and AMD also differ significantly, so understanding the advantages and disadvantages of each will help guide the design and procurement of future scientific computing systems.

This workshop aims to achieve several key objectives: (1) Offer a comprehensive overview of next-generation platforms focusing on unified shared memory between CPU and GPU cores. (2) Highlight application-driven performance analysis across diverse HPC systems. (3) Share early insights and optimization techniques for workloads on distinct platforms from various vendors.

Agenda Overview:

Lecture-style presentations of the SDSC AMD MI300A-based Cosmos and TACC NVIDIA Grace-Hopper Vista HPC systems, including architectural overviews and early operational experience.
Short talks by users sharing their experience on the two systems.
A final panel focused on analyzing the benefits and drawbacks of the two systems compared to typical discrete GPU systems.

Call for proposals

We invite users with experience on any system that shares memory between CPU and GPU compute resources to present at the workshop by submitting an abstract-only proposal.

The abstract must be between 100 and 300 words in length, plain text with no markup.
Please submit the abstract, along with a title and the list of authors and their affiliations, to smcg25@sdsc.edu for consideration.

All submissions within the scope of the workshop will be reviewed by the organizers. We will select 6 proposals to be presented as short user experience talks, and 5 representatives to participate in the panel discussion.

The submission deadline has been extended from Friday, May 9, 2025 to Wednesday, May 21st, 2025.
Acceptance notifications will be sent out on Thursday, May 29th, 2025.

The workshop is organized by:

Igor Sfiligoi, University of California San Diego - San Diego Supercomputer Center (SDSC)
Mahidhar Tatineni, University of California San Diego - San Diego Supercomputer Center (SDSC)
John Cazes, Texas Advanced Computing Center (TACC)
Amit Ruhela, Texas Advanced Computing Center (TACC)
Dan Stanzione, Texas Advanced Computing Center (TACC)

----
About PEARC

The ACM Practice and Experience in Advanced Research Computing (PEARC) Conference Series is a community-driven effort built on the successes of the past, with the aim of growing and increasing inclusivity by involving additional local, regional, national, and international cyberinfrastructure and research computing partners spanning academia, government, and industry. The ACM PEARC Conference Series is working to integrate and meet the collective interests of our growing community by providing a forum for discussing challenges, opportunities, and solutions among the broad range of participants in the research computing community. The ACM PEARC Conferences are organized by a group of dedicated volunteers from the community and are sponsored by the Association for Computing Machinery (ACM), the world’s largest educational and scientific computing society.

ACM PEARC25 will take place in Columbus, Ohio from July 20 – 24, 2025, and will explore the current practice and experience in advanced research computing, including workforce development, training, diversity, applications and software, and systems and software.

Questions?

Contact SDSC Events Coordinator

PEARC25 Workshop: Opportunities, benefits and challenges of sharing memory between CPUs and GPUs