NRP is a heterogeneous, nationally distributed, open system that features CPUs, FP32- and FP64-optimized GPUs, and FPGAs, organized into two types of subsystems: a "high-performance" subsystem at SDSC and two "FP32-optimized" subsystems at UNL and MGHPCC. These subsystems are specialized for a wide range of data science, simulation, and machine learning/artificial intelligence workloads, with data access provided through a federated, national-scale content delivery network (CDN). The NRP HPC system features a novel, extremely low-latency fabric from GigaIO that allows dynamic composition of hardware, including FPGAs, GPUs, and NVMe storage. Each of the three sites (SDSC, UNL, and MGHPCC) includes ~1 PB of usable disk space; the three storage systems function as data origins of the CDN, which uses network caches to provide data access anywhere in the country within an RTT of ~10 ms. NRP's data infrastructure supports a national "Bring Your Own Resource" (BYOR) program through which campuses can add compute, data, and storage resources to NRP, and the system can also be scaled out via "Bring Your Own Device" (BYOD) programs.
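To put the ~10 ms cache RTT in context, the sketch below is a minimal, illustrative latency-plus-bandwidth model of reading data from a nearby CDN cache versus a distant origin. Only the ~10 ms cache RTT comes from the description above; the origin RTT, link bandwidth, object sizes, and request counts are assumptions chosen purely for illustration.

```python
# Illustrative only: a simple latency + bandwidth model for CDN reads.
# The ~10 ms cache RTT is from the NRP description; all other numbers
# (origin RTT, bandwidth, sizes, request counts) are assumed.

def read_time_s(size_gb: float, rtt_ms: float, bandwidth_gbps: float,
                round_trips: int = 1) -> float:
    """Estimate seconds to read `size_gb` GB over a link with the given RTT
    and sustained bandwidth, paying `round_trips` round-trip latencies."""
    latency = round_trips * rtt_ms / 1e3         # seconds spent on round trips
    transfer = size_gb * 8 / bandwidth_gbps      # seconds moving the bytes
    return latency + transfer

if __name__ == "__main__":
    # One bulk 100 GB read: bandwidth dominates, cache vs. origin is similar.
    print(f"bulk read, cache : {read_time_s(100, rtt_ms=10, bandwidth_gbps=10):6.1f} s")
    print(f"bulk read, origin: {read_time_s(100, rtt_ms=70, bandwidth_gbps=10):6.1f} s")
    # 10,000 small 1 MB reads: round-trip latency dominates, the cache wins.
    print(f"small reads, cache : {read_time_s(10, rtt_ms=10, bandwidth_gbps=10, round_trips=10_000):6.1f} s")
    print(f"small reads, origin: {read_time_s(10, rtt_ms=70, bandwidth_gbps=10, round_trips=10_000):6.1f} s")
```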
System Component | Configuration |
---|---|
NVIDIA A100 HGX GPU Servers | |
HGX A100 servers | 8 |
NVIDIA GPUs/server | 8 |
HBM2 Memory per GPU | 80 GB |
Host CPU (2 per server) | AMD EPYC 7742 |
Host CPU memory (per server) | 512 GB @ 3200 MHz |
FabreX Gen4 Network Adapter (per server) | 8 |
Solid State Disk (2 per server) | 1 TB |
Xilinx Alveo FPGA Servers | |
GigaIO Gen4 Pooling Appliance | 4 |
FPGAs/appliance | 8 |
FPGA | Alveo U55C |
High Core-count CPU Servers | |
Number of servers | 2 |
Processor (2 per server) | AMD EPYC 7742 |
Memory (per server) | 1 TB @ 3200 MHz
FabreX Network Adapter (per server) | 1 |
Mellanox ConnectX-6 Network Adapter (per server) | 1
Low Core-count CPU Servers | |
Number of servers | 2 |
Processors (2 per server) | AMD EPYC 7F72 |
Memory (per server) | 1 TB @ 3200 MHz
FabreX Network Adapter (per server) | 1 |
Mellanox ConnectX-6 Network Adapter (per server) | 1
Network Infrastructure | |
GigaIO FabreX 24-port Gen4 PCIe switches | 18
GigaIO FabreX Network Adapters | 36 |
Mellanox ConnectX-6 Network Adapters | 10
FabreX-connected NVMe Resource | |
GigaIO Gen3 NVMe Pooling Appliances | 4
Capacity per NVMe resource | 122 TB |
Ancillary Systems | |
Home File System | 1.6 PB
Service Nodes | 2 |
Data Cache (8) | 50 TB each |
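As a quick cross-check of the table above, the following sketch sums the headline resources of the SDSC high-performance subsystem. The per-unit values are copied from the table; the totals are derived here rather than stated in the source.

```python
# Derived totals for the SDSC high-performance subsystem, using the
# per-unit counts from the configuration table above.

hgx_servers = 8
gpus_per_server = 8
hbm2_per_gpu_gb = 80

fpga_appliances = 4
fpgas_per_appliance = 8

nvme_appliances = 4
nvme_capacity_per_appliance_tb = 122

total_gpus = hgx_servers * gpus_per_server                        # 64 A100 GPUs
total_hbm2_tb = total_gpus * hbm2_per_gpu_gb / 1000               # ~5.1 TB of HBM2
total_fpgas = fpga_appliances * fpgas_per_appliance               # 32 Alveo U55C FPGAs
total_nvme_tb = nvme_appliances * nvme_capacity_per_appliance_tb  # 488 TB fabric-attached NVMe

print(f"A100 GPUs:            {total_gpus}")
print(f"Aggregate HBM2:       {total_hbm2_tb:.1f} TB")
print(f"Alveo U55C FPGAs:     {total_fpgas}")
print(f"FabreX-attached NVMe: {total_nvme_tb} TB")
```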
System Component | Configuration |
---|---|
NVIDIA A10 GPU Servers (one FP32-optimized subsystem each at UNL and MGHPCC) | |
GPU servers | 18 |
NVIDIA A10 GPUs/node | 8 |
Host CPU (2 per server) | AMD EPYC 7502 |
Host CPU memory | 512 GB @ 3200 MHz |
Node-local NVMe | 8 TB |
Network adapters | 1× 1 Gbps; 2× 10 Gbps
Ancillary Systems | |
Service Nodes (2 per site) | AMD EPYC 7402P |
Home File System | 1.6 PB |
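A similar derivation applies to the FP32-optimized subsystems. The table does not state explicitly whether the 18 GPU servers are per site or combined, so the sketch below assumes 18 servers per subsystem (i.e., per site); the per-server values are taken from the table.

```python
# Derived totals for one FP32-optimized (NVIDIA A10) subsystem, assuming the
# 18 GPU servers in the table are per site (UNL or MGHPCC).

gpu_servers = 18
a10_per_server = 8
nvme_per_server_tb = 8
host_memory_per_server_gb = 512

a10_gpus = gpu_servers * a10_per_server                          # 144 A10 GPUs
node_local_nvme_tb = gpu_servers * nvme_per_server_tb            # 144 TB node-local NVMe
host_memory_tb = gpu_servers * host_memory_per_server_gb / 1024  # 9 TB host memory

print(f"A10 GPUs per site:        {a10_gpus}")
print(f"Node-local NVMe per site: {node_local_nvme_tb} TB")
print(f"Host memory per site:     {host_memory_tb:.0f} TB")
```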