COMPLECS: Data Transfer

Thursday, June 6, 2024

6:00 PM - 7:30 PM UTC

This event will be held remotely.

Whether you are using a laptop to analyze experimental data collected from devices in the field or generating simulated data from large-scale numerical calculations on high-performance computing (HPC) systems, how you move your data to where you need it, when you need it, is one of the most important aspects of building your research workflows. There are many ways to transfer data between the data storage and file systems you interact with, but which transfer method is right for you depends on the answers to a few key questions about the data: Where is the data located? How is the data organized? How much data is there? And where is the data going?

In this first part of our series on Data Management, we introduce you to the essential concepts and command-line tools you should learn when you first begin transferring data to and from HPC (or any remote) systems regularly. You will learn how to check the integrity of your data after a transfer has been completed, how to utilize file compression, and how to choose the right data transfer tool for different situations. We also introduce you to the common data storage and file systems your data may encounter, their advantages and limitations, and how their different characteristics may affect data transfer performance on one end or the other. Additional topics about data transfer will be covered as time permits.
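As a rough illustration of the integrity check mentioned above, the following minimal sketch compares SHA-256 checksums of a file before and after a transfer. It is not taken from the session materials; the function names `sha256sum` and `transfer_is_intact` are hypothetical, and the choice of SHA-256 is an assumption (other hash functions work the same way).

```python
import hashlib


def sha256sum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source_path, destination_path):
    """Hypothetical helper: True if both files have the same checksum."""
    return sha256sum(source_path) == sha256sum(destination_path)
```

In practice the source and destination usually live on different machines, so you would compute the digest on each side and compare the two hex strings rather than call `transfer_is_intact` on one host.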

--- 
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC training program that covers the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management, and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.

Instructor

Marty Kandes

Computational & Data Science Research Specialist, High-Performance Computing User Services Group, Data-Enabled Scientific Computing Division, SDSC

Marty Kandes is a Computational and Data Science Research Specialist in the High-Performance Computing User Services Group at SDSC. He currently helps manage user support for Comet, SDSC’s largest supercomputer. Marty obtained his Ph.D. in Computational Science in 2015 from the Computational Science Research Center at San Diego State University, where his research focused on studying quantum systems in rotating frames of reference using numerical simulation. He also holds an M.S. in Physics from San Diego State University and B.S. degrees in both Applied Mathematics and Physics from the University of Michigan, Ann Arbor. His current research interests include problems in Bayesian statistics, combinatorial optimization, nonlinear dynamical systems, and numerical partial differential equations.