TSCC User Guide V2.0

  Acceptable Use Policy:  All users of the Triton Shared Computing Cluster and associated resources must agree to comply with the Acceptable Use Policy.

 

Last updated March 26, 2024

Technical Summary

The Triton Shared Computing Cluster (TSCC) is UC San Diego’s campus research High-Performance Computing (HPC) system. It is foremost a "condo cluster" (researcher-purchased computing hardware) that provides access, colocation, and management of a significant shared computing resource, while also offering a "hotel" service for temporary or bursty HPC requirements. Please see the TSCC Condo page and TSCC Hotel page for more information.

System Information

Hardware Specifications

 

Figure 1: Hardware architecture of TSCC

Figure 1 illustrates the conceptual hardware architecture of the TSCC system. At its core, the system comprises several condo servers ("Condo Cluster") and hotel servers ("Hotel Cluster") connected through 25G switches. The cluster is managed by the SLURM scheduler, which orchestrates job distribution and execution. The system also features a Lustre parallel file system with a capacity of 2PB and a home file system holding 500TB. The TSCC cluster uses RDMA over Converged Ethernet (RoCE) for networking across the servers. The architecture is further complemented by dedicated login servers that serve as access points for users. All these components are integrated into the core switch fabric, ensuring smooth data flow and connectivity to both the campus network and the broader Internet.

The servers in the condo and hotel clusters comprise general computing servers with CPUs and GPU servers with GPUs. The TSCC group will periodically update the hardware choices for general computing and GPU condo server purchases to keep abreast of technological and cost advances. Please see the TSCC Condo page and TSCC Hotel page for specifications of the node types.

System Access

Acceptable Use Policy:

All users of the Triton Shared Computing Cluster and associated resources must agree to comply with the Acceptable Use Policy.

Getting a trial account:

If you are part of a research group that doesn’t have an allocation on TSCC (and never has had one), and you want to use TSCC resources to run preliminary tests, you can apply for a free trial account. For a free trial, email tscc-support@ucsd.edu and provide your:

  • Name  
  • Contact Information  
  • Department  
  • Academic Institution or Industry  
  • Affiliation (grad student, post-doc, faculty, etc.)  
  • Brief description of your research and any software applications you plan to use  

Trial accounts provide 250 core-hours and are valid for 90 days.

Joining the Condo Program

Under the TSCC condo program, researchers use equipment purchase funds to buy compute (CPU or GPU) nodes that will be operated as part of the cluster. Participating researchers may then have dedicated use of their purchased nodes, or they may run larger computing jobs by sharing idle nodes owned by other researchers. The main benefit is access to a much larger cluster than would typically be available to a single lab. For details on joining the Condo Program, please visit: Condo Program Details.

Joining the Hotel Program

Hotel computing provides the flexibility to purchase time on compute resources without the need to buy a node. This pay-as-you-go model is convenient for researchers with temporary or bursty compute needs. For details on joining the Hotel Program, please visit: Hotel Program Details.

Back to top

Logging In

TSCC supports command-line authentication using your UCSD AD password. To log in to TSCC, use the following hostname:

login.tscc.sdsc.edu   

Following are examples of Secure Shell (ssh) commands that may be used to login to the TSCC:  

$ ssh <your_username>@login.tscc.sdsc.edu
$ ssh -l <your_username> login.tscc.sdsc.edu

Then, type your AD password.  

You will be prompted for Duo 2-Step authentication and shown options like these:

Enter a passcode or select one of the following options:
  1. Duo Push to XXX-XXX-1234
  2. SMS passcodes to XXX-XXX-1234

If you type 1 and then hit enter, a Duo access request will be sent to the device you set up for Duo access. Approve this request to finish the login process.

If you type 2 and then hit enter, an SMS passcode will be sent to the device you set up for Duo access. Type this code in the terminal, and you should be good to go.

For Windows users, you can follow the exact same instructions using PowerShell, Windows Subsystem for Linux (WSL) (a compatibility layer introduced by Microsoft that allows users to run a Linux environment natively on a Windows system without the need for a virtual machine or dual-boot setup), or terminal emulators such as PuTTY or MobaXterm. For more information on how to use Windows to access the TSCC cluster, please contact the support team at tscc-support@ucsd.edu.

Set up multiplexing for TSCC Host

Multiplexing enables the transmission of multiple signals over a single line or connection. Within OpenSSH, this capability allows an already established outgoing TCP connection to be reused for several simultaneous SSH sessions to a remote server. This approach bypasses the need to establish a new TCP connection and authenticate again for each session; in other words, you won't need to re-authenticate every time you open a new terminal window.

Below are instructions on how to set it up for different operating systems.

Linux or Mac:

On your local machine, open or create the file ~/.ssh/config and add the following lines (use any text editor you like: vim, vi, VS Code, nano, etc.):

#TSCC Account
Host tscc
    HostName login.tscc.sdsc.edu
    User YOUR_USER_NAME
    ControlPath ~/.ssh/%r@%h:%p
    ControlMaster auto
    ControlPersist 10

 

Make sure the permissions of the created config file are 600 (i.e., chmod 600 ~/.ssh/config). With this configuration, the first connection to login.tscc.sdsc.edu will create a control socket at ~/.ssh/%r@%h:%p; any subsequent connections, up to the limit set by MaxSessions on the SSH server (10 by default), will automatically re-use that control path as multiplexed sessions.

To log in, you then just need to type:

$ ssh tscc

Note that you don't have to type the whole remote host address, since it is already configured in the ~/.ssh/config file (`Host tscc`). Then you're all set.
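If you want to check on or shut down the shared connection, OpenSSH's control commands can be used. A minimal sketch:

$ ssh -O check tscc   # reports whether a master connection to tscc is currently active
$ ssh -O exit tscc    # cleanly closes the master connection and removes its control socket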

Windows:

If you’re using the PuTTY UI to create SSH connections from your local Windows PC, you can set it up to use multiplexing. To reuse connections in PuTTY, activate the "Share SSH connections if possible" feature found in the "SSH" configuration area. Begin by choosing the saved configuration for your cluster in PuTTY and hit "Load". Next, navigate to the "SSH" configuration category.


 Check the “Share SSH connections if possible” checkbox. 


Navigate back to the sessions screen by clicking on "Session" at the top, then click "Save" to preserve these settings for subsequent sessions. 


 

Back to top

TSCC File System:

Home File System Overview:

For TSCC, the home directory is the primary location where the user-specific data and configuration files are stored. However, it has some limitations, and proper usage is essential for optimized performance.

Location and Environment Variable

  • After logging in, you'll find yourself in your home directory (under /home).
  • This directory is also accessible via the environment variable $HOME.

Storage Limitations and Quota

  • The home directory comes with a storage quota of 100GB.
  • It is not meant for large data storage or high I/O operations.

What to Store in the Home Directory

  • You should only use the home directory for source code, binaries, and small input files.

What Not to Do

  • Avoid running jobs that perform intensive I/O operations in the home directory.
  • For jobs requiring high I/O throughput, it's better to use Lustre or local scratch space.

Parallel Lustre File System

Global parallel filesystem: TSCC features a 2 PB shared Lustre parallel file system from DataDirect Networks (DDN) with performance ranging up to 20 GB/second. If your job requires high-performance I/O written in large blocks, it is advisable to use Lustre or local scratch space instead of the home directory. These are set up for higher performance and are more suitable for tasks requiring intensive read/write operations at scale. Note that Lustre is not suitable for metadata-intensive I/O involving a lot of small files or continuous small-block writes. The node-local NVMe scratch should be used for such I/O.

  • Lustre Scratch Location: /tscc/lustre/ddn/scratch/$USER

Note: Files older than 90 days, based on the creation date, are purged.

If your workflow requires extensive small I/O, contact user support at tscc-support@ucsd.edu to avoid putting undue load on the metadata server.
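To see which of your scratch files are approaching the purge window, a quick check like the following can help (a minimal sketch; it uses modification time as an approximation of file age, and the 80-day threshold is just an example):

$ find /tscc/lustre/ddn/scratch/$USER -type f -mtime +80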

Node Local NVMe-based Scratch File System

All compute and GPU nodes in TSCC come with NVMe-based local scratch storage, but the sizes vary based on the node type; capacities range from 200GB to 2TB.
This NVMe-based storage is excellent for I/O-intensive workloads and can be beneficial for both small and large scratch files generated on a per-task basis. Users can access the SSDs only during job execution. The path to access the SSDs is /scratch/$USER/job_$SLURM_JOBID.

Note on Data Loss: Any data stored in /scratch/$USER/job_$SLURM_JOBID is automatically deleted after the job is completed, so remember to move any needed data to a more permanent storage location before the job ends.
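A common pattern is to do the heavy I/O in the node-local scratch directory and copy the results off before the job ends. A minimal sketch of the relevant lines inside a batch script (the input, output, and application names are placeholders):

cd /scratch/$USER/job_$SLURM_JOBID
cp $SLURM_SUBMIT_DIR/input.dat .                  # stage input from the directory you submitted from
./my_io_heavy_app input.dat > results.out         # placeholder for your application
cp results.out /tscc/lustre/ddn/scratch/$USER/    # save results before the job ends and local scratch is cleared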

Recommendations:

  • Use Lustre for high-throughput I/O but be mindful of the file and age limitations.
  • Utilize NVMe-based storage for I/O-intensive tasks but remember that the data is purged at the end of each job.
  • For any specialized I/O needs or issues, contact support for guidance.

For more information or queries, contact our support team at tscc-support@ucsd.edu.

  Back to top

TSCC Software:

Installed and Supported Software

The TSCC runs Rocky Linux 9. Over 50 additional software applications and libraries are installed on the system, and system administrators regularly work with researchers to extend this set as time/costs allow. To check for currently available versions please use the command:

$ module avail

Singularity

Singularity is a platform to support users that have different environmental needs than what is provided by the resource or service provider.  Singularity leverages a workflow and security model that makes it a very reasonable candidate for shared or multi-tenant HPC resources like the TSCC cluster without requiring any modifications to the scheduler or system architecture. Additionally, all typical HPC functions can be leveraged within a Singularity container (e.g. InfiniBand, high performance file systems, GPUs, etc.). While Singularity supports MPI running in a hybrid model where you invoke MPI outside the container and it runs the MPI programs inside the container, we have not yet tested this.

Singularity in the New Environment

  • Users running GPU-accelerated Singularity containers with older drivers will need to use the --nv switch.  The --nv switch imports the host system drivers and overrides the ones in the container, allowing users to keep running the containers they’ve been using.
  • The Lustre filesystems will not automatically be mounted within Singularity containers at runtime.  Users will need to manually --bind mount them at runtime, as in the example below.

Example:

tscc ~]$ singularity shell --bind /tscc/lustre  ....../pytorch/pytorch-cpu.simg ......
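For GPU workloads, the --nv and --bind options can be combined in a single command. A minimal sketch (the image path and script name are placeholders):

$ singularity exec --nv --bind /tscc/lustre /path/to/pytorch-gpu.simg python train.py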

Requesting Additional Software

Users can install software in their home directories. If interest is shared with other users, requested installations can become part of the core software repository. Please submit new software requests to tscc-support@ucsd.edu.

Environment Modules

TSCC uses the Environment Modules package to control user environment settings. Below is a brief discussion of its common usage. You can learn more at the Modules home page . The Environment Modules package provides for dynamic modification of a shell environment. Module commands set, change, or delete environment variables, typically in support of a particular application. They also let the user choose between different versions of the same software or different combinations of related codes.

Default Modules

Upgraded Software Stack

  • Compilers: gcc/11.2, gcc/10.2, gcc/8.5, intel 2019.1
  • MPI implementations: mvapich2/2.3.7, openmpi 4.1.3, intel-mpi/2019.10
  • Programs under the software stack using GPU: cuda 11.2.2
  • Programs under the software stack using CPU: fftw-3.3.10, gdal-3.3.3, geos-3.9.1

Upgraded environment modules

TSCC uses Lmod, a Lua-based environment module system. Under TSCC, not all available modules will be displayed when running the module avail command without loading a compiler. The module spider command can be used to see whether a particular package exists and can be loaded on the system. For additional details, and to identify dependent modules, use the command:

$ module spider <application_name>

The module paths are different for the CPU and GPU nodes. The paths can be enabled by loading the following modules:

$ module load cpu    #for CPU nodes

$ module load gpu    #for GPU nodes

Users are requested to ensure that both sets are not loaded at the same time in their build/run environment (use the module list command to check in an interactive session).
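For example, a typical sequence for setting up a build environment on a CPU node might look like the following sketch (the versions shown are examples; use module spider to check what is currently installed):

$ module load cpu
$ module load gcc/10.2.0 openmpi
$ module list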

Useful Modules Commands

Here are some common module commands and their descriptions:

Command                                  Description
module list                              List the modules that are currently loaded
module avail                             List the modules that are available in the environment
module spider                            List the modules and extensions currently available
module display <module_name>             Show the environment variables used by <module_name> and how they are affected
module unload <module_name>              Remove <module_name> from the environment
module load <module_name>                Load <module_name> into the environment
module swap <module_one> <module_two>    Replace <module_one> with <module_two> in the environment

Table 1: Important module commands

  Back to top

Loading and unloading modules

Some modules depend on others, so they may be loaded or unloaded as a consequence of another module command. If a module has dependencies, the command module spider <module_name> will provide additional details.

Compiling Codes

TSCC CPU nodes have GNU and Intel compilers available along with multiple MPI implementations (OpenMPI, MVAPICH2, and IntelMPI). Most of the applications on TSCC have been built using gcc/10.2.0. Users should evaluate their application for the best compiler and library selection. GNU and Intel compilers have flags to support Advanced Vector Extensions 2 (AVX2). Using AVX2, up to eight floating point operations can be executed per cycle per core, potentially doubling the performance relative to non-AVX2 processors running at the same clock speed. Note that AVX2 support is not enabled by default and compiler flags must be set as described below.

TSCC GPU nodes have GNU, Intel, and PGI compilers available along with multiple MPI implementations (OpenMPI, IntelMPI, and MVAPICH2). The gcc/10.2.0, Intel, and PGI compilers have specific flags for the Cascade Lake architecture. Users should evaluate their application for the best compiler and library selections.

Note that the login nodes are not the same as the GPU nodes, therefore all GPU codes must be compiled by requesting an interactive session on the GPU nodes.

Using the Intel Compilers:

The Intel compilers and the MVAPICH2 MPI compiler wrappers can be loaded by executing the following commands at the Linux prompt:

$ module load intel mvapich2

For AVX2 support, compile with the -xHOST option on Intel processors or -march=core-avx2 on AMD processors. Note that this flag alone does not enable aggressive optimization, so compilation with -O3 is also suggested.

Intel MKL libraries are available as part of the "intel" modules on TSCC. Once this module is loaded, the environment variable INTEL_MKLHOME points to the location of the MKL libraries. The MKL link advisor can be used to ascertain the link line (change the INTEL_MKLHOME portion appropriately).

For example, to compile a C program statically linking 64-bit scalapack libraries on TSCC:

tscc]$ mpicc -o pdpttr.exe pdpttr.c \
    -I$INTEL_MKLHOME/mkl/include \
    ${INTEL_MKLHOME}/mkl/lib/intel64/libmkl_scalapack_lp64.a \
    -Wl,--start-group ${INTEL_MKLHOME}/mkl/lib/intel64/libmkl_intel_lp64.a \
    ${INTEL_MKLHOME}/mkl/lib/intel64/libmkl_core.a \
    ${INTEL_MKLHOME}/mkl/lib/intel64/libmkl_sequential.a \
    -Wl,--end-group ${INTEL_MKLHOME}/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.a \
    -lpthread -lm
 

For more information on the Intel compilers: [ifort | icc | icpc] -help

 

            Serial    MPI       OpenMP            MPI+OpenMP
Fortran     ifort     mpif90    ifort -qopenmp    mpif90 -qopenmp
C           icc       mpicc     icc -qopenmp      mpicc -qopenmp
C++         icpc      mpicxx    icpc -qopenmp     mpicxx -qopenmp

 

Using the PGI Compilers:

The PGI compilers are only available on the GPU nodes, and can be loaded by executing the following commands at the Linux prompt

$ module load pgi

Note that the OpenMPI build is integrated into the PGI install, so the above module load provides both PGI and OpenMPI.

For AVX support, compile with the `-fast` flag.

For more information on the PGI compilers: man [pgf90 | pgcc | pgCC]

 

            Serial    MPI       OpenMP       MPI+OpenMP
Fortran     pgf90     mpif90    pgf90 -mp    mpif90 -mp
C           pgcc      mpicc     pgcc -mp     mpicc -mp
C++         pgCC      mpicxx    pgCC -mp     mpicxx -mp

 

Using the GNU Compiler:

The GNU compilers can be loaded by executing the following commands at the Linux prompt:

$ module load gcc openmpi
 

For AVX support, compile with -march=core-avx2. Note that AVX support requires GCC 4.7 or later; the GCC versions currently installed on TSCC (gcc/8.5 and newer) all meet this requirement.
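For example, a hedged compile line for an OpenMP code with AVX2 enabled (the source file name is a placeholder):

$ gcc -O3 -march=core-avx2 -fopenmp -o mycode mycode.c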

For more information on the GNU compilers: man [gfortran | gcc | g++]

 

            Serial      MPI       OpenMP               MPI+OpenMP
Fortran     gfortran    mpif90    gfortran -fopenmp    mpif90 -fopenmp
C           gcc         mpicc     gcc -fopenmp         mpicc -fopenmp
C++         g++         mpicxx    g++ -fopenmp         mpicxx -fopenmp

Notes and Hints

  • The mpif90, mpicc, and mpicxx commands are actually wrappers that call the appropriate serial compilers and load the correct MPI libraries. While the same names are used for the Intel, PGI and GNU compilers, keep in mind that these are completely independent scripts.
  • If you use the PGI or GNU compilers or switch between compilers for different applications, make sure that you load the appropriate modules before running your executables.
  • When building OpenMP applications and moving between different compilers, one of the most common errors is to use the wrong flag to enable handling of OpenMP directives. Note that Intel, PGI, and GNU compilers use the -qopenmp, -mp, and -fopenmp flags, respectively.
  • Explicitly set the optimization level in your makefiles or compilation scripts. Most well written codes can safely use the highest optimization level (-O3), but many compilers set lower default levels (e.g. GNU compilers use the default -O0, which turns off all optimizations).
  • Turn off debugging, profiling, and bounds checking when building executables intended for production runs as these can seriously impact performance. These options are all disabled by default. The flag used for bounds checking is compiler dependent, but the debugging (-g) and profiling (-pg) flags tend to be the same for all major compilers.

 Running Jobs on TSCC

TSCC harnesses the power of the Simple Linux Utility for Resource Management (SLURM) to effectively manage resources and schedule job executions. To operate in batch mode, users employ the sbatch command to dispatch tasks to the compute nodes. Please note: it's imperative that heavy computational tasks are delegated exclusively to the compute nodes, avoiding the login nodes.

Before delving deeper into job operations, it's crucial for users to grasp foundational concepts such as Allocations, Partitions, Credit Provisioning, and the billing mechanisms for both Hotel and Condo models within TSCC. This segment of the guide offers a comprehensive introduction to these ideas, followed by a detailed exploration of job submission and processing.

Allocations

An allocation refers to a designated block of service units (SUs) that users can utilize to run tasks on the supercomputer cluster. Each job executed on TSCC requires a valid allocation, and there are two primary types of allocations: the Hotel and the Condo. In TSCC, SUs are measured in minutes.

TSCC Allocations Based on Purchase Type

The TSCC infrastructure offers allocations based on two distinct purchase types: the Condo Program and the Hotel Program. Under the Condo Program, users purchase nodes, which in turn grants them a Condo Allocation. In addition, any user can purchase time, which provides a Hotel Allocation. A user can have both condo and hotel allocations; however, the two are not interchangeable.

Hotel Allocation

Hotel Allocations are versatile as they can be credited to users at any point throughout the year, operating on a pay-as-you-go basis. A unique feature of this system is the credit rollover provision, where unused credits from one year seamlessly transition into the next.  

Condo Allocation

Condo users receive credit allocations annually for 5 years, based on the number and type of servers they have purchased. It's crucial to note that any unused Service Units (SUs) won't carry over to the next year. Credits are consistently allocated on the 4th Monday of each September.

The formula to determine the yearly Condo SU allocation is:

[Total cores of the node + (0.2 * Node's memory in GB) + (#A100s * 60) + (#RTX3090s * 20) + (#A40s * 10) + (#RTXA6000s * 30)] * 365 days * 24 hours * 60 minutes * 0.97 uptime

For example, suppose your group owns a 64-core node with 1024 GB of memory in TSCC. The SUs added for the year for this node would be: [64 + (0.2 * 1024)] * 365 days * 24 hours * 60 minutes * 0.97 uptime = 268.8 * 509,832 = 137,042,841.6 SUs (in minutes) for the calendar year.

The allocation time will be prorated on the first and fifth year based on when the server is added to the TSCC cluster.
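The yearly allocation can be reproduced with a quick shell calculation. A minimal sketch of the 64-core, 1024 GB example above (no GPUs), using awk for the floating-point arithmetic:

$ awk 'BEGIN { printf "%.1f\n", (64 + 0.2*1024) * 365 * 24 * 60 * 0.97 }'
137042841.6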

Partitions

On a supercomputer cluster, partitions are essentially groups of nodes that are configured to meet certain requirements. They dictate what resources a job will use. In order to submit a job and get it running in the system, you need to keep in mind the specifications and limits of the partition you’re about to use. To get information about max wall time, allowed QOS, MaxCPUsPerNode, Nodes that said partition uses, etc., you can simply run the following command which will give you information about the partition you’re currently in:

 $ scontrol show partition

This will display the partition's configured settings and limits, including MaxTime, allowed QOS, and the node list.

 

The default walltime for all queues is now one hour. Max walltimes are still in force per the below list.

If you want to obtain information about all the partitions in the system, you can alternatively use the following command:

$ scontrol show partitions

Note the ‘s’ at the end of the command.

Some of the limits for each partition are listed in the tables below. The allowed QOS must be specified for each partition.

Quality of Service (QOS):

The QOS of each job submitted to Slurm affects its scheduling priority, preemption, and job limits. The available QOS values are:

  • hotel
  • hotel-gpu
  • condo
  • hcg-<project-name>
  • hcp-<project-name>
  • condo-gpu
  • hca-<project-name>

In the previous list, <project-name> refers to the allocation id for the project. For TSCC (Slurm), that is the Account, or simply put, the group name of the user. 

How to Specify Partitions and QOS in your Job Script

You are required to specify which partition and QOS you'd like to use in your SLURM script (*.sb file) using the #SBATCH directives. Keep in mind the specifications of each QOS for the different partitions as shown in Tables 2 and 3. Here's an example of a job script that requests one node from the hotel partition:

#!/bin/bash
#SBATCH --partition=hotel
#SBATCH --qos=hotel
#SBATCH --nodes=1

# ... Other SLURM options ...

# Your job commands go here

CPU nodes

Partition Name    Max Walltime    Allowed QOS
hotel             7 days          hotel
gold              14 days         condo, hcg-<project-name>
platinum          14 days         condo, hcp-<project-name>

Table 2: CPU partition information. hcg = [H]PC [C]ondo [G]old, hcp = [H]PC [C]ondo [P]latinum

GPU nodes

Partition Name    Max Walltime    Allowed QOS
hotel_gpu         48 hours        hotel-gpu
a100              7 days          condo-gpu, hca-<project-name>
rtx3090           7 days          condo-gpu, hca-<project-name>
a40               7 days          condo-gpu, hca-<project-name>

Table 3: GPU partition information. hca = [H]PC [C]ondo [A]ccelerator

Job Charging in Condo: 

For condo allocations, the charging of the jobs is based on the memory and the number of cores used by the job. The charging also varies based on the type of GPU used.

Job Charge:

(Total # cores/job + (0.2 * Total memory requested/job in GB) + (Total # A100s/job * 60) + (Total # A40s/job * 10) + (Total # RTXA6000s/job * 30) + (Total # RTX3090s/job * 20)) * Job runtime (in seconds) / 60


Example using Job Charge:

Let's assume a researcher wants to run a job that requires:

  • 16 cores
  • 32GB of requested memory
  • 1 A100 GPU
  • The job has a runtime of 120 minutes (or 2 hours), or 7200 seconds.

Calculation:

Given the formula, plug in the values:

Core Charge: 16 cores

Memory Charge: 0.2 * 32GB = 6.4, but we'll use 6 in this case given that SLURM will only take integers for this calculation.

A100 GPU Charge: 1 A100 GPU  * 60 = 60

Sum these charges: 16 + 6 + 60 = 82

Now, multiply by the job runtime:  82 * 7200 seconds / 60 = 9,840 SUs

Result:

The total cost of running the job would be 9,840 Service Units. Charging is always based on the resources used by the job: the more resources you use and the longer your job runs, the more you'll be charged against your allocation.

Job Charging in Hotel:

The formula used to calculate the job charging in hotel is as follows:

(Total # cores/job + (0.2 * Total memory requested/job in GB) + (Total # GPUs/job * 30)) * Job runtime (in seconds) / 60

Example using Job Charge:
Let's assume a researcher wants to run a job that requires:
  • 16 cores
  • 32GB of requested memory
  • 1 A100 GPU
  • The job has a runtime of 120 minutes (or 2 hours), or 7200 seconds.

Calculation:

Given the formula, plug in the values:
Core Charge: 16 cores
Memory requested: 0.2 * 32GB = 6.4, but we'll use 6 in this case given that SLURM will only take integers for this calculation.
A100 GPU Charge: 1 A100 GPU * 30 = 30
Sum these charges: 16 + 6 + 30 = 52. Now, multiply by the job runtime:  52 * 7200 seconds / 60 = 6,240 SUs
Note that in hotel there is no differentiation by the type of GPU used. All GPUs have the same flat charging factor of 30.
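Both charging formulas can be checked with a small shell helper. A minimal sketch that reproduces the two worked examples above (the GPU weight is 60 for an A100 under condo and 30 for any GPU under hotel; memory is truncated to an integer as noted above):

# job_charge <cores> <memory_GB> <gpu_count> <gpu_weight> <runtime_seconds>
job_charge() {
  awk -v c="$1" -v m="$2" -v g="$3" -v w="$4" -v t="$5" \
      'BEGIN { printf "%.0f SUs\n", (c + int(0.2*m) + g*w) * t / 60 }'
}

job_charge 16 32 1 60 7200   # condo example: prints 9840 SUs
job_charge 16 32 1 30 7200   # hotel example: prints 6240 SUs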

Running Interactive Jobs in TSCC

Using srun

You can use the srun command to request an interactive session. Here's how to tailor your interactive session based on different requirements:

Example: Requesting a Compute Node

To request one regular compute node with 1 core in the hotel partition for 30 minutes, use the following command:

$ srun --partition=hotel --pty --nodes=1 --ntasks-per-node=1 -t 00:30:00 -A xyz123 --qos=hotel --wait=0 --export=ALL /bin/bash

In this example:

  • --partition=hotel: Specifies the hotel partition.
  • --pty: Allocates a pseudo-terminal.
  • --nodes=1: Requests one node.
  • --ntasks-per-node=1: Requests 1 task per node.
  • -t 00:30:00: Sets the time limit to 30 minutes.
  • -A xyz123: Specifies the account.
  • --wait=0: No waiting.
  • --export=ALL: Exports all environment variables.
  • --qos=hotel: Quality of Service.
  • /bin/bash: Opens a Bash shell upon successful allocation.
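Once the session starts, a quick sanity check confirms that you are on a compute node rather than a login node:

$ hostname             # should report a compute node, not a login node
$ echo $SLURM_JOB_ID   # the job ID assigned to your interactive session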

Submitting Batch Jobs Using sbatch

To submit batch jobs, you will use the sbatch command followed by your SLURM script. Here is how to submit a batch job:

$ sbatch mycode-slurm.sb

> Submitted batch job 8718049

In this example, mycode-slurm.sb is your SLURM script, and 8718049 is the job ID assigned to your submitted job.

In this section, we will delve into the required parameters for job scheduling in TSCC. Understanding these parameters is crucial for specifying job requirements and ensuring that your job utilizes the cluster resources efficiently.

Required Scheduler Parameters

  • --partition (-p): Specifies which partition your job will be submitted to. For example, --partition=hotel would send your job to the hotel partition.
  • --qos (-q): Specifies which QOS your job will use.
  • --nodes (-N): Defines the number of nodes you need for your job.
  • --ntasks-per-node OR --ntasks (-n): Indicates the number of tasks you wish to run per node or in total, respectively. If both are specified, SLURM will choose the value set for --ntasks.
  • --time (-t): Sets the maximum time your job is allowed to run, in the format of [hours]:[minutes]:[seconds].
  • --account (-A): Specifies the account to which the job should be charged.
  • --gpus: Indicates the total number of GPUs needed by your job.

Examples

Example 1: Submitting to 'hotel' Partition

Let's say you have a CPU job that you'd like to run on one node in the ‘hotel’ partition with a short walltime (10 minutes in the script below). Your SLURM script may look like this:

 

#!/bin/bash
#SBATCH --partition=hotel
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --account=your_account
#SBATCH --qos=hotel

# Load Python module (adjust as necessary for your setup)
module load python/3.8

# Execute simple Python Hello World script
python -c "print('Hello, World!')"

 

Example 2: Submitting to the ‘hotel_gpu’ Partition

Suppose you have a job requiring 1 GPU and a short walltime (10 minutes in the script below). Here's how you could specify these requirements:

#!/bin/bash
#SBATCH --partition=hotel_gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --account=your_account
#SBATCH --gpus=1
#SBATCH --qos=hotel-gpu

# Load Singularity and Python modules
module load singularity
module load python/3.8

# Execute simple TensorFlow Python script inside the container
singularity exec --nv tensorflow2.9.sif python tensorflowtest.py

  

Where tensorflowtest.py can be a simple Hello world script such as:

import tensorflow as tf

print("TensorFlow version:", tf.__version__)
hello = tf.constant("Hello, TensorFlow!")
print(hello.numpy().decode())

 

Example 3: OpenMP Job

OpenMP (Open Multi-Processing) is an API that supports multi-platform shared-memory parallel programming in C, C++, and Fortran. It allows you to write programs that can run efficiently on multi-core processors. By using OpenMP directives, you can parallelize loops and other computational blocks to run simultaneously across multiple CPU cores, thereby improving performance and reducing execution time.

To run an OpenMP job on TSCC, you can use the following batch script as a template. This example is for a job that uses 8 CPU cores on a single node in the hotel partition. The test program called ‘pi_openmp’, referenced at the end of the template, is located here: /cm/shared/examples/sdsc/tscc2/openmp. To run the batch script with the test program, copy the entire openmp directory to your own space, and from there submit the batch job:

#!/bin/bash
#SBATCH --job-name openmp-slurm           # Job name (optional; -J is the short form)
#SBATCH --output slurm-%j.out-%N          # Standard output file
#SBATCH --error slurm-%j.err-%N           # Optional, for separating standard error
#SBATCH --partition hotel                 # Partition name
#SBATCH --qos hotel                       # QOS name
#SBATCH --nodes 1                         # Number of nodes
#SBATCH --ntasks 1                        # Total number of tasks
#SBATCH --cpus-per-task 8                 # Number of CPU cores per task
#SBATCH --account <allocation>            # Allocation (account) name
#SBATCH --export=ALL                      # Optional, export all environment variables
#SBATCH --time 01:00:00                   # Walltime limit
#SBATCH --mail-type END                   # Optional, send mail when the job ends
#SBATCH --mail-user <email>               # Optional, send mail to this address

# GCC environment
module purge                              # Purge all loaded modules
module load slurm                         # Load the SLURM module
module load cpu                           # Load the CPU module
module load gcc                           # Load the GCC compiler module

# Set the number of OpenMP threads
export OMP_NUM_THREADS=8
# Run the OpenMP job
./pi_openmp

 

Breakdown of the Script

  • --job-name: Specifies the job name.
  • --output: Sets the output file where stdout and stderr are saved. SLURM merges stdout and stderr by default. See the above OpenMP job script example for separating stderr.
  • --partition: Chooses the partition.
  • --qos: Chooses the QOS.
  • --nodes, --ntasks, --cpus-per-task: Define the hardware resources needed.
  • --export: Exports all environment variables to the job's environment.
  • --time: Sets the maximum runtime for the job in hh:mm:ss format.
  • module load: Loads necessary modules for the environment.
  • export OMP_NUM_THREADS: Sets the number of threads that the OpenMP job will use.

Example Application Scripts:

Navigate to:

$ ls /cm/shared/examples/sdsc/tscc2/

abinit  amber  cp2k  cpmd  gamess  gaussian  gromacs  lammps  namd  nwchem  qchem  quantum-espresso  siesta  vasp openmp mpi

This directory holds test scripts for applications like abinit, vasp, and more, optimized for both GPU and CPU nodes.

Look for the test shell scripts or *.sb files within each folder. Although you can't execute any script directly from this location, you can copy them, or the entire application directory, to your own space and run them there. This will help you avoid dependency issues.
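For example, a hedged sketch of copying the OpenMP example directory and submitting its batch script (check the copied directory for the actual script name):

$ cp -r /cm/shared/examples/sdsc/tscc2/openmp ~/openmp
$ cd ~/openmp
$ sbatch <batch_script>.sb   # replace with the script name found in the directory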

Important Note on submitting jobs

You must always specify the allocation name in your job request, whether it is a Condo or Hotel partition, interactive or batch job. Please add --account=<allocation_name> (or -A <allocation_name>) to your srun command, or #SBATCH --account=<allocation_name> (#SBATCH -A <allocation_name>) to your job scripts. You can run the command sacctmgr show assoc user=$USER format=account,user to find your allocation name:

$ sacctmgr show assoc user=$USER format=account,user

Account User
---------- ----------
account_1 user_1

Checking Job Status

You can monitor the status of your job using the squeue command:

$ squeue -u $USER

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

8718049 hotel mycode user PD 0:00 1 (Priority)

  • JOBID: This is the unique identification number assigned to each job when it is submitted. You will use this JOBID when you wish to cancel or check the status of a specific job.
  • PARTITION: This indicates which partition (or queue) the job is submitted to. Different partitions have different resources and policies, so choose one that fits the job's requirements.
  • NAME: This is the name of the job as specified when you submitted it. Naming your jobs meaningfully can help you keep track of them more easily.
  • USER: This field shows the username of the person who submitted the job. When you filter by $USER, it will display only the jobs that you have submitted.
  • ST: This stands for the status of the job. For example, "PD" means 'Pending,' "R" means 'Running,' and "CD" means 'Completed.'
  • TIME: This shows the elapsed time since the job started running. For pending jobs, this will generally show as "0:00".
  • NODES: This indicates the number of nodes allocated or to be allocated for the job.
  • NODELIST(REASON): This provides a list of nodes assigned to the job if it's running or the reason why the job is not currently running if it's in a pending state.

Once the job starts running, the status (ST) will change to R, indicating that the job is running:

$ squeue -u $USER

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

8718049 hotel mycode user R 0:02 1 tscc-14-01

Back to top

Canceling Jobs

To cancel a running or a queued job, use the scancel command:

$ scancel 8718049

Information on the Partitions

$ sinfo

Back to top

Checking Available Allocation

$ sacctmgr show assoc user=$USER format=account,user

Back to top

Managing Your User Account

On TSCC we have set up a client that provides important details regarding project availability and usage. The client script is located at:

/cm/shared/apps/sdsc/1.0/bin/tscc_client.sh

The script requires the `sdsc` module, which is loaded by default. You can simply run:

$ module load sdsc

$ which tscc_client

/cm/shared/apps/sdsc/1.0/bin/tscc_client

to ensure that the client is ready for use.

To start understanding the usage of the client, simply run:

$ tscc_client -h

Activating Modules:
1) slurm/tscc/23.02.7

Usage: /cm/shared/apps/sdsc/1.0/bin/tscc_client [[-A account] [-u user|-a] [-i] [-s YYYY-mm-dd] [-e YYYY-mm-dd]

Notice that you can either run `tscc_client` or `tscc_client.sh`.

To get information about the balance and usage of a specific allocation, you can run:

$ tscc_client -A <account>

This command prints a per-user summary table for the allocation.

The 'Account/User' column shows all the users belonging to the allocation. The 'Usage' column shows how many SUs each user has used so far. The 'Allow' column shows how many SUs each user is allowed to use within the allocation; usually every user is allowed the same amount, but the allowed SUs per user can differ (for example, `user_8` and `user_13` in one sample listing). The 'User' column shows the percentage each user has used relative to their allowed usage (the 'Allow' value); in that sample, `user_13` has used 0.049% of the 18005006 SUs available to them. 'Balance' shows the remaining SUs each user can still use: since `user_13` has used 8923 SUs of the 18005006 SUs initially available, `user_13` still has access to 18005006 SUs - 8923 SUs = 17996083 SUs, which is exactly what the 'Balance' column shows for that user.

Let's consider the case in which user A submits a simple job like the one we saw previously in the Hotel partition example:

Example using Job Charge:
  • 16 cores
  • 32GB of requested memory
  • 1 A100 GPU
  • The user has requested a walltime of 120 minutes (or 2 hours), or 7200 seconds.

From the example above, we know the Job will consume 6,240 SUs only if it uses the whole time of the walltime. When user A submits this job, and while the job is in Pending or Running state, the scheduler will reduce the total amount of 6,240 SUs from the total balance of the allocation, meaning that if the allocation initially had access to 10,000 SUs before user A submitted the job, right after the submission the allocation will only have an allowed balance of 10,000 SUs - 6,240 SUs = 3,760 SUs.

Let's say that user B from the same allocation as user A wants to submit another job. If user B requests more resources than are currently available, in this case more than 3,760 SUs, the job will automatically fail with an error like: `Unable to allocate resources: Invalid qos specification`. That is because at the moment user B's job was submitted, there weren't enough resources available in the allocation.

However, remember that the walltime a user requests when submitting a job doesn't necessarily mean the job will use the whole time to reach completion. It might be that user A's job only ran for 1h out of the 2h requested. In that case, when the job finishes, whether it fails or ends gracefully, the amount of SUs actually used by user A during that 1h is 3,120 SUs. So right after the job is done running, the allocation will have 10,000 SUs - 3,120 SUs = 6,880 SUs available for the rest of the users.

This simple example illustrates the usefulness of the client when users are trying to make the best use of the resources available within a shared allocation, and gives more insight into why some jobs stay pending or fail.

The client also shows information by user:

$ tscc_client -u user1


When using the `-a` flag, the client will list all the users in the allocation you're currently part of. It is also possible to filter by time range like this:

$ tscc_client -a -s 2024-01-01 -e 2024-06-30

That way you can see how much has been used in a certain range of time. Finally, the `-i` flag only shows users with a nonzero usage.

Find account(s) by description substring match

Let's say you want to filter results by partial substrings of account names. You can get useful information by running:

$ tscc_client -d account_1


Important Note:

Do not try to run for loops or embed the client in a script that could result in multiple database invocations, such as:

#!/bin/bash

# Assuming user_ids.txt contains one user ID per line
# And you want to grep "Running Jobs" from the tscc_client command output

while read -r user_id; do
  echo "Checking jobs for user: $user_id"
  tscc_client some_command some_flags --user "$user_id" | grep "Running Jobs"
done < user_ids.txt

Doing this might bog down the system.

Users who try to run these kinds of scripts or commands will have their accounts locked.

Commonly used commands in Slurm:

Below is a table with a handful of useful commands that are often used to check the status of submitted jobs:

Action                              Slurm command
Interactive job submission          srun
Batch job submission                sbatch jobscript
List user jobs and their nodes      squeue -u $USER
Job deletion                        scancel <job-id>
Job status check                    scontrol show job <job-id>
Node availability check             sinfo
Allocation availability check       sacctmgr

  Back to top

TSCC Usage Monitoring with SLURM

TSCC uses SLURM for resource management and job scheduling, so it's important to understand how to monitor usage. Here are the commands to use on TSCC:

  1. Account Balances:

$ sreport user top usage User=<user_name>

Please note that SLURM relies on a combination of sacctmgr and sreport for comprehensive usage details.

  2. Activity Summary:

$ sacct -u <user_name> --starttime=YYYY-MM-DD --endtime=YYYY-MM-DD

For group-level summaries on TSCC:

$ sreport cluster AccountUtilizationByUser start=YYYY-MM-DD end=YYYY-MM-DD format=Account,User,TotalCPU

(Note: Adjust the YYYY-MM-DD placeholders with the desired date range. This command will display account utilization by users within the specified period. Filter the output to view specific group or account details.)

  Back to top

Obtaining Support for TSCC Jobs

For any questions, please send email to tscc-support@ucsd.edu.

Back to top