Using the GPU / high-memory nodes#
This page describes how to schedule tasks on the dedicated GPU nodes and on the combined GPU / high-memory node. The cluster currently includes 1 node with 2x A100 Nvidia GPUs and 4 TB of RAM, and 2 nodes with 2x H100 Nvidia GPUs and 2 TB of RAM.
These nodes are intended for tasks that can make use of GPUs, and for tasks that require more than the 2 TB of RAM available on regular compute nodes.
Running jobs on the GPU / high-memory node#
By default, jobs submitted via Slurm will only run on regular nodes, even if you ask for more than 2 TB of RAM or ask for a GPU. Attempting to run such a task will instead result in a "Requested node configuration is not available" error message.
This is because the GPU / high-memory nodes are located on their own queue, in order to prevent normal use of the cluster from blocking access to these resources. You must therefore use the --partition=gpuqueue option to select the correct queue. This might look as follows in an sbatch script:
#!/bin/bash
#SBATCH --partition=gpuqueue
my-memory-hungry-command
While running on the GPU queue, you can reserve up to 3920 GB of RAM and up to two GPUs (see below) per job. The GPU / high-memory nodes otherwise use the same defaults as the other nodes (~16 GB of RAM per CPU reserved).
For example, to run a job using 2.5 TB of RAM on the GPU / high-memory node:
#!/bin/bash
#SBATCH --partition=gpuqueue
#SBATCH --mem 2560G
my-memory-hungry-command
This script can then be submitted as usual:
$ sbatch my_hi_mem_job.sh
Submitted batch job 217217
See the Basic Slurm jobs and Advanced Slurm jobs pages for information about reserving additional CPUs, more RAM, and for setting other Slurm settings for your jobs.
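As a minimal sketch (the specific values below are chosen purely for illustration), a job that reserves both additional CPUs and more RAM on the GPU / high-memory nodes might combine these options as follows:
#!/bin/bash
#SBATCH --partition=gpuqueue
#SBATCH --cpus-per-task 16
#SBATCH --mem 2560G
my-memory-hungry-command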
We ask that you do not reserve all available CPUs or all RAM on the GPU / high-memory node, unless it is actually required for your analyses, since leaving some unused resources permits other users to utilize the GPUs while your tasks are running.
Reserving GPUs#
Requesting GPUs is done with the --gres option, and also requires the --partition=gpuqueue option to select the correct queue, as described above. This might look as follows in an sbatch script:
#!/bin/bash
#SBATCH --partition=gpuqueue --gres=gpu:1
nvidia-smi -L
The --gres=gpu:1 option in the above asks Slurm to make 1 GPU available to your job. This can be increased to --gres=gpu:2 to reserve both GPUs on the node, but because of the limited number of GPUs we ask that you only reserve 1 GPU per job, which is normally also more efficient.
This script can then be submitted as usual:
$ sbatch my_gpu_job.sh
Submitted batch job 217218
$ cat slurm-217218.out
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-4f2ff8df-0d18-a99b-9fb8-67aa0867f7a3)
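Depending on how Slurm is configured on the cluster, the reserved GPU(s) are typically also exposed to your job via the CUDA_VISIBLE_DEVICES environment variable, which most GPU-aware software respects automatically. As a small sketch (assuming that variable is set by the cluster's Slurm setup), you could verify this from inside a job:
#!/bin/bash
#SBATCH --partition=gpuqueue --gres=gpu:1
# List the GPU(s) that Slurm made available to this job
nvidia-smi -L
# Show which GPU index/indices CUDA-aware software will see (if set by Slurm)
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"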
Requesting specific GPUs#
As indicated above, the GPU nodes include both Nvidia A100 and H100 GPUs. By default, your job will be assigned to the first idle GPU(s), but it is also possible to request a specific GPU type.
To request an A100 GPU, replace the --gres=gpu:1 option with --gres=gpu:a100:1, and to request an H100 GPU, replace it with --gres=gpu:h100:1. For example,
#!/bin/bash
#SBATCH --partition=gpuqueue --gres=gpu:h100:1
nvidia-smi -L
This script can then be submitted as usual:
$ sbatch my_h100_job.sh
Submitted batch job 217219
$ cat slurm-217219.out
GPU 0: NVIDIA H100 NVL (UUID: GPU-c43d0655-2d15-7e66-90b3-9b732a1d13ba)
We recommend looking at current GPU utilization before submitting your job, as any time saved by running on a faster (H100) GPU may be lost to waiting for one to become idle. The slurmboard utility described in the Monitoring the cluster section provides a simple way to see GPU reservations.
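If you prefer the command line over slurmboard, a rough alternative (a sketch using standard Slurm commands; the exact output depends on the cluster's configuration) is to list the GPUs configured on each node and the GPUs reserved by current jobs:
$ sinfo --partition=gpuqueue -o "%N %G"    # GPU types available on each node
$ squeue --partition=gpuqueue -o "%i %u %T %b"    # GPUs reserved per job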
Running an interactive session#
While it is possible to run an interactive session on the GPU / high-memory nodes, we ask that you limit the usage of such sessions as much as possible. If at all possible, prefer using sbatch or non-interactive srun instead. This ensures that the resources remain available to other users when you are not actively using them.
To start an interactive session, simply apply the same --partition option as above, and (optionally) the same --gres option if you need a GPU, as well as any other resource options described in the Reserving resources section:
$ srun --pty --partition=gpuqueue -- /bin/bash
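If you also need a GPU during the interactive session, the same command with a GPU reservation added might look as follows:
$ srun --pty --partition=gpuqueue --gres=gpu:1 -- /bin/bash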
See the Interactive sessions section for information about interactive sessions, including information about running programs with graphical interfaces.
Warning
Interactive sessions left running on the GPU node may be terminated without warning.
Monitoring GPU utilization#
It is highly recommended to monitor GPU utilization when you run jobs on the GPU nodes. To make full use of the hardware, you want to keep GPU utilization at 100%, and to do so you typically want to load as much data into GPU memory as possible. Exactly how you accomplish this depends on the software you are running, but it can often be done by increasing the size of the batches you are processing.
How you can monitor the GPUs depends on whether you have reserved them for an interactive session or for a standard Slurm job:
Monitoring an interactive session#
If you are running a job in an interactive session, then you can monitor
the reserved GPU(s) directly using the nvidia-smi
command:
$ nvidia-smi -l 5
Thu Apr 4 14:30:46 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:27:00.0 Off | 0 |
| N/A 57C P0 307W / 300W | 52357MiB / 81920MiB | 99% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe On | 00000000:A3:00.0 Off | 0 |
| N/A 56C P0 298W / 300W | 58893MiB / 81920MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2807877 C dorado 52344MiB |
| 1 N/A N/A 2807849 C dorado 58880MiB |
+---------------------------------------------------------------------------------------+
This will print resource usage for the GPUs you have reserved for your interactive session (and only for those GPUs), and continue to print it every 5 seconds afterwards via the -l 5 option. Other monitoring tools are available (for example gpustat), but these are outside the scope of this documentation.
Monitoring a Slurm job#
If you have started a standard (non-interactive) job via Slurm, then you will not be able to directly run nvidia-smi, nor will you be able to join the running job using srun -j, due to the way Slurm handles special resources. We have therefore set up log-files on the GPU nodes that contain the output from the nvidia-smi command as shown above.
To watch the content of this log-file, firstly determine the job ID of your job running on the GPU node:
$ squeue --me --partition=gpuqueue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
570316 gpuqueue bash abc123 R 13:55 1 esrumgpun01fl
Then we use srun with the --overlap option to run a command inside this job, which we specify using the --jobid 570316 option. The --gres=none option is required, since otherwise Slurm would try to reserve the GPU(s) your job is already using and eventually time out.
$ srun --overlap --jobid 570316 --gres=none --pty -- watch -n 15 -d cat /scratch/gpus/nvidia-smi.txt
Warning
Remember to replace the 570316
with the ID of your job!
This prints the contents of the log-file every 15 seconds (which is how
often the files are updated) and optionally highlights the changes since
the last nvidia-smi
run. To disable the highlighting, simply remove
the -d
option from the command.
This command does not take up additional resources on the GPU node and will automatically exit when your job finishes. See the Monitoring processes in jobs section for more information.
Troubleshooting#
Error: Requested node configuration is not available#
See the Slurm Basics Troubleshooting section.