Advanced Slurm jobs#

The following page describes how to use the srun command to run simple commands on the cluster and how to queue batches of jobs using sbatch, including how to manage and monitor these jobs and how to map array task IDs to your input data.

Controlling environment variables#

By default, your sbatch script and srun command will inherit environment variables set before you run sbatch/srun. This means that, among other things, modules loaded prior to running your job will remain loaded once your job starts running.

However, if you are running tools that are affected by environment variables, this may result in your script behaving differently depending on the context in which you run it, making it hard to reproduce your analyses. To mitigate this, it is possible to control which environment variables are inherited via the --export option for sbatch/srun.

The recommended usage is --export=TMPDIR, which only inherits the TMPDIR variable from your terminal. The motivation for doing so is that we've configured it to point to the /scratch folder, ensuring that there is plenty of capacity for whatever temporary files your programs might generate:

#!/bin/bash
#SBATCH --export=TMPDIR

module purge
module load samtools/1.20
samtools --version

Note that it is still recommended to run module purge, despite using --export, as sbatch will execute your bash startup scripts and those may still load modules automatically.
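
If you need more than TMPDIR, the --export option also accepts a comma-separated list of variable names and explicit NAME=value assignments. A minimal sketch (MY_SETTING is purely an illustrative name):

#!/bin/bash
#SBATCH --export=TMPDIR,MY_SETTING=1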

Running commands using srun#

The srun command can be used to queue and execute simple commands on the nodes, and for the most part it should feel no different from running a command without Slurm. Simply prefix your command with srun and the queuing system takes care of running it on the first available node:

$ srun gzip chr20.fasta

Except for the srun prefix, this is exactly as if you ran the gzip command on the head node. However, if you need to pipe output to a file or to another command, then you must wrap your commands in a bash (or similar) script:

$ srun bash my_script.sh
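
For example, my_script.sh might contain a redirection that cannot be expressed directly on the srun command line. A minimal sketch:

#!/bin/bash
# Compress to a new file while keeping the original, using shell redirection
gzip --stdout chr20.fasta > chr20.fasta.gz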

But at that point you might as well use sbatch with the --wait option, if you simply want to wait for your script to finish.
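
For example, the following queues the same script via sbatch and blocks until it has finished:

$ sbatch --wait my_script.sh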

Cancelling srun#

To cancel a job running with srun, simply press Ctrl + c twice within 1 second:

$ srun gzip chr20.fasta
<ctrl+c> srun: interrupt (one more within 1 sec to abort)
srun: StepId=8717.0 task 0: running
<ctrl+c> srun: sending Ctrl-C to StepId=8717.0
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

See also the Cancelling jobs section on the Basic Slurm jobs page.

Monitoring your jobs#

Slurm offers a number of ways in which you may monitor your jobs:

  • The squeue command allows you to list jobs that have not yet finished (or failed). The recommended use is either squeue --me to show all your jobs or squeue --job ${JOB_ID}, where ${JOB_ID} is the ID of the job whose status you want to inspect.

  • The sacct command allows you to list jobs that have finished running (or failed); an example is shown below this list.

  • The --wait option makes sbatch wait until your job has completed before returning (similar to how srun works). This is for example useful if you want to queue and wait for jobs in a script.

  • In addition to actively monitoring your jobs, it is possible to receive email notifications when your jobs start, finish, fail, are requeued, or some combination of these. This is accomplished by using the --mail-user and --mail-type options:

    $ sbatch --mail-user=abc123@ku.dk --mail-type=END,FAIL my_script.sh
    Submitted batch job 8503
    

    When run like this, Slurm will send a notification to abc123@ku.dk when the job completes or fails. The possible values for --mail-type are NONE (the default), BEGIN, END, FAIL, REQUEUE, ALL, or a combination as shown above.
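
    The same options can also be set in the script itself as #SBATCH directives, so they do not have to be repeated on the command line. A minimal sketch reusing the address from above:

    #!/bin/bash
    #SBATCH --mail-user=abc123@ku.dk
    #SBATCH --mail-type=END,FAIL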

Warning

Remember to use your UCPH account email address as the recipient!
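
As an example of inspecting a finished job with sacct, as mentioned in the list above, the following prints a few commonly useful fields; the --format list is only a starting point (see man sacct for all available fields):

$ sacct --jobs 8503 --format=JobID,JobName,State,Elapsed,MaxRSS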

Monitoring processes in jobs#

In addition to monitoring jobs at a high level, it is possible to actively monitor the processes running in your jobs via (interactive) shells running on the same node as the job you wish to monitor. This is particularly useful to make sure that tasks make efficient use of the allocated resources.

In these examples we will use the htop command to monitor our jobs, but you can use plain top, a bash shell, or any other command you prefer. However, see the warning below regarding GPU resources.

The first option for directly monitoring jobs is to request an interactive job on the same server, using the --nodelist option to specify the exact node on which the job you wish to monitor is running:

$ squeue --me
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 8503 standardq my_scrip   abc123  R       0:02      1 esrumcmpn03fl
$ srun --pty --nodelist esrumcmpn03fl htop

This requests an interactive shell on the node on which our job is running (esrumcmpn03fl) and starts the htop tool. This method requires that there are free resources on the node, but has the advantage that it does not impact your job.

Alternatively, you can overlap with the resources already allocated to the job you wish to monitor, which means that you can perform your monitoring even if the node is completely booked. This is done using the --overlap and --jobid command-line options:

$ squeue --me
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 8503 standardq my_scrip   abc123  R       0:02      1 esrumcmpn03fl
$ srun --pty --overlap --jobid 8503 htop

The --jobid option takes as its argument the ID of the job we wish to monitor, which we can obtain using for example the squeue --me command (from the JOBID column).

Warning

It is not possible to use --overlap when you have reserved GPUs using the --gres option. This also means that you cannot monitor GPU resource usage in this manner, as other jobs on the same node cannot access already reserved GPUs. See the Monitoring GPU utilization section for instructions on how to monitor GPU utilization.

Monitoring the cluster#

The slurmboard utility makes it easy to monitor activity on the cluster, for example to decide how many resources you can reasonably use for a job (see Best practice for reserving resources).


Briefly, this utility displays every node in the cluster, its status, and the resources available on each node. The resource columns (CPUs, Memory, and GPUs) are colored as follows: yellow indicates resources that have been reserved; green indicates resources that are actively being used; purple indicates resources that may be inaccessible due to other resources being reserved (e.g. RAM being inaccessible because all CPUs have been reserved, or vice versa); and black indicates resources that are unavailable due to nodes being offline or under maintenance.

Note

The Data Analytics Platform uses this utility to monitor how busy the cluster is and how jobs are performing. In particular, we may reach out to you if we notice that your jobs consistently use significantly fewer resources than the amount reserved, in order to optimize resource utilization on the cluster.

The slurmboard utility is available in the cbmr_shared project, and can be loaded as follows:

$ . /projects/cbmr_shared/apps/modules/activate.sh
$ module load slurmboard
$ slurmboard

Running multiple tasks using arrays#

As suggested by the name, the sbatch command is able to run jobs in batches. This is accomplished using "job arrays", which allow you to automatically queue and run the same command on multiple inputs.

For example, we could expand on the example above to gzip multiple chromosomes using a job array. To do so, we first need to update the script to make use of the SLURM_ARRAY_TASK_ID variable, which specifies the numerical ID of a task:

#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --time=60
#SBATCH --array=1-5%3

module load igzip/2.30.0
igzip --threads ${SLURM_CPUS_PER_TASK} "chr${SLURM_ARRAY_TASK_ID}.fasta"

The --array=1-5%3 option specifies that we want to run 5 tasks, numbered 1 to 5, each of which is assigned 8 CPUs and each of which is given 60 minutes to run. The %3 furthermore tells Slurm that at most 3 tasks can be run simultaneously (see below).

The above simply uses a contiguous range of task IDs, but it is also possible to specify a combination of individual values (--array=1,2,3), ranges (--array=1-10,20-30), and more. See the sbatch manual page for a description of ways in which to specify lists or ranges of task IDs.
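
For example, a header combining two ranges with a limit on the number of simultaneous tasks might look as follows (the values are purely illustrative):

#!/bin/bash
#SBATCH --array=1-10,20-30%4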

Note

Values used with --array must be in the range 0 to 1000.

Our script can then be run as before:

$ ls
chr1.fasta chr2.fasta chr3.fasta chr4.fasta chr5.fasta my_script.sh
$ sbatch my_script.sh
Submitted batch job 8504
$ squeue --me
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
8504_1 standardq my_scrip   abc123  R       0:02      1 esrumcmpn01fl
8504_2 standardq my_scrip   abc123  R       0:02      1 esrumcmpn01fl
8504_3 standardq my_scrip   abc123  R       0:02      1 esrumcmpn01fl
8504_4 standardq my_scrip   abc123  R       0:02      1 esrumcmpn01fl
8504_5 standardq my_scrip   abc123  R       0:02      1 esrumcmpn01fl
$ ls
chr1.fasta.gz  chr4.fasta.gz  slurm-8504_1.out  slurm-8504_4.out
chr2.fasta.gz  chr5.fasta.gz  slurm-8504_2.out  slurm-8504_5.out
chr3.fasta.gz  my_script.sh   slurm-8504_3.out

Unlike a normal sbatch command, where Slurm creates a single .out file, an sbatch --array command will create an .out file for each task in the array.
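
If you want to control where these log files are written, the --output option accepts filename patterns in which %A expands to the array job ID and %a to the task ID. A minimal sketch (the logs/ directory is just an example and must exist before the job starts):

#!/bin/bash
#SBATCH --array=1-5
#SBATCH --output=logs/%A_%a.out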

In this example there was a simple one-to-one mapping between the SLURM_ARRAY_TASK_ID and our data, but that is not always the case. The Mapping task IDs to data section below describes several ways you might use to map the SLURM_ARRAY_TASK_ID variable to more complex data/filenames.

Limiting simultaneous jobs#

By default Slurm will attempt to run every job in an array at the same time, provided that there are resources available. Since Esrum is a shared resource we ask that you consider how much of the cluster you'll be using and limit the number of simultaneous jobs to a reasonable number.

Limiting the number of simultaneous jobs is done by appending a % and a number at the end of the --array value as shown above. For example, in the following script we queue a job array containing 100 jobs, each requesting 8 CPUs. However, the %16 appended to the --array ensures that at most 16 of these jobs are running at the same time:

#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --array=1-100%16

This ensures that we use no more than 1 compute node's worth of CPUs (128 CPUs per node) and thereby leave plenty of capacity available for other users.

In addition to limiting the number of simultaneously running jobs, you can also give your jobs a lower priority using the --nice option:

#SBATCH --nice

This ensures that other users' jobs, if any, will be run before the jobs in your array, thereby preventing your job array from always using the maximum amount of resources possible. Combined with a reasonable % limit, this allows you to run more jobs simultaneously than with a % limit alone, without negatively impacting other users.
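
Put together, a job-array header using both techniques might look as follows (the limits are examples only; adjust them to the size of your jobs):

#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --array=1-100%32
#SBATCH --nice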

Please reach out if you are in doubt about how many jobs you can run at the same time.

Managing job arrays#

Job arrays can either be cancelled as a whole or in part. To cancel the entire job (all tasks in the array), simply use the primary job ID without a task suffix:

$ scancel 8504

To cancel part of a batch job/array, instead specify the ID of the individual task after the primary job ID, separated by an underscore (_), exactly as the tasks are listed in the squeue output:

$ scancel 8504_1
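
scancel also accepts a bracketed range of task IDs, which is convenient for cancelling several tasks at once; a sketch (check man scancel on the cluster if in doubt):

$ scancel 8504_[2-4]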

Warning

While it is possible to use sbatch with tasks of any size, remember that Slurm imposes some overhead on every job. If your individual tasks are very short, it is therefore preferable to group them into batches rather than running each task as a separate job.

Mapping task IDs to data#

Using sbatch arrays requires that you map a number (the array task ID) to a filename or similar. The above example assumed that filenames were numbered, but that is not always the case.

The following describes a few ways in which you can map the array task ID to filenames in a bash script; a variant that verifies the resulting filename is shown after the list.

  1. Using numbered filenames:

    The example showed how to handle filenames where the numbers were simply written as 1, 2, etc.:

    # Simple numbering: sample1.vcf, sample2.vcf, etc.
    FILENAME="sample${SLURM_ARRAY_TASK_ID}.vcf"
    

    However, it is also possible to format numbers in a more complicated manner (e.g. 001, 002, etc.), using for example the printf command:

    # Formatted numbering: sample001.vcf, sample002.vcf, etc.
    FILENAME=$(printf "sample%03i.vcf" ${SLURM_ARRAY_TASK_ID})
    

    See above for an example script and the expected output.

  2. Using a table of filenames:

    Given a text file my_samples.txt containing one filename per line:

    /path/to/first_sample.vcf
    /path/to/second_sample.vcf
    /path/to/third_sample.vcf
    
    # Prints the Nth line
    FILENAME=$(sed "${SLURM_ARRAY_TASK_ID}q;d" my_samples.txt)
    

    An sbatch script could look as follows:

    #!/bin/bash
    #SBATCH --array=1-3
    
    FILENAME=$(sed "${SLURM_ARRAY_TASK_ID}q;d" my_samples.txt)
    
    module load htslib/1.18
    bgzip "${FILENAME}"
    
  3. Using a table of numbered samples (my_samples.tsv):

    ID    Name      Path
    1     first     /path/to/first_sample.vcf
    2     second    /path/to/second_sample.vcf
    3     third     /path/to/third_sample.vcf

    # Find the row where the first column matches SLURM_ARRAY_TASK_ID and print the third column
    FILENAME=$(awk -v ID=${SLURM_ARRAY_TASK_ID} '$1 == ID {print $3; exit}' my_samples.tsv)
    

    By default awk will split columns by any whitespace, but if you have a tab-separated (.tsv) file it is worthwhile to specify this using the FS (field separator) variable:

    # Find the row where the first column matches SLURM_ARRAY_TASK_ID and print the third column
    FILENAME=$(awk -v FS="\t" -v ID=${SLURM_ARRAY_TASK_ID} '$1 == ID {print $3; exit}' my_samples.tsv)
    

    This ensures that awk returns the correct cell even if other cells contain whitespace.

    An sbatch script could look as follows:

    #!/bin/bash
    #SBATCH --array=1-3
    
    # Grab second column where the first column equals SLURM_ARRAY_TASK_ID
    NAME=$(awk -v FS="\t" -v ID=${SLURM_ARRAY_TASK_ID} '$1 == ID {print $2; exit}' my_samples.tsv)
    # Grab third column where the first column equals SLURM_ARRAY_TASK_ID
    FILENAME=$(awk -v FS="\t" -v ID=${SLURM_ARRAY_TASK_ID} '$1 == ID {print $3; exit}' my_samples.tsv)
    
    module load htslib/1.18
    echo "Now processing sample '${NAME}'"
    bgzip "${FILENAME}"
    
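Whichever approach you use, it can be worth checking that the task ID actually mapped to an existing file, so that a bad mapping fails loudly rather than producing confusing errors later. A minimal sketch based on the my_samples.txt example above:

#!/bin/bash
#SBATCH --array=1-3

FILENAME=$(sed "${SLURM_ARRAY_TASK_ID}q;d" my_samples.txt)

# Abort with a clear error if the task ID did not map to an existing file
if [ ! -e "${FILENAME}" ]; then
    echo "ERROR: no input file found for task ${SLURM_ARRAY_TASK_ID}: '${FILENAME}'" >&2
    exit 1
fi

module load htslib/1.18
bgzip "${FILENAME}"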

Additional resources#