Jupyter Notebooks¶
Jupyter Notebooks are available via the module system on Esrum and can be run on regular compute nodes or on the GPU/high-memory node, depending on the kind of analyses you wish to run and the size of your workload.
By default, Jupyter only includes support for Python notebooks, but instructions are included below for how to add support for R.
Note
We are currently working on making Jupyter available along with the Troubleshooting. We will announce when this service is ready.
Starting a Jupyter notebook¶
To start a notebook on a node, run the following commands:
$ module load jupyter-notebook
$ srun --pty -- jupyter notebook --no-browser --ip=0.0.0.0 --port=XXXXX
The number used in the argument --port=XXXXX must be a value in the range 49152 to 65535, and must not be a number used by another user on Esrum. Replace XXXXX with a number of your own choosing from this range.
This will allocate a single CPU and ~16 GB of RAM to your notebook. If you need additional resources for your notebook, then please see the Reserving resources section for instructions on how to reserve additional CPUs and RAM, and the GPU / high-memory jobs page for instructions on how to reserve GPUs or large amounts of memory. The srun command accepts the same options as sbatch.
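The port requirement described above can be sketched in Python as follows. This is only an illustration: the helper names is_port_free and suggest_port are hypothetical, and the check only covers the machine on which the code is run, so it cannot guarantee that the port is unused on the compute node where the notebook ends up running.

```python
import random
import socket

# Range of port numbers stated above for Jupyter on Esrum
PORT_MIN, PORT_MAX = 49152, 65535


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can be bound on `host` on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def suggest_port(attempts: int = 100) -> int:
    """Pick a random port in the allowed range that is free locally."""
    for _ in range(attempts):
        port = random.randint(PORT_MIN, PORT_MAX)
        if is_port_free(port):
            return port
    raise RuntimeError("no free port found; try again")


print(suggest_port())
```

If Jupyter reports that the chosen port is already in use, simply pick another number and start the notebook again.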
Tip
It is recommended that you execute the srun command in a tmux or screen session, to avoid the notebook shutting down if you lose connection to the head node. See Persistent sessions with tmux for more information.
Connecting to the Jupyter Notebook¶
To connect to the notebook server, you will first need to set up a connection from your PC to the compute node on which your notebook is running. This is called "port forwarding" and is described on the Port forwarding page.
However, to do so you must first determine on which compute node your job is running. This can be done in a couple of ways:
Look for the URLs printed by Jupyter when you started it on Esrum:
To access the notebook, open this file in a browser:
    file:///home/abc123/.local/share/jupyter/runtime/nbserver-2082873-open.html
Or copy and paste one of these URLs:
    http://esrumcmpn07fl.unicph.domain:XXXXX/?token=0123456789abcdefghijklmnopqrstuvwxyz
 or http://127.0.0.1:XXXXX/?token=0123456789abcdefghijklmnopqrstuvwxyz
In this example, our notebook is running on the esrumcmpn07fl node. All Esrum node names end with .unicph.domain, but we do not need to include this part of the name.
Alternatively, run the following command on the head node in a separate terminal:
$ squeue --me --name jupyter
JOBID  PARTITION NAME    USER   ST TIME NODES NODELIST(REASON)
551600 standardq jupyter abc123 R  8:49 1     esrumcmpn07fl
By looking in the NODELIST column, we can see that the notebook is running on esrumcmpn07fl, as above.
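If you prefer to extract the node name programmatically, the squeue output shown above can be parsed with a few lines of Python. This is a minimal sketch: the function name running_nodes is hypothetical, and it assumes the default column layout shown above and job names without spaces.

```python
def running_nodes(squeue_output: str) -> list[str]:
    """Return the node of every running job listed in `squeue` output."""
    lines = [line.split() for line in squeue_output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    state = header.index("ST")
    nodelist = header.index("NODELIST(REASON)")
    # Only running jobs ("R") have a node; pending jobs show a reason instead
    return [row[nodelist] for row in rows if row[state] == "R"]


example = """
JOBID  PARTITION NAME    USER   ST TIME NODES NODELIST(REASON)
551600 standardq jupyter abc123 R  8:49 1     esrumcmpn07fl
"""

print(running_nodes(example))
# ['esrumcmpn07fl']
```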
Once you've determined what node your notebook is running on, go to the Port forwarding page and set up port forwarding to that node and the port you used when starting Jupyter (e.g. XXXXX).
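Both values needed for port forwarding, the node name and the port, can also be read directly from the first URL that Jupyter prints. The following sketch uses Python's standard urllib.parse module; the function name node_and_port is hypothetical and the port 50123 merely stands in for the number you chose.

```python
from urllib.parse import urlsplit


def node_and_port(url: str) -> tuple[str, int]:
    """Extract the short node name and the port from a Jupyter URL."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no hostname and port in URL: {url!r}")
    # Drop the .unicph.domain suffix; only the short name is needed
    node = parts.hostname.split(".", 1)[0]
    return node, parts.port


url = "http://esrumcmpn07fl.unicph.domain:50123/?token=0123456789abcdefghijklmnopqrstuvwxyz"
print(node_and_port(url))
# ('esrumcmpn07fl', 50123)
```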
Adding an R kernel to Jupyter¶
The Jupyter module only comes with a Python kernel. If you instead wish to use R in your Jupyter notebook, you can add an R Kernel for the specific version of R that you wish to use.
To do so, run the following commands, replacing R/4.3.3 with the version of R that you wish to use:
$ module load jupyter-notebook/6.5.4
$ module load --auto R/4.3.3
$ R
> install.packages('IRkernel')
> name <- paste("ir", gsub("\\.", "", getRversion()), sep="")
> displayname <- paste("R", getRversion())
> IRkernel::installspec(name=name, displayname=displayname)
> quit(save="no")
This will make an R kernel with the name R 4.3.3 available in Jupyter. You can repeat these commands for each version of R that you wish to make available as a kernel. Run the command module purge between each, to ensure that you have loaded only the expected version of R and the version of gcc that R depends on.
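For reference, the two R expressions above derive the internal kernel name and the display name from the R version number. The same naming rule is sketched in Python below purely as an illustration; the function kernel_names is hypothetical and is not part of Jupyter or IRkernel.

```python
def kernel_names(r_version: str) -> tuple[str, str]:
    """Mirror the naming used in the R commands above."""
    # paste("ir", gsub("\\.", "", getRversion()), sep="")
    name = "ir" + r_version.replace(".", "")
    # paste("R", getRversion())
    displayname = f"R {r_version}"
    return name, displayname


print(kernel_names("4.3.3"))
# ('ir433', 'R 4.3.3')
```

Because the version number is part of both names, kernels for different versions of R do not overwrite each other.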
Once you are done adding R versions, you can start the notebook as shown above:
$ module load jupyter-notebook/6.5.4
$ srun --pty -- jupyter notebook --no-browser --port=XXXXX
If you only wish to run R code, you do not need to load the R module first. However, you must load it if you wish to install R libraries via the notebook:
$ module load jupyter-notebook/6.5.4
$ module load --auto R/4.3.3
$ srun --pty -- jupyter notebook --no-browser --port=XXXXX
Running Slurm jobs from Jupyter notebooks¶
We provide a Python module (jupyter_slurm) for submitting Slurm jobs from Jupyter notebooks. This allows you to perform computationally expensive analyses in a notebook, potentially across multiple nodes, without having to reserve the resources required for this for the duration of your notebook.
Installing jupyter_slurm¶
To use this module, you need to either install it together with Jupyter, or you can "inject" it into your notebook. The former option is recommended, and also allows you to install other libraries that you need.
Option A: Installing the module with Jupyter¶
Deactivate any currently active conda and python environments
# to deactivate Conda environments:
conda deactivate
# to deactivate Python environments:
deactivate
Load the python version you wish to use
module load python/3.11.3
Create a virtual environment for jupyter / jupyter_slurm. The name jupyter_slurm may be replaced by any name that you prefer
python3 -m venv jupyter_slurm
Install jupyter in the environment. You can install either the latest version or, if you prefer, a specific version of jupyter notebook:
./jupyter_slurm/bin/pip install notebook # the latest version, or
./jupyter_slurm/bin/pip install notebook==7.4.5 # a specific version
Install any other python modules you need in the same manner.
Install jupyter_slurm in the environment
# install the latest version of the module
./jupyter_slurm/bin/pip install /projects/cbmr_shared/apps/dap/jupyter_slurm/latest
# or, alternatively, a specific version
# ./jupyter_slurm/bin/pip install /projects/cbmr_shared/apps/dap/jupyter_slurm/0.0.1
To start the notebook, run the following command, replacing XYZ with the port number you are using (see above for more information):
srun --pty -- ./jupyter_slurm/bin/jupyter notebook --no-browser --ip=0.0.0.0 --port=XYZ
You can now import and use the jupyter_slurm module as described below.
Option B: "Injecting" the module into your notebook¶
This method is not recommended, but allows you to make use of jupyter_slurm if you are using the jupyter environment module on Esrum, or another version of Jupyter where you cannot install your own python modules.
Instead of installing the module, we add it to Python's sys.path list as shown below. This list defines where Python looks when importing modules and this code therefore has to be run before attempting to import the module:
import sys
# to load the latest version
sys.path.append("/projects/cbmr_shared/apps/dap/jupyter_slurm/latest/src")
# or, alternatively, to load a specific version
# sys.path.append("/projects/cbmr_shared/apps/dap/jupyter_slurm/0.0.1/src")
You can now import / use jupyter_slurm as described below.
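Since notebook cells are often run more than once, it can be convenient to wrap the sys.path step above in a small helper that adds the folder only once and reports when the folder is missing. This is a minimal sketch; the function name add_to_sys_path is hypothetical and not part of jupyter_slurm.

```python
import sys
from pathlib import Path


def add_to_sys_path(path: str) -> bool:
    """Append `path` to sys.path once; return False if it is not a folder."""
    if not Path(path).is_dir():
        return False
    if path not in sys.path:
        sys.path.append(path)
    return True


if not add_to_sys_path("/projects/cbmr_shared/apps/dap/jupyter_slurm/latest/src"):
    print("jupyter_slurm folder not found; is this notebook running on Esrum?")
```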
Running Slurm jobs using jupyter_slurm¶
The jupyter_slurm module provides wrapper functions for sbatch and for srun. For example, to queue a job using sbatch, use the function with the same name:
import jupyter_slurm as jp
jobid = jp.sbatch(
    [
        ["samtools", "markdup", "my data.sam", "--output", "my data.markdup.bam"],
        ["samtools", "index", "my data.markdup.bam"],
    ],
    modules=["samtools"],
)
print("Started job with ID", jobid)
This generates an sbatch script in which the samtools module is loaded, and then runs the two samtools commands. The Job ID for this job is returned by the function. See below for how to pass shell commands to the script.
Note
You can see the script that the sbatch function generates by calling the sbatch_script function instead. The two functions take the same arguments, but sbatch_script returns a list of lines in the resulting script.
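Similarly, to run a single command using srun and capture its output, use the srun function:
import jupyter_slurm as jp
result = jp.srun(
    ["samtools", "idxstats", "my data.markdup.bam"],
    modules=["samtools"],
    capture=True,
)
# a non-zero return code indicates that the command failed
print("Command", ("failed" if result.returncode else "completed"), "with return code", result.returncode)
print(" STDOUT =", result.stdout)
print(" STDERR =", result.stderr)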
Writing shell commands for the sbatch and srun functions¶
In the above examples, shell commands have been specified as lists of strings:
[
    ["samtools", "markdup", "my data.sam", "--output", "my data.markdup.bam"],
    ["samtools", "index", "my data.markdup.bam"],
]
This has the advantage that jupyter_slurm can automatically escape special characters such as spaces for you. This ensures that your commands work regardless of what your filenames look like.
Alternatively, you can pass shell commands as strings, but in that case you must manually quote/escape special characters:
[
    "samtools markdup 'my data.sam' --output 'my data.markdup.bam'",
    "samtools index 'my data.markdup.bam'",
]
This is equivalent to what gets generated automatically when passing arguments as lists of strings.
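The relationship between the two forms can be illustrated with Python's standard shlex module. Note that this is only a sketch of the idea: the source does not state how jupyter_slurm performs the quoting internally, so the use of shlex here is an assumption made for illustration.

```python
import shlex

command = ["samtools", "markdup", "my data.sam", "--output", "my data.markdup.bam"]

# shlex.join quotes every argument that contains special characters
quoted = shlex.join(command)
print(quoted)
# samtools markdup 'my data.sam' --output 'my data.markdup.bam'

# shlex.split reverses the operation and recovers the original list
assert shlex.split(quoted) == command
```

Arguments without special characters, such as --output, are left unquoted, which is why the result matches the hand-written string above.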
Function reference¶
sbatch function¶
def sbatch(commands: Sequence[str] | Sequence[Sequence[str]],
           *,
           cpus: int = 1,
           gpus: int = 0,
           gpu_type: Literal["a100", "h100", "A100", "H100"] | None = None,
           memory: int | str | None = None,
           job_name: str | None = None,
           modules: SequenceNotStr[str] = (),
           extra_args: SequenceNotStr[str] = (),
           output_file: str | Path | None = None,
           array_params: str | None = None,
           wait: bool = False,
           mail_user: str | bool = False,
           strict: bool = True) -> int
Submit an sbatch script for running one or more commands.
Arguments:
commands - One or more commands to be run using sbatch. May be a list of strings, in which case the strings are assumed to be properly formatted commands and included as is, or a list of lists of strings, in which case each list of strings is assumed to represent a single command, and each argument is quoted/escaped to ensure that special characters are properly handled.
cpus - The number of CPUs to reserve. Must be a number in the range 1 to 128. Defaults to 1.
memory - The amount of memory to reserve. Must be a positive number (in MB) or a string ending with a unit (K, M, G, T). Defaults to ~16G per CPU.
gpus - The number of GPUs to reserve, either 0, 1, or 2. Jobs that reserve GPUs will be run on the GPU queue. Defaults to 0.
gpu_type - Preferred GPU type, if any, either 'a100' or 'h100'. Defaults to None.
job_name - An optional string naming the current Slurm job.
modules - A list of zero or more environment modules to load before running the commands specified above. Defaults to ().
extra_args - A list of arguments passed directly to srun/sbatch. Multi-part arguments must therefore be split into multiple values: ["--foo", "bar"] and not ["--foo bar"].
output_file - Optional name of the log-file from the job.
array_params - Optional job-array parameters (see "--array").
mail_user - Send an email to the user on failures or completion of the job. May either be an email address, or True to send an email to $USER@ku.dk.
wait - If true, wait for the job to complete before returning. Defaults to False.
strict - If true, the script is configured to terminate on the first error. Defaults to True.
Returns:
int - The JobID of the submitted job.
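The requirement that multi-part values in extra_args be split into separate strings can be met without splitting by hand, for instance with Python's standard shlex module. The sketch below is illustrative only: the options shown are ordinary sbatch options chosen as an example, and the variable names are hypothetical.

```python
import shlex

# Example options, written the way they would be typed in a shell
options = "--time 02:00:00 --comment 'alignment run'"

# shlex.split follows shell quoting rules, so the quoted comment
# remains a single value while everything else is split on spaces
extra_args = shlex.split(options)
print(extra_args)
# ['--time', '02:00:00', '--comment', 'alignment run']
```

The resulting list can then be passed as the extra_args argument described above.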
srun function¶
def srun(
    command: Sequence[str],
    *,
    cpus: int = 1,
    gpus: int = 0,
    memory: int | str | None = None,
    modules: SequenceNotStr[str] = (),
    extra_args: SequenceNotStr[str] = (),
    capture: bool = False,
    text: bool = True,
    strict: bool = True
) -> SrunResult[None] | SrunResult[str] | SrunResult[bytes]
Run command via srun, and optionally capture its output.
Warning
This function can only be used from esrumhead01fl!
Arguments:
command - The command to run, either as a single string that is assumed to contain a properly formatted shell command, or as a list of strings that is assumed to represent each argument in the command.
cpus - The number of CPUs to reserve. Must be a number in the range 1 to 128. Defaults to 1.
memory - The amount of memory to reserve. Must be a positive number (in MB) or a string ending with a unit (K, M, G, T). Defaults to ~16G per CPU.
gpus - The number of GPUs to reserve, either 0, 1, or 2. Jobs that reserve GPUs will be run on the GPU queue. Defaults to 0.
gpu_type - Preferred GPU type, if any, either 'a100' or 'h100'. Defaults to None.
extra_args - A list of arguments passed directly to srun/sbatch. Multi-part arguments must therefore be split into multiple values: ["--foo", "bar"] and not ["--foo bar"].
modules - A list of zero or more environment modules to load before running the commands specified above. Defaults to ().
capture - If true, srun's stdout and stderr are captured and returned. Defaults to False.
text - If true, output captured by capture is assumed to be UTF8 and decoded to strings. Otherwise bytes are returned. Defaults to True.
strict - If true, the script is configured to terminate on the first error. Defaults to True.
Raises:
SlurmError - Raised if this command is invoked on a compute node.
Returns:
int - The exit-code from running srun (non-zero on error).
int, str, str - The srun exit-code, stdout, and stderr, if capture is True.
int, bytes, bytes - As above, but text is False.
Troubleshooting¶
Jupyter Notebooks: Browser error when opening URL¶
Depending on your browser you may receive one of the following errors. The typical causes are listed, but the exact error message will depend on your browser. It is therefore helpful to review all possible causes listed here.
When using Chrome, the cause is typically listed below the line that says "This site can't be reached".
The connection was reset
This typically indicates that Jupyter Notebook isn't running on the server, or that it is running on a different port than the one you've forwarded. Check that Jupyter Notebook is running and make sure that your forwarded ports match those used by Jupyter Notebook on Esrum.
Localhost refused to connect or Unable to connect
This typically indicates that port forwarding isn't active, or that you have entered the wrong port number in your browser. Therefore,
Verify that port forwarding is active: On OSX/Linux that means verifying that an ssh command is running as described in the Port forwarding for OSX/Linux users section, and on Windows that means activating port forwarding in MobaXterm as described in the Port forwarding for Windows users section.
If using the instructions for Linux/OSX, verify that you ran the ssh command on your laptop or desktop, and not on the Esrum head node.
Verify that either of these is using the same port number as in the jupyter command you ran or as in the http://127.0.0.1 URL printed by Jupyter.
Verify that you are using the second URL that Jupyter prints on the terminal, namely the URL starting with http://127.0.0.1:XXXXX:
To access the notebook, open this file in a browser:
    file:///home/abc123/.local/share/jupyter/runtime/nbserver-2082873-open.html
Or copy and paste one of these URLs:
    http://esrumcmpn07fl.unicph.domain:XXXXX/?token=0123456789abcdefghijklmnopqrstuvwxyz
 or http://127.0.0.1:XXXXX/?token=0123456789abcdefghijklmnopqrstuvwxyz
For security reasons it is not possible to connect directly to the compute nodes.
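To tell a port forwarding problem apart from a browser problem, you can test from your own PC whether anything is listening on the forwarded port. The following is a minimal sketch using Python's standard socket module; the function name can_connect is hypothetical and the port 50123 stands in for the port number you chose.

```python
import socket


def can_connect(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections as well as timeouts
        return False


print("Forwarded port is reachable:", can_connect(50123))
```

A negative result suggests that port forwarding is not active on your PC or that a different port number was used. A positive result only shows that the local end of the forwarding is open, so the notebook itself may still be unreachable, as described under "The connection was reset" above.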