Troubleshooting#
This page gathers all troubleshooting steps from the other parts of this documentation for easy access. Remember that you are always welcome to contact us if you have questions or problems relating to the cluster.
Connecting to Esrum#
If you have not already been granted access to the server, then please see the Applying for access page before continuing!
Timeout while connecting to the cluster#
You may experience timeout errors when you attempt to connect to the server:
First, verify that you are correctly connected to the UCPH network. To connect to Esrum you must either use a wired connection in a CBMR office or be connected through the UCPH VPN. See the official VPN documentation in Danish or English for more information.
If you are still unable to connect to Esrum after verifying that you are correctly connected to the UCPH network, then please try to visit either our Project Manager or our Cohort Catalog.
If you are able to visit either the Project Manager or the Cohort Catalog, then you most likely do not have the permissions required to connect to Esrum. Please contact us and we will provide further guidance.
If you are unable to connect to the VPN or to either of the above pages while connected to the VPN, then there may be other problems with your account. We recommend that you either contact us for assistance or, if you prefer, that you submit a ticket to the UCPH-IT Serviceportal, using the Research Applications Counseling and Support / Forskningsapplikationer Rådgivning og support ticket category.
File uploads using MobaXterm never start#
Please make sure that your session is configured to use the SCP (enhanced speed) browser type. See step 4 in the Configuring MobaXterm section.
UCPH network-folders in ~/ucph are not available when using MobaXterm#
Please make sure that you have disabled use of GSSAPI Kerberos as described in the Configuring MobaXterm section.
Slurm basics#
Error: Requested node configuration is not available#
If you request too many CPUs (more than 128), or too much RAM (more than 1993 GB for compute nodes and more than 3920 GB for the GPU node), then Slurm will report that the request cannot be satisfied:
# More than 128 CPUs requested
$ sbatch --cpus-per-task 200 my_script.sh
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
# More than 1993 GB RAM requested on compute node
$ sbatch --mem 2000G my_script.sh
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
To solve this, simply reduce the number of CPUs and/or the amount of RAM requested to fit within the limits described above. If your task does require more than 1993 GB of RAM, then you need to run your task on the GPU queue as described on the Using the GPU / high-memory nodes page.
Additionally, you may receive this message if you request GPUs without specifying the correct queue or if you request too many GPUs:
# --partition=gpuqueue not specified
$ srun --gres=gpu:2 -- echo "Hello world!"
srun: error: Unable to allocate resources: Requested node configuration is not available
# More than 2 GPUs requested
$ srun --partition=gpuqueue --gres=gpu:3 -- echo "Hello world!"
srun: error: Unable to allocate resources: Requested node configuration is not available
To solve this error, simply avoid requesting more than 2 GPUs, and remember to include the --partition option. See also the Using the GPU / high-memory nodes section.
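The limits described in this section can be checked before a job is submitted. The following is a minimal sketch and not part of the cluster's tooling: check_request is a hypothetical helper, and the hard-coded limits are simply copied from the text above, so they would need to be updated if the hardware changes.

```shell
# Limits as stated in this section (assumed to be current)
MAX_CPUS=128
MAX_MEM_GB_COMPUTE=1993
MAX_MEM_GB_GPU=3920
MAX_GPUS=2

# Usage: check_request CPUS MEM_GB GPUS PARTITION
# PARTITION may be empty if no --partition option is used.
# Prints one message per problem and returns non-zero if any was found.
check_request() {
    cpus="$1"; mem_gb="$2"; gpus="$3"; partition="$4"
    status=0

    if [ "${cpus}" -gt "${MAX_CPUS}" ]; then
        echo "Too many CPUs: ${cpus} > ${MAX_CPUS}"
        status=1
    fi

    # The memory limit depends on whether the GPU queue is used
    if [ "${partition}" = "gpuqueue" ]; then
        max_mem="${MAX_MEM_GB_GPU}"
    else
        max_mem="${MAX_MEM_GB_COMPUTE}"
    fi
    if [ "${mem_gb}" -gt "${max_mem}" ]; then
        echo "Too much RAM: ${mem_gb} GB > ${max_mem} GB"
        status=1
    fi

    if [ "${gpus}" -gt 0 ] && [ "${partition}" != "gpuqueue" ]; then
        echo "GPUs requested without --partition=gpuqueue"
        status=1
    fi
    if [ "${gpus}" -gt "${MAX_GPUS}" ]; then
        echo "Too many GPUs: ${gpus} > ${MAX_GPUS}"
        status=1
    fi

    return "${status}"
}
```

For example, check_request 8 2000 0 "" reports that too much RAM was requested, while check_request 8 2000 0 gpuqueue succeeds because the GPU node has the higher memory limit.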
R#
libstdc++.so.6: version 'GLIBCXX_3.4.26' not found#
If you build an R library on the head/compute nodes using a version of the GCC module other than gcc/8.5.0, then this library may fail to load on the RStudio node or when gcc/8.5.0 is loaded on the head/compute nodes:
$ R
> library(wk)
Error: package or namespace load failed for ‘wk’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so)
To fix this, you will need to reinstall the affected R libraries using one of two methods:
1. Connect to the RStudio server as described in the RStudio servers section, and simply install the affected packages using the install.packages function:
   > install.packages("wk")
   You may need to repeat this step multiple times, for every package that fails to load.
2. Connect to the head node or a compute node, and take care to load the correct version of GCC before loading R:
   $ module load gcc/8.5.0 R/4.3.2
   $ R
   > install.packages("wk")
The name of the affected package can be determined by looking at the error message above. In particular, the path /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so contains a pair of folders named R/x86_64-pc-linux-gnu-library, which specifies the kind of system we are running on, followed by a folder named after the R version (4.3). Immediately after that we find the package name, namely wk in this case.
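The package name can also be extracted from such a path on the command line. This is only a sketch: r_package_from_path is a hypothetical helper, and it assumes that the path has the same layout as in the error message, i.e. that it ends in PACKAGE/libs/PACKAGE.so.

```shell
# Path copied from the error message
so_path='/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so'

# Print the name of the folder two levels above the .so file,
# which is the package folder for paths laid out as described above
r_package_from_path() {
    basename "$(dirname "$(dirname "$1")")"
}

r_package_from_path "${so_path}"   # prints: wk
```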
You can identify all affected packages in your "global" R library by running the following commands:
module load gcc/8.5.0 R/4.3.2
# cd to your R library
cd ~/R/x86_64-pc-linux-gnu-library/4.3/
# Test every installed library
for lib in $(ls);do echo "Testing ${lib}"; Rscript <(echo "library(${lib})") > /dev/null;done
Output will look like the following:
Testing httpuv
Testing igraph
Error: package or namespace load failed for ‘igraph’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so)
Execution halted
Testing isoband
Error: package or namespace load failed for ‘isoband’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so)
Execution halted
Testing labeling
Testing later
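When many packages are installed, the failing ones can be hard to spot in output like this. As a sketch, the names can be pulled out of the "unable to load shared object" lines with sed. Note that failed_r_packages is a hypothetical helper and test.log is an arbitrary file name; the sketch assumes that both the standard output and the error output of the test loop were saved to that file.

```shell
# Read the output of the test loop on stdin and print the name of each
# package whose shared object failed to load (one name per line).
failed_r_packages() {
    sed -n "s|.*unable to load shared object '.*/\([^/]*\)/libs/[^/]*\.so'.*|\1|p" | sort -u
}

# Example usage, assuming the loop output was saved to test.log:
#   failed_r_packages < test.log
```

For the example output shown here, this would print igraph and isoband.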
Locate error messages like those in the example output and reinstall the affected libraries using the install.packages command:
$ R
> install.packages(c("igraph", "isoband"))
RStudio#
Incorrect or invalid username/password#
Please make sure that you are entering your username in the short form (i.e. abc123) and that you have applied for and been given access to the Esrum HPC (see Applying for access). If the problem persists, please contact us for assistance.
Logging in takes a very long time#
Similar to regular R, RStudio will automatically save the data you have loaded into your R session and will restore it when you return later, so that you can continue your work. However, this may result in large amounts of data being saved, and loading this data may cause a long delay when you attempt to log in at a later date.
It is therefore recommended that you regularly clean up your workspace using the built-in tools, when you no longer need to have the data loaded in R.
You can remove individual bits of data using the rm function in R. This works both when using regular R and when using RStudio. The following gives two examples of using the rm function, one removing a single variable and the other removing all variables in the current session:
# 1. Remove the variable `my_variable`
rm(my_variable)
# 2. Remove all variables from your R session
rm(list = ls())
Alternatively, you can remove all data saved in your R session using the broom icon on the Environment tab.
If you wish to prevent this issue in the first place, then you can also turn off saving the data in your session on exit and/or turn off loading the saved data on startup. This is accomplished via the Global Options... dialog accessible from the Tools menu.
Should your R session have grown to such a size that you simply cannot log in and clean it up, then it may be necessary to remove the files containing the data that R/RStudio has saved. This data is stored in two locations:
1. In the .RData file in your home (~/.RData). This is where R saves your data if you answer yes to the Save workspace image? [y/n/c] prompt when quitting R.
2. In the environment file in your RStudio session folder (~/.local/share/rstudio/sessions/active/session-*/suspended-session-data/environment). This is where RStudio saves your data should your login time out while using RStudio.
Please contact us if you need help removing the correct files.
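If you prefer to handle this yourself, one cautious approach is to rename the files rather than delete them, so that they can be restored if needed. The following is a minimal sketch under that assumption: stash_r_workspace is a hypothetical helper, and the .bak suffix is an arbitrary choice. It should only be run while no R or RStudio session is active.

```shell
# Report the size of the saved workspace files at the two locations
# listed above and move each aside by appending ".bak" to its name.
# Usage: stash_r_workspace [HOME_DIR]   (defaults to $HOME)
stash_r_workspace() {
    home_dir="${1:-$HOME}"
    for f in "${home_dir}/.RData" \
        "${home_dir}"/.local/share/rstudio/sessions/active/session-*/suspended-session-data/environment; do
        # Skip locations (or unmatched patterns) where nothing exists
        [ -e "${f}" ] || continue
        du -h "${f}"
        mv -- "${f}" "${f}.bak"
    done
}
```

Once you have logged in successfully and confirmed that the data is no longer needed, the .bak files can be deleted to free up space.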
libstdc++.so.6: version 'GLIBCXX_3.4.26' not found#
See the troubleshooting section on the R and RStudio page.
Jupyter Notebooks#
Jupyter Notebooks: Browser error when opening URL#
Depending on your browser you may receive one of the following errors. The typical causes are listed, but the exact error message will depend on your browser. It is therefore helpful to review all possible causes listed here.
When using Chrome, the cause is typically listed below the line that says "This site can't be reached".
"The connection was reset"
This typically indicates that Jupyter Notebook isn't running on the server, or that it is running on a different port than the one you've forwarded. Check that Jupyter Notebook is running and make sure that your forwarded ports match those used by Jupyter Notebook on Esrum.
"Localhost refused to connect" or "Unable to connect"
This typically indicates that port forwarding isn't active, or that you have entered the wrong port number in your browser. Verify that port forwarding is active and that you are using the correct port number in the localhost URL.
"Check if there is a typo in esrumweb01fl" or "We're having trouble finding that site"
You are most likely connecting from a network outside UCPH. Make sure that you are using a wired connection at CBMR and/or that the VPN is activated and try again.
Snakemake#
sacct: error: Problem talking to the database: Connection refused#
If you are running Snakemake with the --slurm option on a compute node, i.e. not the head node, then you will receive errors such as the following:
Job 0 has been submitted with SLURM jobid 512921 (log: .snakemake/slurm_logs/rule_foo/512921.log).
The job status query failed with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --name 2d898259-73e4-435d-aa77-44dc44d84c1b
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
To solve this, simply start your Snakemake pipeline on the head node when using the --slurm option.
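A small guard can help catch this mistake before the pipeline starts. The sketch below relies on the SLURM_JOB_ID environment variable, which Slurm sets for processes started through sbatch or srun. The helper names inside_slurm_job and run_snakemake_slurm are hypothetical. The check will not detect a shell opened on a compute node by other means, such as a direct SSH login.

```shell
# Succeeds if the current shell is running inside a Slurm job
inside_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Refuse to start "snakemake --slurm" from inside a Slurm job;
# any further arguments are passed on to snakemake unchanged
run_snakemake_slurm() {
    if inside_slurm_job; then
        echo "Inside Slurm job ${SLURM_JOB_ID}: start Snakemake on the head node instead" >&2
        return 1
    fi
    snakemake --slurm "$@"
}
```

With these functions defined, calling run_snakemake_slurm from an interactive srun session prints the message and stops, instead of submitting jobs that later fail with the sacct errors shown above.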