Troubleshooting#

This page gathers all troubleshooting steps from the other parts of this documentation for easy access. Remember that you are always welcome to contact us if you have questions or problems relating to the cluster.

Connecting to Esrum#

If you have not already been granted access to the server, then please see the Applying for access page before continuing!

Timeout while connecting to the cluster#

You may experience timeout errors when you attempt to connect to the server:

_images/connecting_ssh_timeout.gif

First, verify that you are connected to the UCPH network. To connect to Esrum you must either use a wired connection in a CBMR office or be connected through the UCPH VPN. See the official VPN documentation in Danish or English for more information.

If you are still unable to connect to Esrum after verifying that you are correctly connected to the UCPH network, then please try to visit either our Project Manager or our Cohort Catalog.

If you are able to visit either the Project Manager or the Cohort Catalog page, then your network connection works and you most likely do not have the permissions required to connect to Esrum. Please contact us and we will provide further guidance.

If you are unable to connect to the VPN or to either of the above pages while connected to the VPN, then there may be other problems with your account. We recommend that you either contact us for assistance or, if you prefer, that you submit a ticket to the UCPH-IT Serviceportal, using the Research Applications Counseling and Support / Forskningsapplikationer Rådgivning og support ticket category.
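The checks above can be summarized as a small decision helper. This is only a sketch: the function name diagnose_connection and its yes/no arguments are hypothetical and simply restate the steps in the text.

```shell
# Hypothetical helper that restates the troubleshooting steps above.
# Usage: diagnose_connection <on_ucph_network: yes|no> <internal_pages_load: yes|no>
diagnose_connection() {
    on_network=$1
    pages_load=$2

    if [ "$on_network" != "yes" ]; then
        # Step 1: a wired CBMR connection or the UCPH VPN is required
        echo "Use a wired connection in a CBMR office or connect to the UCPH VPN, then retry"
    elif [ "$pages_load" = "yes" ]; then
        # Step 2: internal pages load, so the network itself is fine
        echo "Network is fine; you most likely lack permissions on Esrum, so contact us"
    else
        # Step 3: on the network, but internal pages are unreachable
        echo "Possible account problem; contact us or submit a UCPH-IT Serviceportal ticket"
    fi
}

# Example: connected to the VPN, but neither internal page loads
diagnose_connection yes no
```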

File uploads using MobaXterm never start#

Please make sure that your session is configured to use the SCP (enhanced speed) browser type. See step 4 in the Configuring MobaXterm section.

UCPH network-folders in ~/ucph are not available when using MobaXterm#

Please make sure that you have disabled use of GSSAPI Kerberos as described in the Configuring MobaXterm section.

Slurm basics#

Error: Requested node configuration is not available#

If you request too many CPUs (more than 128), or too much RAM (more than 1993 GB for compute nodes and more than 3920 GB for the GPU node), then Slurm will report that the request cannot be satisfied:

# More than 128 CPUs requested
$ sbatch --cpus-per-task 200 my_script.sh
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

# More than 1993 GB RAM requested on compute node
$ sbatch --mem 2000G my_script.sh
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

To solve this, simply reduce the number of CPUs and/or the amount of RAM requested to fit within the limits described above. If your task does require more than 1993 GB of RAM, then you need to run your task on the GPU queue as described on the Using the GPU / high-memory nodes page.
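As a rough illustration, a request could be checked against these limits before submitting. The helper check_request below is hypothetical (it is not a Slurm command), and the limits are the ones quoted above; it assumes whole numbers of CPUs and GB.

```shell
# Limits taken from the text above: 128 CPUs per node, 1993 GB RAM on
# compute nodes, and 3920 GB RAM on the GPU node.
# check_request is a hypothetical helper, not part of Slurm.
# Usage: check_request <cpus> <mem_gb>
check_request() {
    cpus=$1
    mem_gb=$2

    if [ "$cpus" -gt 128 ]; then
        echo "too many CPUs: request at most 128"
        return 1
    elif [ "$mem_gb" -gt 3920 ]; then
        echo "too much RAM: request at most 3920 GB"
        return 1
    elif [ "$mem_gb" -gt 1993 ]; then
        # Fits only on the GPU / high-memory node
        echo "ok, but only on the GPU queue"
    else
        echo "ok"
    fi
}

# Example: 32 CPUs and 2000 GB of RAM
check_request 32 2000
```

A request that passes the check can then be submitted as usual, for example with sbatch --cpus-per-task 32 --mem 64G my_script.sh.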

Additionally, you may receive this message if you request GPUs without specifying the correct queue or if you request too many GPUs:

# --partition=gpuqueue not specified
$ srun --gres=gpu:2 -- echo "Hello world!"
srun: error: Unable to allocate resources: Requested node configuration is not available

# More than 2 GPUs requested
$ srun --partition=gpuqueue --gres=gpu:3 -- echo "Hello world!"
srun: error: Unable to allocate resources: Requested node configuration is not available

To solve this error, simply avoid requesting more than 2 GPUs, and remember to include the --partition option. See also the Using the GPU / high-memory nodes section.
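For comparison with the failing commands above, the following sketch builds a valid set of options. The helper gpu_request_args is hypothetical; the partition name and the limit of 2 GPUs are taken from the text above.

```shell
# gpu_request_args is a hypothetical helper that prints the Slurm options
# for a GPU job and refuses requests outside the range 1-2 GPUs.
gpu_request_args() {
    gpus=$1
    if [ "$gpus" -lt 1 ] || [ "$gpus" -gt 2 ]; then
        echo "error: request 1 or 2 GPUs" >&2
        return 1
    fi
    echo "--partition=gpuqueue --gres=gpu:${gpus}"
}

# Print the full command instead of running it, so it can be inspected first
echo "srun $(gpu_request_args 2) -- echo \"Hello world!\""
```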

R#

libtk8.6.so: cannot open shared object file#

Users connecting to Esrum with X11 forwarding enabled, for example using MobaXterm with default settings, may observe the following error when running install.packages:

--- Please select a CRAN mirror for use in this session ---
Error: .onLoad failed in loadNamespace() for 'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/opt/software/R/4.3.1/lib64/R/library/tcltk/libs/tcltk.so':
  libtk8.6.so: cannot open shared object file: No such file or directory

If so, then you must disable graphical menus before running install.packages by first entering the following command:

> options(menu.graphics=FALSE)

Then simply run install.packages again.

You can also set the R option permanently by running the following in your (bash) terminal:

$ echo 'options(menu.graphics=FALSE)' | tee -a ~/.Rprofile
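Note that the command above appends a new line every time it is run. If you prefer to avoid duplicate entries, a variant along the following lines only appends the option when it is missing. The function name and the RPROFILE and LINE variables are introduced for this sketch only.

```shell
# Append the option only if the exact line is not already present.
# RPROFILE defaults to ~/.Rprofile; the variable is a convenience
# introduced for this sketch.
RPROFILE="${RPROFILE:-$HOME/.Rprofile}"
LINE='options(menu.graphics=FALSE)'

add_menu_graphics_option() {
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$LINE" "$RPROFILE" 2>/dev/null; then
        echo "$LINE" >> "$RPROFILE"
    fi
}

# To apply the change, call the function once:
#   add_menu_graphics_option
```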

libstdc++.so.6: version 'GLIBCXX_3.4.26' not found#

If you build an R library on the head/compute nodes using a version of the GCC module other than gcc/8.5.0, then this library may fail to load on the RStudio node or when gcc/8.5.0 is loaded on the head/compute nodes:

$ R
> library(wk)
Error: package or namespace load failed for ‘wk’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so)

To fix this, you will need to reinstall the affected R libraries using one of two methods:

  1. Connect to the RStudio server as described in the RStudio servers section, and simply install the affected packages using the install.packages function:

    > install.packages("wk")
    

    You may need to repeat this step multiple times, for every package that fails to load.

  2. Connect to the head node or a compute node, and take care to load the correct version of GCC before loading R:

    $ module load gcc/8.5.0 R/4.3.2
    $ R
    > install.packages("wk")
    

The name of the affected package can be determined from the error message above. The path /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so contains the folders R/x86_64-pc-linux-gnu-library, which identify the kind of system we are running on, followed by the R version (4.3). Immediately after that comes the package name, namely wk in this case.

You can identify all affected packages in your "global" R library by running the following commands:

  1. Load GCC and R

    module load gcc/8.5.0 R/4.3.2

  2. cd to your R library

    cd ~/R/x86_64-pc-linux-gnu-library/4.3/

  3. Test every installed library

    for lib in *; do echo "Testing ${lib}"; Rscript -e "library(${lib})" > /dev/null; done


Output will look like the following:

Testing httpuv
Testing igraph
Error: package or namespace load failed for ‘igraph’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so)
Execution halted
Testing isoband
Error: package or namespace load failed for ‘isoband’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so)
Execution halted
Testing labeling
Testing later

Locate error messages like those shown above in the output and reinstall the affected libraries using the install.packages command:

$ R
> install.packages(c("igraph", "isoband"))
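If the test output is long, the package names can be extracted mechanically rather than by eye. The sketch below assumes the error lines have exactly the "unable to load shared object" form shown above; failed_packages is a hypothetical helper that reads the test output on standard input.

```shell
# failed_packages is a hypothetical helper: it reads the test output on stdin
# and prints the name of each package whose shared object failed to load.
failed_packages() {
    # Matches ".../libs/<package>.so':" at the end of the line and keeps <package>
    sed -n "s|^unable to load shared object '.*/libs/\(.*\)\.so':\$|\1|p" | sort -u
}

# Example using a shortened copy of the output shown above
failed_packages <<'EOF'
Testing httpuv
Testing igraph
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so':
Testing later
EOF
```

Since R writes these errors to standard error, the output of the test loop would need to be redirected with 2>&1 before piping it into the helper.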

RStudio#

Incorrect or invalid username/password#

Please make sure that you are entering your username in the short form (i.e. abc123) and that you have applied for and been given access to the Esrum HPC (see Applying for access). If the problem persists, please contact us for assistance.

Logging in takes a very long time#

Like regular R, RStudio will automatically save the data you have loaded into your R session and will restore it when you return later, so that you can continue your work. However, this may result in large amounts of data being saved, and loading this data may cause a long delay when you attempt to log in at a later date.

We therefore recommend that you regularly clean up your workspace, using the built-in tools, once you no longer need to have the data loaded in R.

You can remove individual bits of data using the rm function in R. This works both when using regular R and when using RStudio. The following gives two examples of using the rm function, one removing a single variable and the other removing all variables in the current session:

# 1. Remove the variable `my_variable`
rm(my_variable)

# 2. Remove all variables from your R session
rm(list = ls())

Alternatively, you can remove all data saved in your R session using the broom icon on the Environment tab:

_images/rstudio_gc_01.png _images/rstudio_gc_02.png

If you wish to prevent this issue in the first place, then you can also turn off saving the data in your session on exit and/or turn off loading the saved data on startup. This is accomplished via the Global Options... dialog, accessible from the Tools menu:

_images/rstudio_gc_03.png

Should your R session have grown to such a size that you simply cannot log in and clean it up, then it may be necessary to remove the files containing the data that R/RStudio has saved. This data is stored in two locations:

  1. In the .RData file in your home folder (~/.RData). This is where R saves your data if you answer yes to the Save workspace image? [y/n/c] prompt when quitting R.

  2. In the environment file in your RStudio session folder (~/.local/share/rstudio/sessions/active/session-*/suspended-session-data/environment). This is where RStudio saves your data should your login time out while using RStudio.

Please contact us if you need help removing the correct files.
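Before removing anything, it can help to see how large the saved files actually are. The following sketch only reports sizes and deletes nothing; list_r_session_files is a hypothetical helper, and the two locations are the ones named above.

```shell
# list_r_session_files is a hypothetical helper that reports the size of the
# saved R/RStudio session data. It does not modify or delete any files.
list_r_session_files() {
    for f in "$HOME/.RData" \
        "$HOME"/.local/share/rstudio/sessions/active/session-*/suspended-session-data/environment
    do
        # An unmatched glob is left as-is, so skip anything that does not exist
        if [ -f "$f" ]; then
            du -h "$f"
        fi
    done
}

list_r_session_files
```

If a file turns out to be very large, renaming it (for example with mv ~/.RData ~/.RData.bak) rather than deleting it keeps the data recoverable; this is a suggestion, not an official procedure.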

libstdc++.so.6: version 'GLIBCXX_3.4.26' not found#

See the troubleshooting section on the R and RStudio page.

Jupyter Notebooks#

Jupyter Notebooks: Browser error when opening URL#

Depending on your browser you may receive one of the following errors. The typical causes are listed, but the exact error message will depend on your browser. It is therefore helpful to review all possible causes listed here.

When using Chrome, the cause is typically listed below the line that says "This site can't be reached".

  • "The connection was reset"

    This typically indicates that Jupyter Notebook isn't running on the server, or that it is running on a different port than the one you've forwarded. Check that Jupyter Notebook is running and make sure that your forwarded ports match those used by Jupyter Notebook on Esrum.

  • "Localhost refused to connect" or "Unable to connect"

    This typically indicates that port forwarding isn't active, or that you have entered the wrong port number in your browser. Verify that port forwarding is active and that you are using the correct port number in the localhost URL.

  • "Check if there is a typo in esrumweb01fl" or "We're having trouble finding that site"

    You are most likely connecting from a network outside UCPH. Make sure that you are using a wired connection at CBMR and/or that the VPN is activated and try again.
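For reference, a port-forwarding command generally has the shape sketched below. This is an assumption-laden illustration: the host name esrumweb01fl is taken from the error message above, while the user name abc123 and port 8888 are placeholders, and forward_command is a hypothetical helper that only prints the command. The port must match the one reported by Jupyter Notebook on the server.

```shell
# forward_command is a hypothetical helper that prints (but does not run) an
# SSH port-forwarding command. The user name and port are placeholders.
forward_command() {
    user=$1
    port=$2
    # -N: do not run a remote command
    # -L: forward the local port to the same port on the server
    echo "ssh -N -L ${port}:localhost:${port} ${user}@esrumweb01fl"
}

forward_command abc123 8888
```

With such a forward active, the notebook would be opened via a localhost URL using the same port number.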

Snakemake#

sacct: error: Problem talking to the database: Connection refused#

If you are running Snakemake with the --slurm option on a compute node, i.e. not the head node, then you will receive errors such as the following:

Job 0 has been submitted with SLURM jobid 512921 (log: .snakemake/slurm_logs/rule_foo/512921.log).
The job status query failed with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --name 2d898259-73e4-435d-aa77-44dc44d84c1b
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

To solve this, simply start your Snakemake pipeline on the head node when using the --slurm option.
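One way to catch this mistake early is to check whether the current shell is already running inside a Slurm job. The sketch below relies on the SLURM_JOB_ID environment variable, which Slurm sets inside jobs; the two function names are hypothetical, and the check assumes that the variable is unset in a normal login shell on the head node.

```shell
# SLURM_JOB_ID is set by Slurm inside jobs started with sbatch or srun.
# Both functions below are hypothetical helpers for this sketch.
inside_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

run_snakemake_slurm() {
    if inside_slurm_job; then
        echo "error: running inside Slurm job ${SLURM_JOB_ID}; start Snakemake on the head node" >&2
        return 1
    fi
    # Pass any further options and targets through to Snakemake
    snakemake --slurm "$@"
}

# Usage, from the head node:
#   run_snakemake_slurm <your usual Snakemake options>
```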