Troubleshooting¶

This page gathers all troubleshooting steps from the other parts of this documentation for easy access. Remember that you are always welcome to contact us if you have questions or problems relating to the cluster.

Connecting to Esrum¶

If you have not already been granted access to the server, then please see the Applying for access page before continuing!

Timeout while connecting to the cluster¶

You may experience timeout errors when you attempt to connect to Esrum.

On Linux, this typically results in an Operation timed out message:

$ ssh abc123@esrumhead01fl.unicph.domain
ssh: connect to host esrumhead01fl.unicph.domain port 22: Operation timed out

On Windows, using MobaXterm, it may result in a connection timed out message:

_images/connecting_mobaxterm_timeout.png

First, verify that you are correctly connected to the UCPH VPN, which is required to connect to Esrum. See the Connecting to the cluster page for more information.

If you are still unable to connect to Esrum after verifying that you are correctly connected to the UCPH network, then please try to visit either our Project Manager or our Cohort Catalog.

If you are able to visit either of the Project Manager or Cohort Catalog pages, then you most likely do not have proper permissions to connect to Esrum. Please contact us and we will provide further guidance.

If you are unable to connect to the VPN or to either of the above pages while connected to the VPN, then there may be other problems with your account. We recommend that you either contact us for assistance or, if you prefer, that you submit a ticket to the UCPH-IT Serviceportal, using the Research Applications Counseling and Support / Forskningsapplikationer Rådgivning og support ticket category.

File uploads using MobaXterm never start¶

Please make sure that your session is configured to use the SCP (enhanced speed) browser type. See step 4 in the Configuring MobaXterm section.

Network-folders in `~/ucph` are not available¶

Please make sure that you have disabled the use of GSSAPI Kerberos as described in the Configuring MobaXterm section. Similarly, if you are using Linux or OSX, then you must not authenticate using a Kerberos ticket.

Slurm basics¶

Error: Requested node configuration is not available¶

If you request too many CPUs (more than 128), or too much RAM (more than 1993 GB for compute nodes and more than 3920 GB for the GPU node), then Slurm will report that the request cannot be satisfied.

If more than 128 CPUs are requested:

$ sbatch --cpus-per-task 200 my_script.sh
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

If more than 1993 GB of RAM is requested on a compute node:

$ sbatch --mem 2000G my_script.sh
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

To solve this, simply reduce the number of CPUs and/or the amount of RAM requested to fit within the limits described above. If your task does require more than 1993 GB of RAM, then you need to run your task on the GPU queue as described on the GPU / high-memory jobs page.

Additionally, you may receive this message if you request GPUs without specifying the correct queue or if you request too many GPUs.

If --partition=gpuqueue is not specified:

$ srun --gres=gpu:2 -- echo "Hello world!"
srun: error: Unable to allocate resources: Requested node configuration is not available

If more than 2 GPUs are requested:

$ srun --partition=gpuqueue --gres=gpu:3 -- echo "Hello world!"
srun: error: Unable to allocate resources: Requested node configuration is not available

To solve this error, simply avoid requesting more than 2 GPUs, and remember to include the --partition option. See also the GPU / high-memory jobs section.
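
The limits described above can be checked before submitting a job. The following is a minimal sketch, not part of Slurm: the function name `check_request` and its argument order are hypothetical, and the numbers are simply copied from the text above, including the assumption that any partition other than gpuqueue is subject to the compute-node RAM limit.

```shell
# Limits copied from the text above; adjust them if the cluster changes.
MAX_CPUS=128
MAX_MEM_GB=1993        # compute nodes
MAX_MEM_GB_GPU=3920    # GPU / high-memory node
MAX_GPUS=2

# check_request CPUS MEM_GB GPUS PARTITION   (hypothetical helper)
# Prints "ok" if the request fits within the limits above; otherwise prints
# the reason and returns 1.
check_request() {
    cpus="$1"; mem_gb="$2"; gpus="$3"; partition="$4"

    # The higher RAM limit only applies when the GPU queue is requested.
    if [ "$partition" = "gpuqueue" ]; then
        mem_limit="$MAX_MEM_GB_GPU"
    else
        mem_limit="$MAX_MEM_GB"
    fi

    if [ "$cpus" -gt "$MAX_CPUS" ]; then
        echo "too many CPUs: $cpus > $MAX_CPUS"; return 1
    fi
    if [ "$mem_gb" -gt "$mem_limit" ]; then
        echo "too much RAM: ${mem_gb} GB > ${mem_limit} GB"; return 1
    fi
    if [ "$gpus" -gt 0 ] && [ "$partition" != "gpuqueue" ]; then
        echo "GPUs requested without --partition=gpuqueue"; return 1
    fi
    if [ "$gpus" -gt "$MAX_GPUS" ]; then
        echo "too many GPUs: $gpus > $MAX_GPUS"; return 1
    fi
    echo "ok"
}
```

For example, `check_request 8 2000 0 ""` reports that too much RAM was requested, while `check_request 8 2000 0 gpuqueue` prints `ok`.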

X11 forwarding is not working in MobaXterm¶

First, right-click on Esrum in the list of User sessions and select Edit session. Make sure that the Advanced SSH settings tab is open and verify that X11 forwarding is enabled as shown:

Secondly, press the OK button and open the Settings via the gears icon on the main toolbar. Then select the X11 tab and verify that X11 support is configured as shown:

R¶

libtk8.6.so: cannot open shared object file¶

Users connecting to Esrum with X11 forwarding enabled, for example using MobaXterm with default settings, may observe the following error when running install.packages:

--- Please select a CRAN mirror for use in this session ---
Error: .onLoad failed in loadNamespace() for 'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/opt/software/R/4.3.1/lib64/R/library/tcltk/libs/tcltk.so':
  libtk8.6.so: cannot open shared object file: No such file or directory

If so, then you must disable graphical menus before running install.packages by first entering the following command:

> options(menu.graphics=FALSE)

Then simply run install.packages again.

You can also set the R option permanently by running the following in your (bash) terminal:

$ echo 'options(menu.graphics=FALSE)' | tee -a ~/.Rprofile
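
Running the command above more than once adds the same line to ~/.Rprofile each time. The following sketch only appends the option if it is not already present; the function name `add_rprofile_option` is hypothetical and is used purely for illustration.

```shell
# add_rprofile_option FILE   (hypothetical helper)
# Appends the option to FILE unless that exact line is already present.
add_rprofile_option() {
    rprofile="$1"
    line='options(menu.graphics=FALSE)'

    # Create the file if it does not exist yet, so that grep can read it.
    touch "$rprofile"

    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$rprofile"; then
        echo "$line" >> "$rprofile"
    fi
}

# Usage: add_rprofile_option ~/.Rprofile
```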

libstdc++.so.6: version `'GLIBCXX_3.4.26'` not found¶

If you build an R library on the head/compute nodes using a version of the GCC module other than gcc/8.5.0, then this library may fail to load on the RStudio node or when gcc/8.5.0 is loaded on the head/compute nodes:

$ R
> library(wk)
Error: package or namespace load failed for ‘wk’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so)

To fix this, you will need to reinstall the affected R libraries using one of two methods:

- Connect to the RStudio server as described in the Troubleshooting section, and simply install the affected packages using the install.packages function:
```
> install.packages("wk")
```
  You may need to repeat this step for every package that fails to load.
- Connect to the head node or a compute node, and take care to load the correct version of GCC before loading R:
```
$ module load gcc/8.5.0 R/4.3.2
$ R
> install.packages("wk")
```

The name of the affected package can be determined from the error message above. The path /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so contains the folders R/x86_64-pc-linux-gnu-library, which identify the kind of system we are running on, followed by the R version (4.3). Immediately after that we find the package name, namely wk in this case.

You can identify all affected packages in your "global" R library by running the following commands:

$ module load gcc/8.5.0 R/4.3.2
# cd to your R library
$ cd ~/R/x86_64-pc-linux-gnu-library/4.3/
# Test every installed library
$ for lib in $(ls);do echo "Testing ${lib}"; Rscript <(echo "library(${lib})") > /dev/null;done

Output will look like the following:

Testing httpuv
Testing igraph
Error: package or namespace load failed for ‘igraph’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so)
Execution halted
Testing isoband
Error: package or namespace load failed for ‘isoband’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so)
Execution halted
Testing labeling
Testing later

Locate the error messages like the one shown above in the output and reinstall the affected libraries using the install.packages command:

$ R
> install.packages(c("igraph", "isoband"))
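
If many packages are affected, picking the names out of the output by hand becomes tedious. The following sketch extracts them from the "unable to load shared object" lines shown above. The function name `failed_packages` and the log file name `load_errors.log` are hypothetical, and the sketch assumes that the error messages have the exact form shown in the example output.

```shell
# failed_packages   (hypothetical helper)
# Reads the test output on stdin and prints the name of every package whose
# shared object failed to load, one name per line, without duplicates.
failed_packages() {
    # The package name is the folder directly above "libs" in the path.
    sed -n "s|^unable to load shared object '.*/\([^/]*\)/libs/[^/]*\.so':.*|\1|p" | sort -u
}

# Usage, assuming the error messages from the test loop were saved by
# adding "2> load_errors.log" to the end of the loop:
#   failed_packages < load_errors.log
```

For the example output above, this prints igraph and isoband.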

RStudio¶

Incorrect or invalid username/password¶

Please make sure that you are entering your username in the short form (i.e. abc123) and that you have applied for and been given access to the Esrum HPC (see Applying for access). If the problem persists, please contact us for assistance.

Logging in takes a very long time¶

As with regular R, RStudio will automatically save the data you have loaded into your R session and will restore it when you return later, so that you can continue your work. However, this may result in large amounts of data being saved, and loading this data may cause a long delay when you next attempt to log in.

It is therefore recommended that you regularly clean up your workspace using the built-in tools, when you no longer need to have the data loaded in R.

You can remove individual bits of data using the rm function in R. This works both when using regular R and when using RStudio. The following gives two examples of using the rm function, one removing a single variable and the other removing all variables in the current session:

# 1. Remove the variable `my_variable`
rm(my_variable)

# 2. Remove all variables from your R session
rm(list = ls())

Alternatively you can remove all data saved in your R session using the broom icon on the Environment tab:

If you wish to prevent this issue in the first place, then you can also turn off saving the data in your session on exit and/or turn off loading the saved data on startup. This is accomplished via the Global Options... accessible from the Tools menu:

Should your R session have grown to such a size that you simply cannot log in and clean it up, then it may be necessary to remove the files containing the data that R/RStudio has saved. This data is stored in two locations:

- In the .RData file in your home folder (~/.RData). This is where R saves your data if you answer yes to the Save workspace image? [y/n/c] prompt when quitting R.
- In the environment file in your RStudio session folder (~/.local/share/rstudio/sessions/active/session-*/suspended-session-data/environment). This is where RStudio saves your data should your login time out while using RStudio.

Please contact us if you need help removing the correct files.
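
If you would rather not delete anything outright, one cautious option is to rename the saved files so that they are no longer found at login. The following is a sketch under the assumption that renaming the two files listed above is sufficient; the function name `set_aside_r_workspace` and the .bak suffix are hypothetical choices.

```shell
# set_aside_r_workspace [HOME_DIR]   (hypothetical helper)
# Renames the saved R and RStudio workspace files by adding a .bak suffix.
# Nothing is deleted, so the files can be restored by renaming them back.
set_aside_r_workspace() {
    home_dir="${1:-$HOME}"

    for f in "$home_dir/.RData" \
        "$home_dir"/.local/share/rstudio/sessions/active/session-*/suspended-session-data/environment
    do
        # An unmatched wildcard is left as-is and is skipped by this check.
        if [ -f "$f" ]; then
            echo "Setting aside $f ($(du -h "$f" | cut -f1))"
            mv "$f" "$f.bak"
        fi
    done
}

# Usage: set_aside_r_workspace
```

The printed file sizes give a rough idea of which file was responsible for the slow login.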

libstdc++.so.6: version `'GLIBCXX_3.4.26'` not found¶

See the troubleshooting section on the Using R on Esrum page.

Jupyter Notebooks¶

Jupyter Notebooks: Browser error when opening URL¶

Depending on your browser you may receive one of the following errors. The typical causes are listed, but the exact error message will depend on your browser. It is therefore helpful to review all possible causes listed here.

When using Chrome, the cause is typically listed below the line that says "This site can't be reached".

The connection was reset

This typically indicates that Jupyter Notebook isn't running on the server, or that it is running on a different port than the one you've forwarded. Check that Jupyter Notebook is running and make sure that your forwarded ports match those used by Jupyter Notebook on Esrum.

Localhost refused to connect or Unable to connect

This typically indicates that port forwarding isn't active, or that you have entered the wrong port number in your browser. Therefore:
- Verify that port forwarding is active: On OSX/Linux that means verifying that an ssh command is running as described in the Port forwarding for OSX/Linux users section, and on Windows that means activating port forwarding in MobaXterm as described in the Port forwarding for Windows users section.
- If using the instructions for Linux/OSX, verify that you ran the ssh command on your laptop or desktop, and not on the Esrum head node.
- Verify that whichever of these you use is set to the same port number as in the jupyter command you ran or as in the http://127.0.0.1 URL printed by Jupyter.
- Verify that you are using the second URL that Jupyter prints on the terminal, namely the URL starting with http://127.0.0.1:XXXX:
```
To access the notebook, open this file in a browser:
    file:///home/abc123/.local/share/jupyter/runtime/nbserver-2082873-open.html
    Or copy and paste one of these URLs:
        http://esrumcmpn07fl.unicph.domain:XXXXX/?token=0123456789abcdefghijklmnopqrstuvwxyz
    or http://127.0.0.1:XXXXX/?token=0123456789abcdefghijklmnopqrstuvwxyz
```
  For security reasons it is not possible to connect directly to the compute nodes.
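
Since several of the checks above come down to comparing port numbers, it can help to extract the port from the URL that Jupyter printed. The following is a minimal sketch; the function names `url_port` and `ports_match` are hypothetical, and the port 8888 in the usage comment is only an example value.

```shell
# url_port URL   (hypothetical helper)
# Prints the port number of a URL of the form http://HOST:PORT/...
# Prints nothing if the URL does not contain an explicit port.
url_port() {
    echo "$1" | sed -n 's|^[a-z]*://[^/:]*:\([0-9][0-9]*\).*|\1|p'
}

# ports_match URL FORWARDED_PORT   (hypothetical helper)
# Succeeds if the port in URL is the same as FORWARDED_PORT.
ports_match() {
    [ "$(url_port "$1")" = "$2" ]
}

# Usage:
#   ports_match 'http://127.0.0.1:8888/?token=...' 8888 && echo "ports match"
```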

Snakemake¶

sacct: error: Problem talking to the database: Connection refused¶

If you are running Snakemake with the --slurm option on a compute node, i.e. not the head node, then you will receive errors such as the following:

Job 0 has been submitted with SLURM jobid 512921 (log: .snakemake/slurm_logs/rule_foo/512921.log).
The job status query failed with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --name 2d898259-73e4-435d-aa77-44dc44d84c1b
Error message: sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

To solve this, simply start your Snakemake pipeline on the head node when using the --slurm option.
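
To avoid starting a pipeline on the wrong node by accident, a wrapper script can check the hostname first. The following sketch assumes that head node hostnames start with esrumhead, as in the esrumhead01fl.unicph.domain address used elsewhere on this page; the function name `on_head_node` is hypothetical.

```shell
# on_head_node [HOSTNAME]   (hypothetical helper)
# Succeeds if HOSTNAME (default: the name of the current machine) looks like
# the Esrum head node, based on the assumed "esrumhead" prefix.
on_head_node() {
    name="${1:-$(hostname)}"
    case "$name" in
        esrumhead*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, with your usual Snakemake options added after --slurm:
#   if on_head_node; then
#       snakemake --slurm
#   else
#       echo "Please run Snakemake with --slurm on the head node" >&2
#   fi
```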
