Using R on Esrum#

This section describes how to use R on Esrum and offers various tips for making your work easier. It focuses primarily on running R non-interactively via Slurm, in order to take full advantage of the available compute resources.

For interactive work, we recommend using the RStudio servers or an interactive session.

Selecting an R version#

Several versions of R are available via the module system. To use one, you need to load both the version of R you want and a version of GCC, which is required to install and load R libraries.

If you also intend to use the RStudio servers, then we recommend that you use R/4.3.3 (or another version of R/4.3.x) with gcc/8.5.0. This ensures that the R libraries you install are compatible between the compute nodes and the RStudio servers.

By default, the 4.3.x versions of R load gcc/8.5.0, so you can simply use the --auto option when loading R/4.3.x:

$ module load --auto R/4.3.3
Loading R/4.3.3
  Loading requirement: gcc/8.5.0

R modules installed using versions of R other than 4.3.x will not be available on the RStudio server, and you will need to install them again.

Warning

Using a GCC version greater than 8.x with R/4.3.x may cause modules you install to fail to load on the RStudio server with errors similar to the following:

Error: package or namespace load failed for ‘wk’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so)

See the Troubleshooting section below for more information.
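
The version constraints above can be turned into a small guard for your own scripts. The following is only a sketch: is_rstudio_compatible is a hypothetical helper name, and it assumes that "compatible" means R 4.3.x combined with any GCC 8.x, as described in this section.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only for R 4.3.x combined with GCC 8.x,
# the combination this page describes as compatible with the RStudio servers.
is_rstudio_compatible() {
    local r_version="$1" gcc_version="$2"

    case "${r_version}" in
        4.3.*) ;;            # R 4.3.x: keep checking
        *) return 1 ;;       # any other R version
    esac

    case "${gcc_version}" in
        8|8.*) return 0 ;;   # GCC 8.x (some builds report only the major version)
        *) return 1 ;;
    esac
}

# Possible usage after loading your modules (not run here):
#   is_rstudio_compatible \
#       "$(Rscript -e 'cat(as.character(getRversion()))')" \
#       "$(gcc -dumpversion)" || echo "Libraries may not load on RStudio"
```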

Submitting R scripts using Slurm#

The recommended way to run R on Esrum is as non-interactive scripts submitted to Slurm. This not only ensures that your analyses do not impact other users, but also makes your analyses reproducible.

To run an R script on the command-line, simply use the Rscript command:

$ cat my_script.R
cat("Hello, world!\n")
$ Rscript my_script.R
Hello, world!

For simple scripts you can use the commandArgs function to pass arguments to your scripts, allowing you to use them to process arbitrary data-sets:

args <- commandArgs(trailingOnly = TRUE)

cat("Hello, ", args[1], "!\n", sep="")

$ Rscript my_script.R world
Hello, world!

If your script requires a heterogeneous set of input files or options to run, then it is recommended to use an argument parser such as the argparser R library. To use the argparser library you must first install it using the install.packages("argparser") command.

The following shows a brief example of how you might use the argparser library. It can also be downloaded here.

   #!/usr/bin/env Rscript
   library(argparser)

   parser <- arg_parser("This is my script!")

   parser <- add_argument(parser, "input_file", help="My data")
   parser <- add_argument(parser, "--p-value", default=0.05, help="Maximum P-value")

   args <- parse_args(parser)
   cat("I would process the file", args$input_file, "with a max P-value of", args$p_value, "\n")

This allows you to document your command-line options, specify default values, and much more:

$ Rscript my_script.R
usage: my_script.R [--] [--help] [--opts OPTS] [--p-value P-VALUE]
    input_file

This is my script!

positional arguments:
input_file     My data

flags:
-h, --help     show this help message and exit

optional arguments:
-x, --opts     RDS file containing argument values
-p, --p-value  Maximum P-value [default: 0.05]

Error in parse_args(parser) :
Missing required arguments: expecting 1 values but got 0 values: ().
Execution halted
$ Rscript my_script.R my_data.tsv
I would process the file my_data.tsv with a max P-value of 0.05

Finally, you can write a small bash script to automatically load the required version of R and to call your script when you submit it to Slurm (using your preferred version of R):

#!/bin/bash

module load gcc/8.5.0 R/4.1.2
Rscript "${@}"

The "${@}" safely passes all your command-line arguments to Rscript, even if they contain spaces. This wrapper script can then be used to submit/call any of your R-scripts:

$ sbatch run_rscript.sh my_script.R my_data.tsv --p-value 0.01
Submitted batch job 18090212
$ cat slurm-18090212.out
I would process the file my_data.tsv with a max P-value of 0.01
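
To see why the quotes around "${@}" matter, the following sketch compares quoted and unquoted forwarding of arguments. The function names count_args, forward_quoted and forward_unquoted are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Print the number of arguments received.
count_args() {
    echo "$#"
}

# Quoted: every original argument is passed on unchanged.
forward_quoted() {
    count_args "${@}"
}

# Unquoted: arguments are split again on whitespace.
forward_unquoted() {
    count_args ${@}
}

forward_quoted "my data.tsv" --p-value 0.01    # prints 3
forward_unquoted "my data.tsv" --p-value 0.01  # prints 4
```

With the unquoted form, the file name "my data.tsv" would reach Rscript as two separate arguments, "my" and "data.tsv".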

Installing R modules#

Modules may be installed in your home folder using the install.packages command:

$ module load gcc/8.5.0 R/4.3.1
$ R
> install.packages("ggplot2")
Warning in install.packages("ggplot2") :
  'lib = "/opt/software/R/4.3.1/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘/home/abc123/R/x86_64-pc-linux-gnu-library/4.3’
to install packages into? (yes/No/cancel) yes

When asked to pick a mirror, either pick 0-Cloud by entering 1 and pressing enter, or enter the number corresponding to a location near you and press enter:

--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors

1: 0-Cloud [https]
[...]

Selection: 1
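
Once a personal library exists, it can be handy to check from the shell whether a package is already installed there, without starting R. The following is a minimal sketch that relies on each installed R package folder containing a DESCRIPTION file. The helper name r_package_installed is hypothetical, and the default path assumes the personal library for R 4.3 shown above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the package folder in the given library
# contains a DESCRIPTION file, which every installed R package provides.
r_package_installed() {
    local pkg="$1"
    local libdir="${2:-${HOME}/R/x86_64-pc-linux-gnu-library/4.3}"

    [ -f "${libdir}/${pkg}/DESCRIPTION" ]
}

# Possible usage:
#   r_package_installed ggplot2 && echo "ggplot2 is installed"
```

Note that this only inspects your personal library; packages shipped with the R module itself live elsewhere and are not detected.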

Troubleshooting#

libtk8.6.so: cannot open shared object file#

Users connecting to Esrum with X11 forwarding enabled, for example using MobaXterm with default settings, may observe the following error when running install.packages:

--- Please select a CRAN mirror for use in this session ---
Error: .onLoad failed in loadNamespace() for 'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/opt/software/R/4.3.1/lib64/R/library/tcltk/libs/tcltk.so':
  libtk8.6.so: cannot open shared object file: No such file or directory

If so, then you must disable graphical menus before running install.packages by first entering the following command:

> options(menu.graphics=FALSE)

Then simply run install.packages again.

You can also set the R option permanently by running the following in your (bash) terminal:

$ echo 'options(menu.graphics=FALSE)' | tee -a ~/.Rprofile
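
Running the command above more than once appends the same line repeatedly. This is harmless, but if you prefer to keep your ~/.Rprofile tidy, the following sketch only appends the line when it is not already present. The helper name add_rprofile_option is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: append a line to an Rprofile only if the exact line
# is not already there. The file defaults to ~/.Rprofile.
add_rprofile_option() {
    local line="$1"
    local rprofile="${2:-${HOME}/.Rprofile}"

    touch "${rprofile}"   # make sure the file exists before searching it

    # -q: quiet, -x: match whole lines only, -F: fixed string (no regex)
    if ! grep -qxF -- "${line}" "${rprofile}"; then
        echo "${line}" >> "${rprofile}"
    fi
}

# Possible usage:
#   add_rprofile_option 'options(menu.graphics=FALSE)'
```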

libstdc++.so.6: version 'GLIBCXX_3.4.26' not found#

If you build an R library on the head/compute nodes using a version of the GCC module other than gcc/8.5.0, then this library may fail to load on the RStudio node or when gcc/8.5.0 is loaded on the head/compute nodes:

$ R
> library(wk)
Error: package or namespace load failed for ‘wk’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so)

To fix this, you will need to reinstall the affected R libraries using one of two methods:

  1. Connect to the RStudio server as described in the Troubleshooting section, and simply install the affected packages using the install.packages function:

    > install.packages("wk")
    

    You may need to repeat this step multiple times, for every package that fails to load.

  2. Connect to the head node or a compute node, and take care to load the correct version of GCC before loading R:

    $ module load gcc/8.5.0 R/4.3.2
    $ R
    > install.packages("wk")
    

The name of the affected module can be determined by looking at the error message above. In particular, the path /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so contains the pair of folders R/x86_64-pc-linux-gnu-library, which identifies the kind of system we are running on, followed by the R version (4.3). Immediately after that we find the package name, namely wk in this case.
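
The same reasoning can be expressed in a couple of lines of bash. This is only a sketch; package_from_so_path is a hypothetical helper name, and it assumes the path has the .../PACKAGE/libs/PACKAGE.so layout seen in the error message.

```shell
#!/bin/bash
# Hypothetical helper: extract the package name from the path of a shared
# object reported in a "unable to load shared object" error.
package_from_so_path() {
    local path="$1"

    path="${path%/libs/*}"   # drop the trailing /libs/<name>.so
    echo "${path##*/}"       # keep only the last remaining folder name
}

package_from_so_path /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so  # prints wk
```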

You can identify all affected packages in your "global" R library by running the following commands:

$ module load gcc/8.5.0 R/4.3.2
  1. cd to your R library

    $ cd ~/R/x86_64-pc-linux-gnu-library/4.3/
    
  2. Test every installed library

    $ for lib in *; do echo "Testing ${lib}"; Rscript -e "library(${lib})" > /dev/null; done
    

Output will look like the following:

Testing httpuv
Testing igraph
Error: package or namespace load failed for ‘igraph’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so)
Execution halted
Testing isoband
Error: package or namespace load failed for ‘isoband’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so)
Execution halted
Testing labeling
Testing later

Locate error messages like those shown above in the output and reinstall the affected libraries using the install.packages command:

$ R
> install.packages(c("igraph", "isoband"))
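
If you have many packages installed, reading through the full output can be tedious. The following sketch prints only the names of the packages that fail to load, relying on the fact that Rscript exits with a non-zero status when library() fails. The function name find_broken_packages is hypothetical, and the sketch assumes that every folder in the library directory is an R package.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every package in an R library
# folder that fails to load. Assumes the desired gcc and R modules are loaded.
find_broken_packages() {
    local libdir="$1" path pkg

    for path in "${libdir}"/*/; do
        [ -d "${path}" ] || continue   # nothing matched the glob
        pkg="$(basename "${path}")"

        # Rscript exits with a non-zero status if library() raises an error
        if ! Rscript -e "library(${pkg})" > /dev/null 2>&1; then
            echo "${pkg}"
        fi
    done
}

# Possible usage:
#   module load gcc/8.5.0 R/4.3.2
#   find_broken_packages ~/R/x86_64-pc-linux-gnu-library/4.3
```

Because error messages are discarded here, a package can also be listed for reasons other than a GCC mismatch; rerun library() on a listed package in R to see the actual error.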