Using R on Esrum#
This section describes how to use R on Esrum and lays out various tips for making your work easier. It is primarily focused on running R non-interactively via Slurm, in order to take full advantage of the available compute resources.
For interactive work, we recommend using the RStudio servers or an interactive session.
Selecting an R version#
Several versions of R are available via the module system. To use one, load both the version of R you want and a version of GCC, which is required to install and load R libraries.
If you intend to also make use of the RStudio servers, then we recommend that you use R/4.3.3 (or another version of R/4.3.x) with gcc/8.5.0. This ensures that the R libraries you install are compatible between the compute nodes and the RStudio servers.
By default, the 4.3.x versions of R load gcc/8.5.0, so you can simply use the --auto option when loading R/4.3.x:
$ module load --auto R/4.3.3
Loading R/4.3.3
Loading requirement: gcc/8.5.0
R modules installed using versions of R other than 4.3.x will not be available on the RStudio server, and you will need to install them again.
Warning
Using a GCC version greater than 8.x with R/4.3.x may cause modules you install to fail to load on the RStudio server, with errors similar to the following:
Error: package or namespace load failed for ‘wk’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so)
See the Troubleshooting section below for more information.
Submitting R scripts using Slurm#
The recommended way to run R on Esrum is as non-interactive scripts submitted to Slurm. This not only ensures that your analyses do not impact other users, but also makes your analyses reproducible.
To run an R script on the command line, simply use the Rscript command:
$ cat my_script.R
cat("Hello, world!\n")
$ Rscript my_script.R
Hello, world!
For simple scripts you can use the commandArgs function to pass arguments to your scripts, allowing you to use them to process arbitrary data-sets:
args <- commandArgs(trailingOnly = TRUE)
cat("Hello, ", args[1], "!\n", sep="")
$ Rscript my_script.R world
Hello, world!
If your script requires a heterogeneous set of input files or options to run, then it is recommended to use an argument parser such as the argparser R library. To use the argparser library you must first install it using the install.packages("argparser") command.
The following shows a brief example of how you might use the argparser library:
#!/usr/bin/env Rscript
library(argparser)
parser <- arg_parser("This is my script!")
parser <- add_argument(parser, "input_file", help="My data")
parser <- add_argument(parser, "--p-value", default=0.05, help="Maximum P-value")
args <- parse_args(parser)
cat("I would process the file", args$input_file, "with a max P-value of", args$p_value, "\n")
This allows you to document your command-line options, specify default values, and much more:
$ Rscript my_script.R
usage: my_script.R [--] [--help] [--opts OPTS] [--p-value P-VALUE]
input_file
This is my script!
positional arguments:
input_file My data
flags:
-h, --help show this help message and exit
optional arguments:
-x, --opts RDS file containing argument values
-p, --p-value Maximum P-value [default: 0.05]
Error in parse_args(parser) :
Missing required arguments: expecting 1 values but got 0 values: ().
Execution halted
$ Rscript my_script.R my_data.tsv
I would process the file my_data.tsv with a max P-value of 0.05
Finally, you can write a small bash script that automatically loads the required version of R and calls your script when you submit it to Slurm (using your preferred version of R):
#!/bin/bash
module load gcc/8.5.0 R/4.1.2
Rscript "${@}"
The "${@}" safely passes all your command-line arguments to Rscript, even if they contain spaces. This wrapper script can then be used to submit/call any of your R scripts:
$ sbatch run_rscript.sh my_script.R my_data.tsv --p-value 0.01
Submitted batch job 18090212
$ cat slurm-18090212.out
I would process the file my_data.tsv with a max P-value of 0.01
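To see why the quotes around "${@}" matter, the following minimal sketch compares quoted and unquoted forwarding. The functions count_args, forward_quoted and forward_unquoted are hypothetical helpers written for this illustration only; they are not part of the wrapper script above.

```bash
#!/bin/bash
# Hypothetical helpers, used only to illustrate argument forwarding.

# Prints the number of arguments it received
count_args() { echo "$#"; }

# Forwards arguments the same way the wrapper script does
forward_quoted() { count_args "${@}"; }

# Forwards arguments WITHOUT quotes; arguments containing spaces are split
forward_unquoted() { count_args ${@}; }

forward_quoted "my data.tsv" --p-value 0.01    # prints 3
forward_unquoted "my data.tsv" --p-value 0.01  # prints 4
```

In the unquoted case the file name "my data.tsv" is split into two separate arguments, which would cause Rscript to receive the wrong input file name.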
Installing R modules#
Modules may be installed in your home folder using the install.packages command:
$ module load gcc/8.5.0 R/4.3.1
$ R
> install.packages("ggplot2")
Warning in install.packages("ggplot2") :
'lib = "/opt/software/R/4.3.1/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘/home/abc123/R/x86_64-pc-linux-gnu-library/4.3’
to install packages into? (yes/No/cancel) yes
When asked to pick a mirror, either pick 0-Cloud by entering 1 and pressing enter, or enter the number corresponding to a location near you and press enter:
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors
1: 0-Cloud [https]
[...]
Selection: 1
Troubleshooting#
libstdc++.so.6: version 'GLIBCXX_3.4.26' not found#
If you build an R library on the head/compute nodes using a version of the GCC module other than gcc/8.5.0, then this library may fail to load on the RStudio node or when gcc/8.5.0 is loaded on the head/compute nodes:
$ R
> library(wk)
Error: package or namespace load failed for ‘wk’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so)
To fix this, you will need to reinstall the affected R libraries using one of two methods:
1. Connect to the RStudio server as described in the Troubleshooting section, and simply install the affected packages using the install.packages function:
> install.packages("wk")
You may need to repeat this step multiple times, once for every package that fails to load.
2. Connect to the head node or a compute node, and take care to load the correct version of GCC before loading R:
$ module load gcc/8.5.0 R/4.3.2
$ R
> install.packages("wk")
The name of the affected module can be determined by looking at the error message above. In particular, the path /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so contains a pair of folders named R/x86_64-pc-linux-gnu-library, which specifies the kind of system we are running on, followed by a folder named after the R version (4.3). Immediately after that we find the package name, namely wk in this case.
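The same lookup can be done in the shell. The following is a minimal sketch, assuming the path has the .../<package>/libs/<name>.so layout shown in the error message; package_from_so_path is a hypothetical helper name chosen for this example.

```bash
#!/bin/bash
# Hypothetical helper: print the package name for a shared-object path
# of the form .../<package>/libs/<name>.so
package_from_so_path() {
    local path="$1"
    # Strip the trailing "/libs/<name>.so"
    local pkg_dir="${path%/libs/*}"
    # Keep only the last remaining path component, i.e. the package folder
    echo "${pkg_dir##*/}"
}

package_from_so_path "/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/wk/libs/wk.so"  # prints wk
```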
You can identify all affected packages in your "global" R library by running the following commands:
$ module load gcc/8.5.0 R/4.3.2
# cd to your R library
$ cd ~/R/x86_64-pc-linux-gnu-library/4.3/
# Test every installed library
$ for lib in *; do echo "Testing ${lib}"; Rscript -e "library(${lib})" > /dev/null; done
Output will look like the following:
Testing httpuv
Testing igraph
Error: package or namespace load failed for ‘igraph’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/igraph/libs/igraph.so)
Execution halted
Testing isoband
Error: package or namespace load failed for ‘isoband’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so':
/opt/software/gcc/8.5.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/abc123/R/x86_64-pc-linux-gnu-library/4.3/isoband/libs/isoband.so)
Execution halted
Testing labeling
Testing later
Locate error messages like the ones shown above in the output and reinstall the affected libraries using the install.packages command:
$ R
> install.packages(c("igraph", "isoband"))
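If many libraries are installed, picking out the failures by eye becomes tedious. The following is a minimal sketch of one way to list them automatically. It assumes that the combined output of the test loop (both standard output and standard error) has been saved to a file or is piped in, and that every failure ends with an "Execution halted" line as in the output above; failed_libraries is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: read the combined output of the test loop on stdin
# and print the name of every library whose test ended in "Execution halted"
failed_libraries() {
    awk '
        /^Testing /          { lib = $2 }    # remember the library being tested
        /^Execution halted/  { print lib }   # the most recent library failed
    '
}

# Example using a shortened copy of the output shown above
printf '%s\n' \
    "Testing httpuv" \
    "Testing igraph" \
    "Error: package or namespace load failed for igraph" \
    "Execution halted" \
    "Testing labeling" | failed_libraries  # prints igraph
```

The resulting names can then be passed to install.packages as shown above.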