Internal data transfers¶
This page describes how to transfer data that is already on Esrum, to another location on Esrum. All transfers must be performed on a compute node, as described below. Transfers running on the head node, or on the RStudio nodes, will be terminated on sight, as these impact all users of those nodes.
Transferring data between projects or datasets¶
As a rule of thumb, data should only be located in one project or
dataset folder. However, should you need to make a copy of one or more
files, then it is recommended to use the rsync command to do so. See
the Rsync basics section below for more information.
You must run your copy commands (whether you use rsync, cp, or
some other tool) on a compute node, either in an interactive
session or by using srun to execute the
command on a compute node, as shown in the examples below. See the
Running commands using srun section for more information about using srun.
If you are copying data from a /projects folder, use the command:
srun rsync -av --progress /copy/this/data/ /to/this/location/
If you are copying data from a /datasets folder, use the command:
srun rsync -av --no-group --chmod=ugo=rwX --progress /copy/this/data/ /to/this/location/
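The choice between the two commands depends only on where the source data lives, and can be sketched as a small helper. The function name copy_command_for and the example paths are hypothetical; the options are copied from the commands above. The helper only prints the command, so you can review it before running it, and it does not handle paths containing spaces.

```shell
# Hypothetical helper: print the copy command that matches the source folder.
#   $1: source path (ending in / to copy the content of the folder)
#   $2: destination path
copy_command_for() {
    src="$1"
    dst="$2"
    case "$src" in
        /datasets/*)
            # Data from /datasets needs the extra permission-related options
            echo "srun rsync -av --no-group --chmod=ugo=rwX --progress $src $dst" ;;
        *)
            echo "srun rsync -av --progress $src $dst" ;;
    esac
}

# Example with made-up paths; prints the command without running it
copy_command_for /datasets/example/data/ /projects/example/data/
```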
Warning
Transfers running on the head node will be terminated without warning, due to the impact on other users of the cluster.
Warning
Do not copy data out of -AUDIT folders without explicit
permission from the data controller, and never store sensitive data
in a folder that is not an -AUDIT folder!
Tip
Running your transfer in a tmux or screen session is
recommended. This allows your transfer to keep running after you log
off from Esrum. See the Persistent sessions with tmux page for more
information.
Copying data to/from the H:, N:, and S: drives¶
To avoid impacting other users, you should run transfers on compute nodes, if at all possible.
However, as of 2026-05-05, most users are unable to access the network drives from compute and RStudio nodes. Therefore, if the instructions in the Copying data to/from network drives and compute nodes section do not work, then please see the Copying data to/from network drives and the head node section below for how to run the transfer on the head node.
Copying data to/from network drives and compute nodes¶
As described on the Network drives (H:, N:, S:) page, the H:, N:,
and S: drives are not accessible from compute nodes by default.
Therefore, you must first start an interactive session, then log in
using the /usr/bin/kinit command, and finally access the network
drives via the /maps folder:
# Start an interactive session
srun --pty -- /bin/bash
# Log in to enable the network drives
/usr/bin/kinit
# View my H: drive; '${USER}' corresponds to your abc123 username
ls /maps/hdir/${USER}/
Your login will expire after about 12 hours, at which point you have to
run /usr/bin/kinit on the node again. However, while your login is
active, your network folders can be found at the following locations:
| Drive | Location |
|---|---|
| H: | /maps/hdir/${USER}/ |
| N: | |
| S: | |
Note that these folders are only created when you first attempt to
access them, and only if you have logged in using /usr/bin/kinit.
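Since the login expires, it can be convenient to check it before starting a long transfer. The following is a minimal sketch, assuming that klist is installed next to kinit at /usr/bin/klist; the helper name ensure_network_login and the KLIST and KINIT variables are hypothetical.

```shell
# Hypothetical helper: log in again only if there is no valid Kerberos ticket.
# The command paths are assumptions and can be overridden if needed.
KLIST="${KLIST:-/usr/bin/klist}"
KINIT="${KINIT:-/usr/bin/kinit}"

ensure_network_login() {
    # 'klist -s' prints nothing and exits non-zero if the ticket is
    # missing or has expired
    if "$KLIST" -s; then
        echo "Login is still active"
    else
        echo "Login missing or expired; running kinit"
        "$KINIT"
    fi
}
```

Running ensure_network_login in your interactive session before accessing /maps will then prompt for your password only when needed.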
It is recommended to use rsync to copy data to/from the
network drives, as described below, but you do not need to use
srun in this case, as you are already working in an interactive
session if you followed the instructions above.
Tip
You do not need to use --bwlimit when running transfers on a
compute node.
Copying data to/from network drives and the head node¶
Attention
If you are currently in an interactive session on a compute node, you need to either exit that session first, or connect to the head node again, before following these instructions.
If the instructions in the Copying data to/from network drives and compute nodes
section do not work, then you have to run the transfer on the head node.
However, to avoid negatively impacting other users of Esrum, we require
that these transfers are rate-limited to at most 50 MB/s (total) using
the rsync --bwlimit option, and that you run no more than a single
transfer at a time:
$ rsync -av --progress --bwlimit=50M /from/path/ /to/path/
If you run transfers without rate limits (including when using cp or mv), or if you run transfers with a total rate limit above 50 MB/s, then these will be terminated to prevent them from impacting other users of Esrum.
If you have an urgent need to transfer data from a network drive, or if the size of the data is so large that 50 MB/s (or roughly 6 hours per TB) is not feasible, then please contact us.
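The "roughly 6 hours per TB" figure follows from simple arithmetic. The sketch below uses a hypothetical helper, estimate_transfer_seconds, and assumes that 1 MB is 1,000,000 bytes and that the transfer sustains the full rate limit; real transfers are likely to be somewhat slower.

```shell
# Hypothetical helper: estimate how many seconds a transfer takes.
#   $1: size of the data in bytes
#   $2: rate limit in MB/s (defaults to 50, the limit on the head node)
estimate_transfer_seconds() {
    size_bytes="$1"
    rate_mb_per_s="${2:-50}"
    echo $(( size_bytes / (rate_mb_per_s * 1000 * 1000) ))
}

# Estimate for 1 TB (10^12 bytes) at the default 50 MB/s
seconds="$(estimate_transfer_seconds 1000000000000)"
echo "${seconds} seconds, or about $(( seconds / 3600 )) hours and $(( seconds % 3600 / 60 )) minutes"
```

For 1 TB this gives 20000 seconds, or about 5 hours and 33 minutes. To estimate the time for an existing folder, the size in bytes can be obtained with, for example, du -sb /from/path/ | cut -f1 (this assumes GNU du).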
Warning
Do not copy data out of -AUDIT folders without explicit
permission from the data controller, and never store sensitive data
in a folder that is not an -AUDIT folder!
Tip
Running your transfer in a tmux or screen session is
recommended. This allows your transfer to keep running after you log
off from Esrum. See the Persistent sessions with tmux page for more
information.
Rsync basics¶
rsync allows you to recursively copy data between two locations,
either on the same system or between two different systems (via SSH).
Unlike plain cp, it is also easy to resume a transfer that has been
interrupted, simply by running rsync again.
The basic rsync command you should be using is
rsync -av --progress /copy/this/data/ /to/this/location/
The -a option enables "archive" mode, which preserves meta-information such as timestamps and permissions.
The -v and --progress options are optional, but make rsync list the last copied file and the progress when copying (large) files.
The paths in the above example both end in a /. This is intentional, and makes rsync copy the content of data into the folder location. If you instead ran rsync -av --progress /copy/this/data /to/this/location/, then the data folder would be placed at /to/this/location/data
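The effect of the trailing / can be tried out safely in a throwaway folder. The following is a minimal sketch; the function name demo_trailing_slash and the folder names are made up for the illustration, and it only touches a temporary directory created with mktemp.

```shell
# Hypothetical demo: compare copying with and without a trailing slash.
demo_trailing_slash() {
    demo_root="$(mktemp -d)" || return 1
    mkdir -p "$demo_root/data" "$demo_root/with_slash" "$demo_root/without_slash"
    echo "example" > "$demo_root/data/file.txt"

    # Trailing slash on the source: copies the content of the data folder
    rsync -a "$demo_root/data/" "$demo_root/with_slash/"
    # No trailing slash on the source: copies the data folder itself
    rsync -a "$demo_root/data" "$demo_root/without_slash/"

    # List the copied files relative to the temporary folder
    (cd "$demo_root" && find with_slash -type f && find without_slash -type f)
    rm -rf "$demo_root"
}
```

Running demo_trailing_slash lists with_slash/file.txt followed by without_slash/data/file.txt, showing that the second command placed the data folder inside the destination.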
However, when copying data from a /datasets folder it is necessary to
add the --no-perms --chmod=ugo=rwX options, since rsync would
otherwise set all permissions to 000, due to how access control is
implemented for /datasets folders. See the troubleshooting section
below if you forget to add these options.
You must run the rsync command on a compute node, either in an
interactive session, or by using
srun to automatically run the command on a compute node. See the
Running commands using srun section for more information about using srun.
Copying instrument data to projects or datasets¶
As the /labs folders are currently only accessible from the head node,
it is necessary to run the transfers directly on the head node. These
transfers must be rate-limited to at most 50 MB/s (total) using the
rsync --bwlimit option, and you must not run more than a single
transfer at a time:
$ rsync -av --no-perms --chmod=ugo=rwX --progress --bwlimit=50M /from/path/ /to/path/
Warning
Similarly to /datasets folders, all files and folders on
/labs drives have permissions 000, i.e. no read and no write
access, even when you have access to the data. For this reason, you
must include the --no-perms --chmod=ugo=rwX options when
running rsync, to prevent rsync from recreating these
permissions. If you omit --no-perms --chmod=ugo=rwX, then
rsync normally fails during the transfer, due to not being able to
write to the destination.
If you run transfers without rate limits (including when using cp or mv), or if you run transfers with a total rate limit above 50 MB/s, then these will be terminated to prevent them from impacting other users of Esrum.
If you have an urgent need to transfer instrument data, or if the size of the data is so large that 50 MB/s (or roughly 6 hours per TB) is not feasible, then please contact us.
Troubleshooting¶
rsync fails with Permission denied when copying from /datasets¶
If you forget to use the appropriate options when rsync'ing data out of
a /datasets folder, then all permissions will be set to 000. In
other words, nobody can read, write, or execute those files and folders.
To fix this, first run the following command to fix the permissions,
where /path/to/copied/data is the path to the copy of the data that
you have created.
chmod -R +rX,u+w /path/to/copied/data
This will recursively mark files and folders readable for everyone, mark folders executable for everyone (required to browse them), and mark files and folders writable for you (and only you).
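The effect of this command can be checked on a throwaway folder before applying it to real data. The following is a minimal sketch; the function name demo_fix_permissions and the folder layout are made up, and it assumes the common umask of 022 (with a stricter umask, +rX grants fewer permissions than described above).

```shell
# Hypothetical demo: break and then repair permissions in a temporary folder.
demo_fix_permissions() (
    # The function body runs in a subshell, so the umask change stays local
    umask 022
    demo_root="$(mktemp -d)" || exit 1
    mkdir -p "$demo_root/copy/subdir"
    echo "example" > "$demo_root/copy/subdir/file.txt"

    # Simulate a copy with all permissions set to 000, innermost path first
    chmod 000 "$demo_root/copy/subdir/file.txt" "$demo_root/copy/subdir" "$demo_root/copy"

    # The fix described above
    chmod -R +rX,u+w "$demo_root/copy"

    # Print the permission strings of one folder and one file
    ls -ld "$demo_root/copy/subdir" | cut -c1-10
    ls -l "$demo_root/copy/subdir/file.txt" | cut -c1-10

    rm -rf "$demo_root"
)
```

Running demo_fix_permissions prints drwxr-xr-x for the folder and -rw-r--r-- for the file, i.e. readable for everyone, browsable folders, and writable only for you.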
Then re-run rsync and remember to include the appropriate options,
as described in the Rsync basics section.
Permission denied when accessing data copied from /datasets¶
See above.
The ~/ucph folder or subfolders are missing¶
Note that the ~/ucph folder is only available on the head node
(esrumhead01fl), and not on the RStudio servers nor on the compute
nodes. See the Accessing network drives from compute nodes section for how to
access the drives elsewhere.
If you are connected to the head node, then firstly make sure that you are not using GSSAPI (Kerberos) to log in. See the Connecting to the cluster page for instructions for how to disable this feature if you are using MobaXterm.
Once you have logged in to Esrum without GSSAPI enabled, and if the folder(s) are still missing, then run the following command to create any missing network folders:
$ bash /etc/profile.d/symlink-ucphmaps.sh
Once this is done, you should have a ucph symlink in your home
folder containing links to hdir (H:), ndir (N:), and
sdir (S:).
No such file or directory when accessing network drives¶
If you get a No such file or directory error when attempting to
access the network drives (~/ucph/hdir, ~/ucph/ndir, or
~/ucph/sdir), then please make sure that you are not logging in
using Kerberos (GSSAPI). See the Accessing network drives via MobaXterm
section for instructions for how to disable this feature if you are
using MobaXterm.
Note also that your login is only valid for about 10 hours, after which you will lose access to the network drives. See the section (Re)activating access to the network drives for how to re-authenticate if your access has timed out.
kinit: Unknown credential cache type while getting default ccache¶
The kinit command may fail if you are using a conda environment:
(base) $ kinit
kinit: Unknown credential cache type while getting default ccache
To circumvent this problem, either specify the full path to the
kinit executable (i.e. /usr/bin/kinit) or deactivate the
current/base environment by running conda deactivate until conda is
completely deactivated.
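To see whether you are affected, you can check which kinit your shell would run. The following is a minimal sketch; the helper name check_kinit_path is hypothetical, and it only inspects your PATH without running kinit.

```shell
# Hypothetical helper: report which 'kinit' is found first on PATH.
check_kinit_path() {
    found="$(command -v kinit || true)"
    if [ "$found" = "/usr/bin/kinit" ]; then
        echo "kinit resolves to the system version"
    elif [ -n "$found" ]; then
        # Typically a copy installed inside an active conda environment
        echo "kinit resolves to $found; use /usr/bin/kinit instead"
    else
        echo "kinit was not found on PATH"
    fi
}
```

If check_kinit_path reports a path inside a conda environment, then either call /usr/bin/kinit directly or deactivate the environment as described above.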