Internal data transfers¶
This page describes how to transfer data that is already on Esrum to another location on Esrum. All transfers must be performed on a compute node, as described below. Transfers running on the head node, or on the RStudio nodes, will be terminated on sight, as they impact all users of those nodes.
Transferring data between projects or datasets¶
As a rule of thumb, data should only be located in one project or
dataset folder. However, should you need to make a copy of one or more
files, then it is recommended to use the rsync command to do so. See
the Rsync basics section below for more information.
You must run your copy commands (whether you use rsync, cp, or
some other tool) on a compute node, either in an interactive
session, or by using srun to execute the
command on a compute node, as shown in the examples below. See the
Running commands using srun section for more information about using srun.
If you are copying data from a /projects folder, use the command

srun rsync -av --progress /copy/this/data/ /to/this/location/

If you are copying data from a /datasets folder, use the command

srun rsync -av --no-perms --progress /copy/this/data/ /to/this/location/
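Alternatively, the same copy can be run from within an interactive session, as sketched below; the source and destination paths are placeholders that you should replace with your own:

# Start an interactive session on a compute node
srun --pty -- /bin/bash
# Run the copy (add --no-perms if copying from a /datasets folder)
rsync -av --progress /copy/this/data/ /to/this/location/
# Leave the interactive session once the copy has finished
exit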
Warning
Do not copy data out of -AUDIT folders without explicit permission
from the data controller, and never store sensitive data in a
non--AUDIT folder!
Warning
Transfers running on the head node will be terminated without warning, due to the impact on other users of the cluster.
Tip
Running your transfer in a tmux or screen session is
recommended. This allows your transfer to keep running after you log
off from Esrum. See the Persistent sessions with tmux page for more information.
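For example, a transfer could be wrapped in a tmux session as sketched below; the session name "transfer" is an arbitrary placeholder:

# Start a named tmux session
tmux new -s transfer
# Run the copy inside the tmux session
srun rsync -av --progress /copy/this/data/ /to/this/location/
# Detach with Ctrl-b followed by d; reattach later with
tmux attach -t transfer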
Copying data to/from the H:, N:, and S: drives¶
To avoid impacting other users, you must run transfers on compute nodes.
However, as described on the Network drives (H:, N:, S:) page, the H:,
N:, and S: drives are not accessible from compute nodes by
default.
Therefore, you must start an interactive session, log in using the
/usr/bin/kinit command, and then access the network drives via the
/maps folder:
# Start an interactive session
srun --pty -- /bin/bash
# Log in to enable the network drives
/usr/bin/kinit
# View my H: drive; '${USER}' corresponds to your abc123 username
ls /maps/hdir/${USER}/
Your login will expire after about 12 hours, at which point you have to
run /usr/bin/kinit on the node again. However, while your login is
active, your network folders can be found at the following locations:
| Drive | Location |
|---|---|
| H: | /maps/hdir/${USER}/ |
| N: | /maps/ndir/ |
| S: | /maps/sdir/ |
Note that these folders will only be created once you attempt to access
them, provided that you have logged in using /usr/bin/kinit.
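If you are unsure whether your login is still active, the klist command, which ships alongside kinit, can typically be used to list your current Kerberos tickets and their expiry times:

# List your active Kerberos tickets and when they expire
/usr/bin/klist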
It is recommended to use rsync to copy data to/from the
network drives, as described below, but you do not need to use
srun in this case, as you are already working in an interactive
session if you followed the instructions above.
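For example, to copy a folder from your H: drive into a project folder, you could run something like the following inside the interactive session; both paths are placeholders for your actual source and destination:

# Copy a folder from the H: drive into a project folder
rsync -av --progress /maps/hdir/${USER}/my-data/ /projects/my-project/my-data/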
Warning
Do not copy data out of -AUDIT folders without explicit permission
from the data controller, and never store sensitive data in a
non--AUDIT folder!
Warning
Transfers running on the head node will be terminated without warning, due to the impact on other users of the cluster.
Tip
Running your transfer in a tmux or screen session is
recommended. This allows your transfer to keep running after you log
off from Esrum. See the Persistent sessions with tmux page for more information.
Rsync basics¶
rsync allows you to recursively copy data between two locations,
either on the same system or between two different systems (via SSH).
Unlike plain cp, rsync also makes it easy to resume an interrupted
transfer, simply by running the same command again.
The basic rsync command you should be using is
rsync -av --progress /copy/this/data/ /to/this/location/
- The -a option enables "archive" mode, which preserves meta-information such as timestamps and permissions.
- The -v and --progress options are optional, but make rsync list the last copied file and show the progress when copying (large) files.
- The paths in the above example both end in a /. This is intentional, and makes rsync copy the contents of data into the folder location. If you instead ran rsync -av --progress /copy/this/data /to/this/location/, then the data folder would be placed at /to/this/location/data, as illustrated in the sketch below.
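Concretely, with file1 and file2 standing in for whatever the data folder contains:

# With a trailing slash on the source, the contents of data are copied:
rsync -av --progress /copy/this/data/ /to/this/location/
# -> /to/this/location/file1, /to/this/location/file2, ...

# Without a trailing slash, the data folder itself is copied:
rsync -av --progress /copy/this/data /to/this/location/
# -> /to/this/location/data/file1, /to/this/location/data/file2, ...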
However, when copying data from a /datasets folder it is necessary to
add the --no-perms option, since rsync would otherwise set all
permissions to 000, due to how access control is implemented for
/datasets. See the troubleshooting section below if you forget to
add this option.
You must run the rsync command on a compute node, either in an
interactive session, or by using
srun to automatically run the command on a compute node. See the
Running commands using srun section for more information about using srun.
Troubleshooting¶
rsync fails with Permission denied when copying from /datasets¶
If you forget to use the --no-perms option when rsync'ing data out
of a /datasets folder, then all permissions will be set to 000.
In other words, nobody can read, write, or execute those files and
folders.
To fix this, first run the following command to fix the permissions,
where /path/to/copied/data is the path to the copy of the data that
you have created.
chmod -R +rX,u+w /path/to/copied/data
This will recursively mark files and folders readable for everyone, mark folders executable for everyone (required to browse them), and mark files and folders writable for you (and only you).
Then re-run rsync and remember to include the --no-perms option.
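As a sketch, the two steps together could look like this, where both paths are placeholders for your copy and the dataset it was copied from:

# Restore readable permissions on the existing (broken) copy
chmod -R +rX,u+w /path/to/copied/data
# Re-run the transfer with --no-perms to finish copying
srun rsync -av --no-perms --progress /datasets/dataset-name/ /path/to/copied/data/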
Permission denied when accessing data copied from /datasets¶
See above.
The ~/ucph folder or subfolders are missing¶
Note that the ~/ucph folder is only available on the head node
(esrumhead01fl), and not on the RStudio servers nor on the compute
nodes. See the Accessing network drives from compute nodes section for how to
access the drives elsewhere.
If you are connected to the head node, then first make sure that you are not using GSSAPI (Kerberos) to log in. See the Connecting to the cluster page for instructions on how to disable this feature if you are using MobaXterm.
Once you have logged in to Esrum without GSSAPI enabled, and if the folder(s) are still missing, then run the following command to create any missing network folders:
$ bash /etc/profile.d/symlink-ucphmaps.sh
Once this is done, you should have a ucph symlink in your home
folder containing links to hdir (H:), ndir (N:), and
sdir (S:).
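You can verify this by listing the folder, which should contain the three links:

# List the network drive links in your home folder
ls -l ~/ucph/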
No such file or directory when accessing network drives¶
If you get a No such file or directory error when attempting to
access the network drives (~/ucph/hdir, ~/ucph/ndir, or
~/ucph/sdir), then please make sure that you are not logging in
using Kerberos (GSSAPI). See the Accessing network drives via MobaXterm
section for instructions on how to disable this feature if you are
using MobaXterm.
Note also that your login is only valid for about 10 hours, after which you will lose access to the network drives. See the section (Re)activating access to the network drives for how to re-authenticate if your access has timed out.
kinit: Unknown credential cache type while getting default ccache¶
The kinit command may fail if you are using a conda environment:
(base) $ kinit
kinit: Unknown credential cache type while getting default ccache
To circumvent this problem, either specify the full path to the
kinit executable (i.e. /usr/bin/kinit) or deactivate the
current/base environment by running conda deactivate.
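For example, either of the following should work:

# Option 1: use the full path to the system kinit
(base) $ /usr/bin/kinit
# Option 2: deactivate the conda environment first
(base) $ conda deactivate
$ kinit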