Transferring data#
This section describes how to perform bulk data transfers between Esrum, your PC/Laptop, repositories such as SIF/Erda, and servers like Computerome. Secure emails using Bluewhale is also briefly described.
File transfers (including project-to-project transfers) should if at all possible be run on a compute node, as high amounts of network traffic may degrade performance on the head node for all users of the cluster. See the Interactive sessions section for how to open a shell on a compute node.
If you have an existing compute project or dataset on a UCPH-IT managed cluster, then you may be able to connect it directly to the Esrum cluster and thereby remove the need for transferring data entirely. Please contact us for more information.
Warning
Data must not be copied out of audited /datasets
or /projects
folders without permission from the relevant data controller! See the
Guidelines and policies for more information.
Best practice#
Data transfers running on Esrum should, when at all possible, be run on
a worker node using either sbatch
or srun
. This ensures that
other users are not impacted if one or more users are transferring a lot
of data.
Warning
Transfers running on the head node may be terminated without warning if they are found to impact the usability of the system.
Transferring data to/from Esrum#
A public SFTP server is made available at sftp.ku.dk
. This server
allows you to access your home folder, your projects, and your datasets
from another computer, whether a personal computer or another
server/cluster, and either upload data from that computer to Esrum or
download data from Esrum to that computer.
Unlike the esrumhead01fl
node, you do not need to be connected to
the UCPH-IT VPN to connect to sftp.ku.dk
. You only need access to
standard tool such as scp
, sftp
, lftp
, and rsync
, or
graphical tools such as FileZilla and MobaXterm (see the
Connecting to the cluster page), on the other computer:
$ sftp sftp://abc123@sftp.ku.dk
(abc123@sftp.ku.dk) Enter password
Password: ******
(abc123@sftp.ku.dk) Enter one-time password
Enter one-time password: ******
Connected to sftp.ku.dk.
sftp> ls
ucph
sftp> cd ucph/
sftp> ls
datasets hdir ndir projects
Depending on how you have configured UCPH two-factor authentication, you may either need to approve the connection attempt or (as shown above) enter a one-time password.
Official documentation is provided on the UCPH computing/HPC Systems pages on KUnet.
Transferring data to/from the N: and H: drives#
As noted in the UCPH network drives (H:, N:, and S:) section, the N:
and
H:
drives are accessible via the ~/ucph
folder, but only from
the head node.
To avoid impacting other users, we therefore request that transfers to
or from these drives be carried out using rsync
with rate-limiting
in place. This is accomplished using the --bwlimit=50M
option, which
limits the transfer-rate to 50 MB/s on average (or ~20 seconds per GB).
The following command, for example, recursively copies the files in
/from/path/
to the folder /to/path/
, with a max transfer-rate of
50 MB/s:
$ rsync -av --progress=summary --bwlimit=50M /from/path/ /to/path/
It is furthermore recommended to run your transfer in a tmux
(or
screen
) instance. See the Persistent sessions with tmux page for more
information. This allows your transfer to keep running after you log
off.
If you have need to transfer amounts of data that are not feasible with this rate limit in place, then please Contact us for assistance.
Warning
Transfers running on the head node, that are not rate-limited, will be terminated without warning due to the impact on other users of the cluster.
Transferring data to/from SIF and ERDA#
Connecting to the SIF or ERDA servers requires that the user has successfully authenticated using Two-factor authentication. Furthermore, this must be done using the same IP from which the user intends to connect, in this case from the Esrum IP.
This poses some challenges, as running a full-fledged browser over SSH performs very poorly. This section therefore describes how to authenticate to SIF or ERDA using a purely text-based browser available on the cluster (Lynx):
Start Lynx as follows:
lynx -accept_all_cookies "https://sif.ku.dk"
Use the up/down arrow keys to select the
log in
link underI'm already signed up to SIF with my KU / UCPH account!
and pressenter
.Make sure that the
Let me in without it, I want to try
is highlighted and press enter to confirm that you wish to try login.Enter your UCPH username and password. Use the
tab
button to jump to the next field andShift+Tab
to jump to the previous field. Finally usetab
to select the "Yes" button (appears as(BUTTON) Yes
) and pressenter
.Enter your SIF two-factor code, press
tab
to select theSubmit
button, and pressenter
.You should now see a page with the header
SIF Project Management
, indicating that you have logged in:Press
Ctrl+C
to quit.
Once you have successfully authenticated you may connect to the SIF/ERDA servers as normal using the tools available on Esrum.
The recommended way to transfer data to/from SIF/ERDA is using the
lftp
command. This allows you use the built-in mirror
command to
recursively download entire folders. If you instead wish to upload a
folder recursively, simply use the mirror -R
command instead of just
mirror
.
For example, to download the contents of the folder my_data
into a
project, you might run the following:
$ mkdir /projects/my_project-AUDIT/data/my_data
$ cd /projects/my_project-AUDIT/data/my_data
$ lftp sftp://sif-io.erda.dk
> user ${YOUR_PROJECT_USERNAME}
Password: ***********
> set net:connection-limit 1
> set net:max-retries 1;
> cd my_data
> mirror
Your project username (${YOUR_PROJECT_USERNAME}
) is available via
the Setup
page for each project once you log into SIF and typically
looks something like Johann.Gambolputty@sund.ku.dk@MyProject
.
Warning
Remember to set a password for the project on SIF before attempting
to login! This is done on the Setup
page described above.
The two set
commands are required to prevent lftp
from
performing simultaneous downloads (not supported by SIF) and to prevent
lftp
from re-trying repeatedly on failure. As SIF sends an email
every time you fail to log in, allowing retries typically means
receiving numerous emails if a transfer fails.
Transferring data to/from Computerome#
When transferring data/to from Computerome you should always run the
transfer software on Esrum (or on your PC/laptop) and you should
always connect to Computerome via transfer.computerome.dk
instead
of ssh.computerome.dk
.
For example, to transfer data to Computerome, you might run
srun rsync -av ./ ${USERNAME}@transfer.computerome.dk:/home/projects/ab_12345/people/${USERNAME}/
This recursively transfers the current folder to a project folder on
Computerome, using srun
to run the actual transfer on a worker node
on Esrum. ${USERNAME}
in the above is your username on Computerome.
This avoids two big issues:
The Computerome administrators will terminate any attempts at transferring data via
ssh.computerome.dk
and may suspend your account if you keep trying. This applies both to running (for example)rsync
onssh.computerome.dk
or if you attempt upload data to or download data from this server.While it is possible to transfer data to/from Computerome from/to Esrum by running your software on a node, this involves paying for an node on Computerome for the duration of the transfer.
Secure emails using Bluewhale#
UCPH offers the ability to securely email large files, up to 20 GB in size, using Bluewhale. Files sent this way are encrypted using a password or using an SMS pin-code that is automatically sent to the recipient.
This service is available as plugins for Outlook (for Windows only) and via the web-portal https://bluewhale.ku.dk/. For more information, please refer to the official UCPH documentation on Email security in Danish or English.