Copying files over to an HPC cluster
It is possible to copy or transfer files using simple bash commands: scp simply copies data over, while rsync synchronises folders.
scp: Copying files
The first command is scp, which allows you to copy files from your machine to the HPC machines. For instance, to copy the script train_model.py from the template folder over to saga, into the project folder nn8055k, you would write (substituting your own username and HPC):
$ scp train_model.py bencretois@saga.sigma2.no:/cluster/projects/nn8055k/
With scp it is also possible to copy a whole folder over to the HPC machine by adding the -r (recursive) flag. For instance, to copy the entire template folder over to saga, you can write:
$ scp -r template bencretois@saga.sigma2.no:/cluster/projects/nn8055k
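If you run these copies often, the two scp forms can be wrapped in a small helper. The following is only a sketch: the function names remote_target and copy_to_hpc and the DRY_RUN variable are hypothetical, and the username, host and project values are simply the ones used in the examples above, to be replaced with your own.

```shell
#!/bin/sh
# Values taken from the examples above; replace them with your own.
HPC_USER="bencretois"
HPC_HOST="saga.sigma2.no"
PROJECT_DIR="/cluster/projects/nn8055k"

# Build the destination in scp's user@host:path form.
remote_target() {
    printf '%s@%s:%s/' "$HPC_USER" "$HPC_HOST" "$PROJECT_DIR"
}

# Copy a file or a folder to the project folder (hypothetical helper).
# With DRY_RUN=1 the scp command is only printed, not executed.
copy_to_hpc() {
    src="$1"
    flag=""
    # Folders need the recursive flag -r
    if [ -d "$src" ]; then
        flag="-r"
    fi
    # $flag is left unquoted on purpose so that it disappears when empty
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo scp $flag "$src" "$(remote_target)"
    else
        scp $flag "$src" "$(remote_target)"
    fi
}
```

Setting DRY_RUN=1 before calling copy_to_hpc train_model.py prints the command that would be run, which is a cheap way to check the destination path before anything is transferred.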
rsync: Synchronizing a local repository with a remote repository
Instead of copying all files from your local folder to the remote folder, you can synchronize the two folders with rsync. rsync is more flexible than scp and includes optimisations that make transfers faster, such as sending only the parts of files that have changed. Moreover, rsync has a plethora of command-line options that let the user fine-tune its behavior: it supports complex filter rules, batch mode, daemon mode, and more.
$ rsync -e ssh -avz ./local_repo user@server:/remote_repo
-a is the archive option: it syncs directories recursively while preserving permissions, symbolic links, ownership, and group settings.
-v is the verbose option: it prints the progress and status of the rsync command.
-z compresses files during the transfer, which can speed up the sync.
-e specifies the remote shell to use, ssh in our case.
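One detail of the rsync command above is worth checking before a first transfer: ./local_repo is given without a trailing slash, so rsync copies the folder itself into the destination, whereas ./local_repo/ would copy only its contents. The sketch below illustrates the difference using temporary local folders instead of a remote server, so nothing goes over ssh; the folder names remote_a and remote_b are made up for the illustration.

```shell
#!/bin/sh
# Local sandbox: both "sides" are temporary folders on this machine.
work=$(mktemp -d)
mkdir -p "$work/local_repo" "$work/remote_a" "$work/remote_b"
echo "print('hello')" > "$work/local_repo/train_model.py"

# No trailing slash on the source: the folder itself is copied over.
rsync -a "$work/local_repo" "$work/remote_a"

# Trailing slash on the source: only the contents of the folder are copied.
rsync -a "$work/local_repo/" "$work/remote_b"

ls "$work/remote_a"   # local_repo
ls "$work/remote_b"   # train_model.py
```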
It is also possible to use the --exclude option to exclude a file from the synchronisation:
$ rsync -e ssh -avz --exclude "file.txt" ./local_repo user@server:/remote_repo
However, in some cases there are several files that we do not want to send to the remote repository. In these cases we can create a .txt file containing the list of files and folders to exclude, and pass it to rsync with --exclude-from:
$ rsync -e ssh -avz --exclude-from="list_ignore.txt" ./local_repo user@server:/remote_repo
With list_ignore.txt looking like:
folder1
file1.txt
folder2
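To check an exclude list before syncing to the cluster, it can help to try it on local folders first. The following is a minimal sketch under that assumption: it recreates the list_ignore.txt shown above, previews the transfer with rsync's -n (--dry-run) option, then runs the real sync between two temporary folders. The files keep.txt and data.csv are made up for the illustration.

```shell
#!/bin/sh
# Local sandbox: a fake local_repo and a fake remote_repo in a temporary folder.
work=$(mktemp -d)
mkdir -p "$work/local_repo/folder1" "$work/local_repo/folder2" "$work/remote_repo"
touch "$work/local_repo/file1.txt" \
      "$work/local_repo/keep.txt" \
      "$work/local_repo/folder1/data.csv"

# One exclude pattern per line, same content as the listing above.
cat > "$work/list_ignore.txt" <<'EOF'
folder1
file1.txt
folder2
EOF

# Preview: -n (--dry-run) lists what would be sent without copying anything.
rsync -avn --exclude-from="$work/list_ignore.txt" "$work/local_repo/" "$work/remote_repo/"

# Real run: only the files not matched by the list are copied.
rsync -a --exclude-from="$work/list_ignore.txt" "$work/local_repo/" "$work/remote_repo/"

ls "$work/remote_repo"   # keep.txt
```

Once the preview shows the expected files, the same --exclude-from option can be used with -e ssh and a user@server:/remote_repo destination.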