Tags: python, ssh, pytorch, generative-adversarial-network

How can I send a custom dataset through ssh?


I have to train a GAN (coded in Python using pytorch) on a remote GPU that I can only access from my PC via ssh, but I have a custom dataset (that I cannot download from anywhere) which is stored in the PC without the GPU.

I've searched extensively and tried the scp command (the only solution I found), but the dataset (13 GB) seems too big to be sent within an acceptable time.

How can I transfer the dataset to the PC with the GPU within a decent amount of time, given that I cannot access the PC in any other way than an ssh connection, in order to train the network? Moreover, how can I retrieve the state_dict() and store it to my PC, once the training is complete?


Solution

  • This has nothing to do with the dataset itself. You can use rsync to transfer files from your PC to the remote server over SSH, and vice versa: files and folders can also be copied from the remote server back to your local PC.

    Rsync is a utility for efficiently transferring and synchronizing files between a computer and an external hard drive and across networked computers by comparing the modification times and sizes of files. It is also well suited for transferring large files over ssh, as it can resume a previously interrupted transfer (with the --partial option, partially transferred files are kept so a rerun can continue from them).

    From here:

    rsync is typically used for synchronizing files and directories between two different systems. For example, if the command rsync local-file user@remote-host:remote-file is run, rsync will use SSH to connect as user to remote-host.[7] Once connected, it will invoke the remote host's rsync and then the two programs will determine what parts of the local file need to be transferred so that the remote file matches the local one.

    How to use:

    Similar to cp, rcp and scp, rsync requires the specification of a source and of a destination, of which at least one must be local.

    Generic syntax:

    rsync [OPTION] … SRC … [USER@]HOST:DEST
    rsync [OPTION] … [USER@]HOST:SRC [DEST]
    

    where SRC is the file or directory (or a list of multiple files and directories) to copy from, DEST is the file or directory to copy to, and square brackets indicate optional parameters.

    Simple example :

    The following command transfers the contents of the local directory dataset into /home/ on the remote server (the trailing slash on dataset/ means "copy what is inside the directory", not the directory itself):

    rsync -avz dataset/ root@192.168.0.101:/home/
    

    The -avz options mean: transfer the files in archive mode (-a), print verbose output on screen (-v), and compress the data in transit (-z).

    Common options : 
    -v : verbose
    -r : copies data recursively (but doesn't preserve timestamps and permissions while transferring data)
    -a : archive mode, archive mode allows copying files recursively and it also preserves symbolic links, file permissions, user & group ownerships and timestamps
    -z : compress file data
    -h : human-readable, output numbers in a human-readable format
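
    For a large dataset such as the 13 GB one in the question, the options above can be combined with --partial and --progress so that an interrupted transfer can be rerun without starting over. The following is only a minimal sketch: the function name push_dataset and the user, host, and paths in the usage line are hypothetical placeholders, not values from the question.

```shell
# Hypothetical helper: push a local directory to a destination with rsync.
# $1 = source (e.g. dataset/), $2 = destination (e.g. user@gpu-host:/path/)
push_dataset() {
    # -a archive mode, -v verbose, -z compress in transit, -h human-readable sizes
    # --partial  keep partially transferred files so a rerun can reuse them
    # --progress show per-file transfer progress
    rsync -avzh --partial --progress "$1" "$2"
}

# Usage (placeholder user, host, and path; not executed here):
#   push_dataset dataset/ user@gpu-host:/home/user/dataset/
```

    If the connection drops, running the same command again skips files that already arrived intact. Note that -z may bring little benefit if the dataset consists of already compressed files (for example JPEG images), so it could be worth trying the transfer without it.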
    

    You can read more here as well.
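
    The second part of the question (bringing the state_dict back) uses the same command with source and destination swapped. This sketch assumes the training script has already written the weights to a file on the remote machine, for instance with torch.save(model.state_dict(), path). The function name pull_checkpoint and the file name generator.pth, host, and directories in the usage line are hypothetical.

```shell
# Hypothetical helper: copy a file from a remote source into a local directory.
# $1 = remote source (e.g. user@gpu-host:/path/to/generator.pth)
# $2 = local destination directory (e.g. ./checkpoints/)
pull_checkpoint() {
    # Same options as for uploading; only the direction differs.
    rsync -avzh --partial --progress "$1" "$2"
}

# Usage (placeholder user, host, and paths; not executed here):
#   pull_checkpoint user@gpu-host:/home/user/checkpoints/generator.pth ./checkpoints/
```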