How to write a string to a file on a remote machine?


On Machine1, I have a Python 2.7 script that computes a large (up to 10 MB) binary string in RAM, which I'd like to write to a file on disk on Machine2, a remote machine. What is the best way to do this?

Constraints:

  • Both machines are Ubuntu 13.04. The connection between them is fast -- they are on the same network.

  • The destination directory might not yet exist on Machine2, so it might need to be created.

  • If it's easy, I would like to avoid writing the string from RAM to a temporary disk file on Machine1. Does that eliminate solutions that might use a system call to rsync?

  • Because the string is binary, it might contain bytes that could be interpreted as a newline. This would seem to rule out solutions that might use a system call to the echo command on Machine2.

  • I would like this to be as lightweight on Machine2 as possible. Thus, I would like to avoid running services like ftp on Machine2 or engaging in other configuration activities there. Plus, I don't understand security that well, and so would like to avoid opening additional ports unless truly necessary.

  • I have ssh keys set up on Machine1 and Machine2, and would like to use them for authentication.

  • EDIT: Machine1 is running multiple threads, and so it is possible that more than one thread could attempt to write to the same file on Machine2 at overlapping times. I do not mind the inefficiency caused by having the file written twice (or more) in this case, but the resulting datafile on Machine2 should not be corrupted by simultaneous writes. Maybe an OS lock on Machine2 is needed?

I'm rooting for an rsync solution, since it is a self-contained entity that I understand reasonably well, and requires no configuration on Machine2.


Solution

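  • Open an SSH process to Machine2 with subprocess.Popen and write your data to its stdin. The remote command creates the destination directory if needed and then uses cat to copy stdin into the file, so no temporary file is needed on Machine1 and your existing SSH keys handle authentication.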

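    import subprocess

    # Create the directory if needed, then stream stdin into the file.
    # '&&' keeps cat from running if mkdir fails.
    cmd = ['ssh', 'user@machine2',
           'mkdir -p output/dir && cat - > output/dir/file.dat']

    p = subprocess.Popen(cmd, stdin=subprocess.PIPE)

    # Dummy 10 MB binary payload, including NUL bytes.
    your_inmem_data = b'foobarbaz\0' * 1024 * 1024

    # Send the data in 1 KB chunks.
    for chunk_ix in range(0, len(your_inmem_data), 1024):
        chunk = your_inmem_data[chunk_ix:chunk_ix + 1024]
        p.stdin.write(chunk)

    # Close stdin so the remote cat sees EOF, then wait for ssh to exit.
    p.stdin.close()
    if p.wait() != 0:
        raise RuntimeError('ssh exited with status %d' % p.returncode)
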

    I've just verified that it works as advertised and copies all of the 10485760 dummy bytes.

    P.S. A potentially cleaner/more elegant solution would be to have the Python program write its output to sys.stdout instead and do the piping to ssh externally:

    $ python process.py | ssh <the same ssh command>
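
    Regarding the EDIT in the question about several threads writing the same file: a plain `cat > file` can interleave concurrent writes. One possible approach, sketched here as an assumption and not something verified on the machines above, is to have the remote shell write to a unique temporary file in the destination directory and then rename it into place with mv, which is atomic within one filesystem. The names build_atomic_write_command, write_atomically and the .upload.XXXXXX template are hypothetical; the sketch assumes mktemp is available on Machine2.

```python
import posixpath
import subprocess

try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2.7


def build_atomic_write_command(dest_path):
    """Return a shell command that copies stdin to dest_path via a temp file."""
    dest_dir = posixpath.dirname(dest_path) or '.'
    # Hypothetical temp-file template; mktemp replaces the X's with random
    # characters, so concurrent writers each get their own file.
    template = posixpath.join(dest_dir, '.upload.XXXXXX')
    return ('mkdir -p {d} && tmp=$(mktemp {t}) && cat > "$tmp" '
            '&& mv "$tmp" {f}').format(d=quote(dest_dir),
                                       t=quote(template),
                                       f=quote(dest_path))


def write_atomically(data, dest_path, runner):
    """Pipe data into the shell command started by runner.

    runner is a command prefix that accepts one shell-command argument,
    e.g. ['ssh', 'user@machine2'] for a remote write, or ['sh', '-c'] to
    try the same command on the local machine.
    """
    cmd = runner + [build_atomic_write_command(dest_path)]
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    # communicate() writes all the data, closes stdin and waits for exit.
    p.communicate(data)
    if p.returncode != 0:
        raise RuntimeError('command exited with status %d' % p.returncode)
```

    Usage would look like write_atomically(your_inmem_data, 'output/dir/file.dat', ['ssh', 'user@machine2']). With this scheme the last writer to finish wins, and readers never see a partially written file. Note that mktemp creates the file readable and writable by its owner only, so a chmod step may be needed if other users must read the result.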