python linux rsync flask-socketio

How can you get real time copy progress of a large file with Python?


I've searched high and low, and each time I find something that looks promising, it hasn't panned out.

Ultimately I want to grab the real time progress of a file copy on a linux machine from inside python. I'll take that progress and emit it to a client web page with Flask-SocketIO, likely threaded to avoid blocking.

I don't mind if it's rsync, copy, or any other means...(shutil etc) to handle the actual copy. I just want a hook to push an update over the socket.

Thus far I've found this to be the most promising. However, I'm not quite grasping its console printing mechanism, because when I try to print the output to a file, or with a regular Python print, it comes out one character at a time.

import subprocess
import sys

def copy_with_progress(src, dst):
    cmd = 'rsync --progress --no-inc-recursive %s %s'%(src, dst)
    sub_process = subprocess.Popen(cmd, close_fds=True, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while sub_process.poll() is None:
        out = sub_process.stdout.read(1)
        sys.stdout.write(out)
        sys.stdout.flush()


src = '/home/user/Downloads/large_file.tar'
dst = '/media/usbdrive/large_file.tar'

copy_with_progress(src, dst)

This came from the SO question "Getting realtime output using subprocess".

However, this reports the output back over stdout. I'd like to capture this output in a variable and emit it.

The stdout progress looks like this, with one line being updated constantly:

large_file.tar 323,780,608 19% 102.99MB/s 0:00:12

When I print the variable named 'out', I get a single character that prints to the screen, cycling a new line over and over.

How do I capture this info in a way that's usable for transmitting to the client side?

Is there a way to grab the entire line for each refresh of the status?


Solution

  • What I've done in the past is to copy the data in chunks and use a callback function to monitor the progress. Something like:

    # Works on Python 2 and Python 3
    from __future__ import print_function
    
    import os
    
    def copy_with_callback(sourceFile, destinationFile, callbackFunction):
        chunk = 4*1024  # bytes read per iteration
        sourceSize = os.path.getsize(sourceFile)
        destSize = 0
        with open(sourceFile, 'rb') as fSrc:
            with open(destinationFile, 'wb') as fDest:
                while True:
                    data = fSrc.read(chunk)
                    if len(data) == 0:
                        break  # end of file
                    fDest.write(data)
                    destSize += len(data)
                    # Report progress after every chunk written
                    callbackFunction(sourceSize, destSize)
    
    def example_callback_function(srcSize, dstSize):
        ''' Just an example with print.  Your viewer code will vary '''
        print('Do something with these values:', srcSize, dstSize)
        print('Percent?', 100.0 * dstSize / srcSize)
    
    def main():
        src = '/tmp/A/path/to/a/file.txt'
        dest = '/tmp/Another/path/to/a/file.txt'
        copy_with_callback(src, dest, example_callback_function)
    
    if __name__ == '__main__':
        main()
    

    An advantage is that this Python code doesn't depend on OS-specific functionality.
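
With 4 KiB chunks, the callback in copy_with_callback fires a very large number of times for a big file, which is probably more than you want to push over a socket. Below is a minimal sketch (Python 3) of one way to rate-limit it. The names make_throttled_callback, min_interval, and clock are hypothetical, and emit is a stand-in for whatever actually sends the update to the client; none of them come from the answer above.

```python
import time


def make_throttled_callback(emit, min_interval=0.5, clock=time.monotonic):
    """Build a callback that forwards progress at most once per interval.

    `emit` is a hypothetical stand-in for the function that pushes an
    update to the client; it receives one dict per update.
    `clock` is injectable so the throttling can be tested without sleeping.
    """
    state = {"last": None}  # time of the most recent forwarded update

    def callback(src_size, dst_size):
        now = clock()
        finished = dst_size >= src_size
        due = state["last"] is None or now - state["last"] >= min_interval
        # Skip updates that arrive too soon, but always send the final one.
        if not (due or finished):
            return
        state["last"] = now
        # Guard against division by zero for an empty source file.
        percent = 100.0 * dst_size / src_size if src_size else 100.0
        emit({"copied": dst_size, "total": src_size, "percent": percent})

    return callback
```

The returned function takes the same two arguments as example_callback_function, so it can be passed straight to copy_with_callback. With Flask-SocketIO, emit could be something like lambda payload: socketio.emit('copy_progress', payload), where the event name 'copy_progress' is only an assumption for illustration.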