
run function parallel on python tornado


I'm currently developing in python3 (still a beginner) on the Tornado framework, and I have a function which I would like to run in the background. To be more precise, the task of the function is to download a big file (chunk by chunk) and probably do some more things after each chunk is downloaded. But the calling function should not wait for the download function to complete; it should rather continue execution.

Here are some code examples:

@gen.coroutine
def dosomethingfunc(self, env):
    print("Do something")

    self.downloadfunc(file_url, target_path) #I don't want to wait here

    print("Do something else")


@gen.coroutine
def downloadfunc(self, file_url, target_path):

    response = urllib.request.urlopen(file_url)
    CHUNK = 16 * 1024

    with open(target_path, 'wb') as f:
        while True:
            chunk = response.read(CHUNK)
            if not chunk:
                break
            f.write(chunk)
            time.sleep(0.1) #do something after a chunk is downloaded - sleep only as example

I've read this answer on Stack Overflow https://stackoverflow.com/a/25083098/2492068 and tried to use it.

I thought that if I used @gen.coroutine but no yield, dosomethingfunc would continue without waiting for downloadfunc to finish. But the behaviour is the same with or without yield: "Do something else" is only printed after downloadfunc has finished the download.

What am I missing here?


Solution

  • To benefit from Tornado's asynchronicity, a non-blocking function must be yielded at some point. Since the code of downloadfunc is entirely blocking, dosomethingfunc does not get control back until the called function has finished.
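    The difference is easiest to see in a small self-contained sketch (stdlib asyncio is used here purely to avoid extra dependencies; Tornado coroutines follow the same rule). A coroutine that calls a blocking function never returns control to the event loop, so nothing else runs until it finishes; one that awaits a non-blocking call lets other work interleave:

```python
import asyncio
import time

order = []

async def blocking_task():
    order.append("blocking start")
    time.sleep(0.05)           # blocks the whole event loop
    order.append("blocking end")

async def nonblocking_task():
    order.append("nonblocking start")
    await asyncio.sleep(0.05)  # yields control back to the loop
    order.append("nonblocking end")

async def other():
    order.append("other ran")

async def main():
    # run each variant alongside a second task
    await asyncio.gather(blocking_task(), other())
    await asyncio.gather(nonblocking_task(), other())

asyncio.run(main())
print(order)
```

    In the blocking case "other ran" only appears after "blocking end"; in the non-blocking case it appears in between, because awaiting the sleep handed control back to the loop.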

    There are a couple of issues with your code: urllib.request.urlopen blocks until data arrives, and time.sleep blocks the whole IOLoop. Using Tornado's non-blocking AsyncHTTPClient and tornado.gen.sleep instead, downloadfunc could look like:

    @gen.coroutine
    def downloadfunc(self, file_url, target_path):

        client = tornado.httpclient.AsyncHTTPClient()

        # the code below will start downloading and
        # give back control to the IOLoop while waiting for data
        res = yield client.fetch(file_url)

        with open(target_path, 'wb') as f:
            # fetch resolves to an HTTPResponse; the raw bytes are in .body
            f.write(res.body)
            yield tornado.gen.sleep(0.1)
    

    To implement it with streaming (by chunk) support, you might want to do it like this:

    # for large files you must increase max_body_size,
    # because the default body limit in Tornado is set to 100MB

    tornado.httpclient.AsyncHTTPClient.configure(None, max_body_size=2*1024**3)
    
    import functools

    @gen.coroutine
    def downloadfunc(self, file_url, target_path):

        client = tornado.httpclient.AsyncHTTPClient()

        # the streaming_callback will be called with each received portion of
        # data; functools.partial binds target_path, which would otherwise
        # not be in scope inside write_chunk
        yield client.fetch(
            file_url,
            streaming_callback=functools.partial(write_chunk, target_path))

    def write_chunk(target_path, chunk):
        # note the "a" mode, to append to the file
        with open(target_path, 'ab') as f:
            print('chunk %s' % len(chunk))
            f.write(chunk)
    
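    Reopening the file for every chunk works, but keeping one handle open and writing through a closure is cheaper. A minimal sketch, with an in-memory list of chunks standing in for the HTTP stream (all names here are illustrative):

```python
import os
import tempfile

def make_chunk_writer(f):
    # returns a callback suitable for streaming_callback-style use:
    # each call appends one chunk to the already-open file object
    def write_chunk(chunk):
        f.write(chunk)
    return write_chunk

# fake "download": three chunks instead of a real HTTP response
chunks = [b"a" * 4, b"b" * 4, b"c" * 2]

path = os.path.join(tempfile.mkdtemp(), "out.bin")
with open(path, "wb") as f:
    write_chunk = make_chunk_writer(f)
    for chunk in chunks:       # stands in for streaming_callback invocations
        write_chunk(chunk)

with open(path, "rb") as f:
    data = f.read()

print(len(data))  # 10 bytes written in total
```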

    Now you can call it in dosomethingfunc without yield, and the rest of the function will proceed.
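    The resulting control flow, i.e. kick off the download and keep going, looks like this in a stdlib asyncio sketch (asyncio.create_task plays the role that scheduling a coroutine on Tornado's IOLoop plays here; the function names mirror the question and are illustrative):

```python
import asyncio

log = []

async def downloadfunc():
    log.append("download started")
    await asyncio.sleep(0.05)   # stands in for non-blocking chunk downloads
    log.append("download finished")

async def dosomethingfunc():
    log.append("do something")
    # schedule the download without awaiting it: execution continues at once
    asyncio.create_task(downloadfunc())
    log.append("do something else")
    # keep the loop alive long enough for the background task to finish
    await asyncio.sleep(0.1)

asyncio.run(dosomethingfunc())
print(log)
```

    "do something else" is logged before the download even starts, because the task only begins running once dosomethingfunc yields control at its own await.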

    edit

    Modifying the chunk size is not supported (exposed) on either the server or the client side. You may also look at https://groups.google.com/forum/#!topic/python-tornado/K8zerl1JB5o