I'm currently developing in Python 3 (still a beginner) on the Tornado framework, and I have a function which I would like to run in the background. To be more precise, the task of the function is to download a big file (chunk by chunk) and probably do some more things after each chunk is downloaded. But the calling function should not wait for the download function to complete; it should rather continue execution.
Here are some code examples:
```python
@gen.coroutine
def dosomethingfunc(self, env):
    print("Do something")
    self.downloadfunc(file_url, target_path)  # I don't want to wait here
    print("Do something else")

@gen.coroutine
def downloadfunc(self, file_url, target_path):
    response = urllib.request.urlopen(file_url)
    CHUNK = 16 * 1024
    with open(target_path, 'wb') as f:
        while True:
            chunk = response.read(CHUNK)
            if not chunk:
                break
            f.write(chunk)
            time.sleep(0.1)  # do something after a chunk is downloaded - sleep only as example
```
I've read this answer on Stack Overflow https://stackoverflow.com/a/25083098/2492068 and tried to use it.

Actually I thought if I used `@gen.coroutine` but no `yield`, `dosomethingfunc` would continue without waiting for `downloadfunc` to finish. But actually the behaviour is the same (with `yield` or without) - "Do something else" is only printed after `downloadfunc` has finished the download.

What am I missing here?
To benefit from Tornado's asynchronicity, a non-blocking function must be yielded at some point. Since the code of `downloadfunc` is entirely blocking, `dosomethingfunc` does not get control back until the called function has finished.
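As an illustration (a minimal sketch with made-up coroutine names, not part of the original question or answer): yielding a non-blocking future such as `gen.sleep` hands control back to the IOLoop so other coroutines can run, whereas `time.sleep` would freeze the whole loop.

```python
from tornado import gen, ioloop

@gen.coroutine
def count(name):
    # gen.sleep returns a Future, so every iteration hands control
    # back to the IOLoop and the other coroutine can make progress
    for i in range(3):
        print(name, i)
        yield gen.sleep(0.5)

@gen.coroutine
def main():
    # yielding a list of futures runs both coroutines concurrently,
    # so the output of "a" and "b" interleaves
    yield [count("a"), count("b")]
    # with time.sleep(0.5) instead of gen.sleep(0.5) there would be no
    # yield point, and "b" could only start after "a" had fully finished

ioloop.IOLoop.current().run_sync(main)
```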
There are a couple of issues with your code:

- `time.sleep` is blocking, use `tornado.gen.sleep` instead,
- `urllib.request.urlopen` is blocking, use `tornado.httpclient.AsyncHTTPClient` instead.
So `downloadfunc` could look like:
```python
@gen.coroutine
def downloadfunc(self, file_url, target_path):
    client = tornado.httpclient.AsyncHTTPClient()
    # the fetch below starts the download and gives control
    # back to the IOLoop while waiting for the data
    res = yield client.fetch(file_url)
    with open(target_path, 'wb') as f:
        f.write(res.body)  # res is an HTTPResponse; the payload is in res.body
    yield tornado.gen.sleep(0.1)
```
To implement it with streaming (by chunk) support, you might want to do it like this:
```python
# for large files you must increase max_body_size,
# because the default body limit in Tornado is set to 100MB
tornado.httpclient.AsyncHTTPClient.configure(None, max_body_size=2 * 1024**3)

@gen.coroutine
def downloadfunc(self, file_url, target_path):
    client = tornado.httpclient.AsyncHTTPClient()

    def write_chunk(chunk):
        # note the "a" mode, to append to the file
        with open(target_path, 'ab') as f:
            print('chunk %s' % len(chunk))
            f.write(chunk)

    # streaming_callback is called with each received portion of the data;
    # defining it inside downloadfunc lets it see target_path
    yield client.fetch(file_url, streaming_callback=write_chunk)
```
Now you can call it in `dosomethingfunc` without `yield` and the rest of the function will proceed.
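For example (a sketch based on the question's snippet; `file_url` and `target_path` are assumed to be defined elsewhere, as in the original code):

```python
@gen.coroutine
def dosomethingfunc(self, env):
    print("Do something")
    # no yield: downloadfunc runs up to its first yield and then
    # continues on the IOLoop while this coroutine carries on
    self.downloadfunc(file_url, target_path)
    print("Do something else")  # printed right away, download keeps running
```

If you simply discard the future returned by `downloadfunc`, exceptions raised inside it can pass unnoticed; keeping a reference to the future or scheduling the call with `tornado.ioloop.IOLoop.current().spawn_callback(self.downloadfunc, file_url, target_path)` are common ways to avoid that.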
Edit: Modifying the chunk size is not supported (exposed), on either the server or the client side. You may also look at https://groups.google.com/forum/#!topic/python-tornado/K8zerl1JB5o
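If you do need to react in fixed-size pieces anyway, one possible workaround (a hypothetical sketch, not from the original answer) is to buffer the data inside the streaming callback yourself:

```python
def make_chunk_writer(target_path, chunk_size=16 * 1024):
    # hypothetical helper: buffers whatever Tornado hands to
    # streaming_callback and re-emits it in fixed-size pieces
    buf = bytearray()

    def write_chunk(data):
        buf.extend(data)
        while len(buf) >= chunk_size:
            piece = bytes(buf[:chunk_size])
            del buf[:chunk_size]
            with open(target_path, 'ab') as f:
                f.write(piece)
            # ...do something after each fixed-size chunk here...

    def flush():
        # call this after the fetch finishes to write the remaining bytes
        if buf:
            with open(target_path, 'ab') as f:
                f.write(bytes(buf))
            buf.clear()

    return write_chunk, flush
```

You would then create the callback with `write_chunk, flush = make_chunk_writer(target_path)`, pass `write_chunk` as `streaming_callback`, and call `flush()` once the `fetch` has completed.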