
Downloading large files with tornado AsyncHTTPClient streaming_callback fails


The code below works fine for small files (<100MB or so), but fails for larger ones (uncomment the second URL on line 5 to see the problem). What baffles me is that the failure is immediate -- presumably as soon as Tornado sees the Content-Length header -- but from what I understood, streaming_callback should make it work with arbitrarily large files.

import tornado, tornado.httpclient

def main():
    url = "https://www.python.org/ftp/python/2.7.13/python-2.7.13.msi"
    # url = "http://releases.ubuntu.com/16.04.1/ubuntu-16.04.1-desktop-amd64.iso?_ga=1.179801251.666251388.1483725275"
    client = tornado.httpclient.AsyncHTTPClient()
    request = tornado.httpclient.HTTPRequest(url=url, streaming_callback=on_chunk)
    client.fetch(request, on_done)

total_data = 0
def on_done(response):
    print total_data
    print response

def on_chunk(chunk):
    global total_data
    total_data += len(chunk)

main()
tornado.ioloop.IOLoop.current().start()

I get:

19161088 HTTPResponse(_body=None,buffer=<_io.BytesIO object at 0x7f7a57563258>,code=200,effective_url='https://www.python.org/ftp/python/2.7.13/python-2.7.13.msi',error=None,headers=,reason='OK',request=,request_time=0.7110521793365479,time_info={})

when downloading the Python installer, but

0 HTTPResponse(_body=None,buffer=None,code=599,effective_url='http://releases.ubuntu.com/16.04.1/ubuntu-16.04.1-desktop-amd64.iso?_ga=1.179801251.666251388.1483725275',error=HTTP 599: Connection closed,headers=,reason='Unknown',request=,request_time=0.10775566101074219,time_info={})

when trying the Ubuntu ISO...


Solution

  • streaming_callback works with files of any size, but by default AsyncHTTPClient still enforces a 100MB limit on the response body, and the fetch is aborted as soon as the Content-Length header exceeds it. To raise the limit, configure the client before creating it:

    AsyncHTTPClient.configure(None, max_body_size=1000000000)
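
    Putting the configure call together with a streaming fetch, here is a sketch in modern Tornado (5+/6) style -- note that the two-argument callback form of fetch() from the question was removed in Tornado 6, so this uses await instead. To keep the example self-contained it streams from a hypothetical local handler (BigFileHandler, serving 5 x 1 MB chunks) rather than the real Ubuntu ISO URL:

    ```python
    import tornado.httpclient
    import tornado.httpserver
    import tornado.ioloop
    import tornado.netutil
    import tornado.web

    # Raise the client-wide limit: the default max_body_size is 100 MB
    # even when a streaming_callback is supplied.
    tornado.httpclient.AsyncHTTPClient.configure(None, max_body_size=1024 ** 3)

    class BigFileHandler(tornado.web.RequestHandler):
        """Hypothetical local handler streaming 5 x 1 MB chunks, standing
        in for the large remote file from the question."""
        async def get(self):
            for _ in range(5):
                self.write(b"x" * (1024 * 1024))
                await self.flush()  # push the chunk to the client now

    total_data = 0

    def on_chunk(chunk):
        # Invoked for each chunk as it arrives; the body is never
        # accumulated in memory by the client.
        global total_data
        total_data += len(chunk)

    async def main():
        # Serve on any free localhost port to avoid collisions.
        app = tornado.web.Application([(r"/big", BigFileHandler)])
        sockets = tornado.netutil.bind_sockets(0, "127.0.0.1")
        server = tornado.httpserver.HTTPServer(app)
        server.add_sockets(sockets)
        port = sockets[0].getsockname()[1]

        client = tornado.httpclient.AsyncHTTPClient()
        request = tornado.httpclient.HTTPRequest(
            url="http://127.0.0.1:%d/big" % port,
            streaming_callback=on_chunk)
        await client.fetch(request)
        server.stop()
        print(total_data)  # -> 5242880

    tornado.ioloop.IOLoop.current().run_sync(main)
    ```

    With the real ISO URL from the question, the same configure call is the only change needed to the original script: the 599 "Connection closed" error disappears because the client no longer rejects the response for exceeding max_body_size.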