Tags: python, concurrency, tornado, nonblocking

Tornado server sending file to remote client blocks server


So I want to use Tornado to implement a simple file download server. Here's the code I currently have:

import os

import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
from tornado.options import define, options

define("port", default=8000, type=int)  # default matches the port in the siege test below

class downloadHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        global data
        self.write(data)
        self.flush()
        self.finish()

def get_buf(path):
    # read the whole file into memory once, before the server starts
    buf = bytearray(os.path.getsize(path))
    with open(path, 'rb') as f:
        f.readinto(buf)
    return bytes(buf)

if __name__ == '__main__':
    data = get_buf('path/to/file')
    tornado.options.parse_command_line()
    app = tornado.web.Application(handlers=[(r"/upgrade", downloadHandler)])
    http_server = tornado.httpserver.HTTPServer(app)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()

As you can see, I read the file into a bytearray and convert it to a bytes object; this is done only once, before the server starts. All I do is write the data to the remote client. The file is about 2 MB. I tested it with siege, and the results look like the clients receive their data one after another rather than concurrently.

siege http://xxx.xx.xx.xx:8000/upgrade -c5 -r1
** SIEGE 4.0.2
** Preparing 5 concurrent users for battle.
The server is now under siege...
HTTP/1.1 200    20.22 secs: 1969682 bytes ==> GET  /upgrade
HTTP/1.1 200    34.24 secs: 1969682 bytes ==> GET  /upgrade
HTTP/1.1 200    48.24 secs: 1969682 bytes ==> GET  /upgrade
HTTP/1.1 200    62.24 secs: 1969682 bytes ==> GET  /upgrade
HTTP/1.1 200    76.25 secs: 1969682 bytes ==> GET  /upgrade

I checked the Tornado documentation; I believe self.write() is a non-blocking call, and I also called self.flush(). So what is blocking the server?

I've also tried tornado.web.StaticFileHandler, and the results are almost the same.
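
For reference, this is roughly the StaticFileHandler setup I tried (the route pattern and directory name here are illustrative):

app = tornado.web.Application(handlers=[
    # StaticFileHandler serves files from "path"; the captured group is the file name
    (r"/files/(.*)", tornado.web.StaticFileHandler, {"path": "files_dir"}),
])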

PS: Is Tornado the right tool here? If not, what are some alternative ways to achieve this?


Solution

  • Try writing the data in chunks, perhaps of 256 kilobytes, instead of all at once:

    class downloadHandler(tornado.web.RequestHandler):
        async def get(self):
            chunk_size = 256 * 1024  # 256 KB per write
            for i in range(0, len(data), chunk_size):
                self.write(bytes(data[i:i + chunk_size]))
                # wait until this chunk has drained; control returns to the IOLoop
                await self.flush()

            self.finish()

    def get_buf(path):
        buf = bytearray(os.path.getsize(path))
        with open(path, 'rb') as f:
            f.readinto(buf)
        return buf  # stay a bytearray: no bytes(buf) copy of the whole file
    

    Writing in chunks allows Tornado's IOLoop to regain control between writes (that's what "await" does), and thus serve several requests concurrently.
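
    To see the effect in isolation, here is a toy asyncio sketch (not Tornado-specific; the worker names and chunk count are made up). Each "await" hands control back to the event loop, which can then advance another task:

    import asyncio

    async def worker(name):
        for i in range(3):
            print(name, "writes chunk", i)
            # awaiting gives the loop a chance to run the other worker,
            # just as "await self.flush()" does between chunks above
            await asyncio.sleep(0)

    async def main():
        await asyncio.gather(worker("A"), worker("B"))

    asyncio.get_event_loop().run_until_complete(main())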

    Note that you can save some data-copying by keeping the data as a bytearray instead of transforming to bytes.
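
    Concretely, here's a small sketch of the difference (the 2 MB size just mirrors the file above):

    import os

    buf = bytearray(os.urandom(2 * 1024 * 1024))   # ~2 MB of data

    whole_copy = bytes(buf)           # duplicates the entire 2 MB at once
    chunk = bytes(buf[0:256 * 1024])  # touches only one 256 KB chunk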

    My code is Python 3.5+. In 2.7, do:

    from tornado import gen

    @gen.coroutine
    def get(self):
        chunk_size = 256 * 1024
        for i in range(0, len(data), chunk_size):
            self.write(bytes(data[i:i + chunk_size]))
            yield self.flush()  # "yield" here plays the role of "await" above

        self.finish()