python multithreading asynchronous couchdb tornado

Optimize Async Tornado code. Minimize the thread lock

How can I minimize the thread lock with Tornado? Actually, I have already the working code, but I suspect that it is not fully asynchronous.

I have a really long task. It consists of making several requests to CouchDB to get meta-data and to construct a final link. Then I need to make the last request to CouchDB and stream a file (from 10 MB up to 100 MB). So, the result will be the streaming of a large file to a client.

The problem that the server can receive 100 simultaneous requests to download large files and I need not to lock thread and keep recieving new requests (I have to minimize the thread lock).

So, I am making several synchronous requests (requests library) and then stream a large file with chunks with AsyncHttpClient.

The questions are as follows:

1) Should I use AsyncHTTPClient EVERYWHERE? Since I have some interface it will take quite a lot of time to replace all synchronous requests with asynchronous ones. Is it worth doing it?

2) Should I use tornado.curl_httpclient.CurlAsyncHTTPClient? Will the code run faster (file download, making requests)?

3) I see that Python 3.5 introduced async and theoretically it can be faster. Should I use async or keep using the decorator @gen.coroutine?

Solution

Use AsyncHTTPClient or CurlAsyncHTTPClient. Since the "requests" library is synchronous, it blocks the Tornado event loop during execution and you can only have one request in progress at a time. To do asynchronous networking operations with Tornado requires purpose-built asynchronous network code, like CurlAsyncHTTPClient.

Yes, CurlAsyncHTTPClient is a bit faster than AsyncHTTPClient, you may notice a speedup if you stream large amounts of data with it.

async and await are faster than gen.coroutine and yield, so if you have yield statements that are executed very frequently in a tight loop, or if you have deeply nested coroutines that call coroutines, it will be worthwhile to port your code.