python multithreading web-services flask tornado

Flask and/or Tornado - handling time consuming call to external webservice

I've got a flask app that connects with given URL to external services (with different, but usually long response times) and searches for some stuff there. After that there's some CPU heavy operations on the retrieved data. This take some time too.

My problem: response from external may take some time. You can't do much about it, but it becomes a big problem when you have multiple requests at once - flask request to external service blocks the thread and the rest is waiting.

Obvious waste of time and it's killing the app.

I heard about this asynchonous library called Tornado. And there are my questions:

Does that mean it can manage to handle multiple reqests and just trigger callback right after response from external?
Can I achieve that with my current flask app (probably not because of WSGI I guess?) or maybe I need to rewrite the whole app to Tornado?
What about those CPU heavy operations - would that block my thread? It's a good idea to do some load balancing anyway, but I'm curious how Tornado handles that.
Possible traps, gotchas?

Solution

The web server built into flask isn't meant to be used in production, for exactly the reasons you're listing - it's single threaded, and easily bogged down if any request blocking for a non-trivial amount of time. The flask documentation lists several options for deploying it in a production environment; mod_wsgi, gunicorn, uSWGI, etc. All of those deployment options provides mechanisms for handling concurrency, either via threads, processes, or non-blocking I/O. Note, though, that if you're doing CPU-bound operations, the only option that will give true concurrency is to use multiple processes.

If you want to use tornado, you'll need to rewrite your application in the tornado style. Because its architecture based on explicit asynchronous I/O, you can't use its asynchronous features if you deploy it as a WSGI application. The "tornado style" basically means using non-blocking APIs for all I/O operations, and using sub-processes for handling any long-running CPU-bound operations. The tornado documentation covers how to make asynchronous I/O calls, but here's a basic example of how it works:

from tornado import gen

@gen.coroutine
def fetch_coroutine(url):
    http_client = AsyncHTTPClient()
    response = yield http_client.fetch(url)
    return response.body

The response = yield http_client.fetch(curl) call is actually asynchronous; it will return control to the tornado event loop when the requests begins, and will resume again once the response is received. This allows multiple asynchronous HTTP requests to run concurrently, all within one thread. Do note though, that anything you do inside of fetch_coroutine that isn't asynchronous I/O will block the event loop, and no other requests can be handled while that code is running.

To deal with long-running CPU-bound operations, you need to send the work to a subprocess to avoid blocking the event loop. For Python, that generally means using either multiprocessing or concurrent.futures. I'd take a look at this question for more information on how best to integrate those libraries with tornado. Do note that you won't want to maintain a process pool larger than the number of CPUs you have on the system, so consider how many concurrent CPU-bound operations you expect to be running at any given time when you're figuring out how to scale this beyond a single machine.

The tornado documentation has a section dedicated to running behind a load balancer, as well. They recommend using NGINX for this purpose.