Tags: python, python-2.7, asynchronous, tornado, coroutine

Hitting multiple APIs at once, tornado and python


I'm trying to make an API that collects responses from several other APIs and combines the results into one response. I want to send the GET requests asynchronously so that it runs faster, but even though I'm using coroutines and yielding, my code still seems to make each request one at a time. I'm wondering if it's because I'm using the requests library instead of Tornado's AsyncHTTPClient, because I'm calling self.path_get inside a loop, or because I'm storing the results in an instance variable?

The APIs I'm hitting return arrays of JSON objects, and I want to combine them all into one array and write that to the response.

from tornado import gen, ioloop, web
from tornado.gen import Return
import requests


PATHS = [
    "http://firsturl",
    "http://secondurl",
    "http://thirdurl"
]


class MyApi(web.RequestHandler):

    @gen.coroutine
    def get(self):
        self.results = []
        for path in PATHS:
            x = yield self.path_get(path)

        self.write({
            "results": self.results,
        })

    @gen.coroutine
    def path_get(self, path):
        resp = yield requests.get(path)
        self.results += resp.json()["results"]
        raise Return(resp)


ROUTES = [
    (r"/search", MyApi),
]


def run():
    app = web.Application(
        ROUTES,
        debug=True,
    )

    app.listen(8000)

    ioloop.IOLoop.current().start()


if __name__ == "__main__":
    run()

Solution

  • There are several reasons your code doesn't work. To begin with, requests is a blocking library: each call stops the event loop and prevents anything else from executing, so replace requests with AsyncHTTPClient.fetch. Also, yielding each request individually inside the loop makes the requests run sequentially rather than concurrently, as you suspected. Here's an example of how your code could be restructured:

    import json
    from tornado import gen, httpclient, ioloop, web
    
    # ...
    
    class MyApi(web.RequestHandler):
    
        @gen.coroutine
        def get(self):
            futures_list = [self.path_get(path) for path in PATHS]
    
            # Yielding the whole list starts all fetches concurrently and
            # resumes once every future has resolved, returning their results.
            results = yield futures_list
            self.write(json.dumps({'results': results}))
    
        @gen.coroutine
        def path_get(self, path):
            request = httpclient.AsyncHTTPClient()
            resp = yield request.fetch(path)
            result = json.loads(resp.body.decode('utf-8'))
            raise gen.Return(result)
    

    What's happening is that we build a list of Futures by calling the gen.coroutine function once per path, then yield the entire list at once; the coroutine resumes only after all of the requests have completed. Once they have, the collected results are serialized into a single JSON object and written to the response.
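    Yielding a list of futures at once is what makes the requests overlap instead of running back to back. The same principle exists in modern Python (3.7+) as asyncio.gather, which may be easier to experiment with since it needs no third-party library. A minimal sketch of the pattern, using asyncio.sleep as a stand-in for a network call (fake_fetch and its delay are invented for illustration, not part of Tornado or the question's APIs):

    ```python
    import asyncio
    import time

    async def fake_fetch(path, delay=0.1):
        # Stand-in for an async HTTP fetch: waits without blocking the loop.
        await asyncio.sleep(delay)
        return {"path": path, "results": [path]}

    async def main():
        paths = ["http://firsturl", "http://secondurl", "http://thirdurl"]
        start = time.monotonic()
        # Awaiting all three tasks together lets the "requests" run
        # concurrently, just like yielding a list of futures in Tornado.
        responses = await asyncio.gather(*(fake_fetch(p) for p in paths))
        elapsed = time.monotonic() - start
        return responses, elapsed

    responses, elapsed = asyncio.run(main())
    ```

    Because the three 0.1-second waits overlap, the whole batch finishes in roughly 0.1 seconds rather than 0.3; awaiting each call in a loop, like the original code did with yield, would take the full 0.3.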