Tags: python, asynchronous, python-requests, python-asyncio, aiohttp

Parallelise web tasks with asyncio in Python


I'm trying to wrap my head around asyncio and aiohttp and for the first time in years programming makes me feel utterly stupid and incapable. Which is kind of beautiful, in a weirdo Zen way. But alas, there's work to get done.

I've got an existing class that can do numerous wondrous things on the web, like signing up to a web site, getting data, the works. And now I need like, 100 or 1000 of these little worker bees to sign up. Code looks roughly like this:

import requests

class Worker(object):
    def signup(self, ...):
        ...
        data = self.make_request(url, data)
        self.user_id = data.get("user_id")
        return self

    def make_request(self, url, data):
        response = requests.post(url, data=data)
        return response.json()

workers = [Worker().signup() for n in range(100)]

As you can see, we're using the requests module to make a POST request. However, this is blocking, so we have to wait for worker N to finish signing up before we can start signing up worker N+1. Fortunately, the original author of the Worker class (that sounds charmingly Marxist) in her infinite wisdom wrapped every HTTP call in the self.make_request method, so making the whole Worker non-blocking should just be a matter of swapping out the requests library for a non-blocking one aaaaand bob's your uncle, right? This is how far I got:

import asyncio
import aiohttp

class AsyncWorker(Worker):
    @asyncio.coroutine
    def make_request(self, url, data):
        response = yield from aiohttp.request('post', url, data=data)
        return (yield from response.json())

coroutines = [AsyncWorker().signup() for n in range(100)]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(coroutines))
loop.close()

But this raises AttributeError: 'generator' object has no attribute 'get' in the signup method, where I do self.user_id = data.get("user_id"). And beyond that, I still don't have the workers in a neat list. I'm aware that I'm most likely completely misunderstanding how asyncio works - but I already spent a day reading through various docs, mind-shattering tutorials by David Beazley, and masses of toy examples that are simple enough that I understand them and too simple to apply to this situation. How should I structure my worker and my async loop to sign up 100 workers in parallel and eventually get a list of all workers after they signed up?


Solution

  • Once a function uses yield (or yield from), it becomes a generator function, which is what asyncio treats as a coroutine. That means you can't get a result by just calling it: the body doesn't run, and you get a generator object back instead. You must at least do this:

    @asyncio.coroutine
    def some_coroutine(*args):
        #...
        #...
        # tasty.asyncio.function is a placeholder for any asyncio-based call
        result = yield from tasty.asyncio.function()
        return result

    def coroutine_user():
        # data = some_coroutine() will give you a generator object instead of a result
        data = yield from some_coroutine()
        return data  # data here is a plain result: you can call .get or whatever on it
    

    Guess what happens when you call coroutine_user():

    >>> coroutine_user()
    <generator object coroutine_user at 0x7fe13b8a47e0>
    

    Leaving off the asyncio.coroutine decorator doesn't help at all: coroutines are contagious! To get a result inside a function, you must use yield from, and that turns your function into another coroutine!

    Though things aren't always that bad (with ordinary generators you can usually iterate the generator object by hand without relying on yield from), asyncio will specifically stop you from doing that, because it breaks some internals (you can do it only from a Future or an asyncio.coroutine). So unless you're going to turn all your code into coroutines, just use concurrent.futures or something similar. As an alternative, isolate everything that calls aiohttp.request from your usual methods and work with both coroutine-based async workers and plain old synchronous code. Diving into asyncio and actually refactoring all your code is an option too, obviously: you basically need to put yield from before every call to any method infected with asyncio.
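
For the "refactor everything" route, here is a minimal sketch of what the contagion looks like in practice. It is written for current Python, where async def and await have taken over from @asyncio.coroutine and yield from. The AsyncWorker below is a hypothetical stand-in for the question's class, the URL is made up, and asyncio.sleep stands in for the real non-blocking HTTP call (aiohttp or similar), so treat it as an illustration rather than a drop-in replacement.

```python
import asyncio


class AsyncWorker:
    """Hypothetical stand-in for the question's Worker; no real HTTP is done."""

    async def make_request(self, url, data):
        # Placeholder for a non-blocking HTTP call (e.g. via aiohttp).
        await asyncio.sleep(0.01)
        return {"user_id": data["n"]}

    async def signup(self, n):
        # make_request is a coroutine, so signup has to await it,
        # which makes signup a coroutine as well.
        data = await self.make_request("https://example.invalid/signup", {"n": n})
        self.user_id = data.get("user_id")
        return self


async def sign_up_all(count):
    # gather runs the coroutines concurrently and returns results in call order.
    return await asyncio.gather(*(AsyncWorker().signup(n) for n in range(count)))


# Calling signup() on its own only creates a coroutine object, not a worker.
pending = AsyncWorker().signup(0)
is_coroutine = asyncio.iscoroutine(pending)
pending.close()  # discard it so Python does not warn that it was never awaited

workers = asyncio.run(sign_up_all(100))
```

The point to notice is that every layer between the HTTP call and the event loop had to become a coroutine: make_request, then signup, then sign_up_all. Only asyncio.run at the very top is ordinary blocking code, and workers ends up as a plain list of signed-up workers.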
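
If you would rather leave the Worker class alone, the concurrent.futures suggestion could look roughly like this. Again a sketch under assumptions: this Worker is a hypothetical reduction of the one in the question, time.sleep stands in for the blocking requests.post call, and max_threads=20 is an arbitrary number picked for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Hypothetical blocking worker; time.sleep stands in for requests.post."""

    def make_request(self, url, data):
        time.sleep(0.01)  # placeholder for the blocking HTTP round trip
        return {"user_id": data["n"]}

    def signup(self, n):
        data = self.make_request("https://example.invalid/signup", {"n": n})
        self.user_id = data.get("user_id")
        return self


def sign_up_all(count, max_threads=20):
    # Each signup blocks its own thread, so up to max_threads run at once.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map returns results in the same order as the inputs.
        return list(pool.map(lambda n: Worker().signup(n), range(count)))


workers = sign_up_all(100)
```

Nothing inside Worker changes here, which is the appeal: the blocking calls stay blocking and the threads supply the parallelism. The trade-off is one thread per in-flight request, which is usually fine for a few hundred sign-ups.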