Search code examples
pythonpython-asynciohttpx

Why does asyncio.create_task and asyncio.ensure_future behave differently when creating httpx tasks for gather?


I found an async httpx example where ensure_future works but create_task doesn't, but I can't figure out why. As I've understood that create_task is the preferred approach, I'm wondering what's happening and how I may solve the problem.

I've been using an async httpx example at https://www.twilio.com/blog/asynchronous-http-requests-in-python-with-httpx-and-asyncio:

import asyncio
import httpx
import time

start_time = time.time()

async def get_pokemon(client, url):
        resp = await client.get(url)
        pokemon = resp.json()

        return pokemon['name']
    
async def main():

    async with httpx.AsyncClient() as client:

        tasks = []
        for number in range(1, 151):
            url = f'https://pokeapi.co/api/v2/pokemon/{number}'
            tasks.append(asyncio.ensure_future(get_pokemon(client, url)))

        original_pokemon = await asyncio.gather(*tasks)
        for pokemon in original_pokemon:
            print(pokemon)

asyncio.run(main())
print("--- %s seconds ---" % (time.time() - start_time))

When run verbatim, the code produces the intended result (a list of Pokemon in less than a second). However, replacing the asyncio.ensure_future with asyncio.create_task instead leads to a long wait (which seems to be related to a DNS lookup timing out) and then exceptions, the first one being:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/anyio/_core/_sockets.py", line 186, in connect_tcp
    addr_obj = ip_address(remote_host)
  File "/usr/lib/python3.10/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'pokeapi.co' does not appear to be an IPv4 or IPv6 address

Reducing the range maximum (to 70 on my computer) makes the problem disappear.

I understand https://stackoverflow.com/a/36415477/ as saying that ensure_future and create_task act similarly when given coroutines unless there's a custom event loop, and that create_task is recommended.

If so, why does one of the approaches work while the other fails?

I'm using Python 3.10.5 and httpx 0.23.0.


Solution

  • After more debugging, I've found out that the problem lies elsewhere.

    It appears that httpx doesn't use DNS precaching, so when asked to connect to the same host a bunch of times at once, it'll do a large number of DNS lookups. In turn, that caused the DNS server to fail to respond to requests some of the time.

    As luck would have it, even though I tested many times, the request storm happened to make the DNS fail exactly when I was using create_task but not when I was using ensure_future.

    In short, due to Murphy's law I was mistaking a nondeterministic problem for a deterministic one. However, it seems that httpx can in general be a bit fickle when it comes to DNS requests, for instance as reported at https://github.com/encode/httpx/discussions/2321.