
Correctly catch aiohttp TimeoutError when using asyncio.gather


This is my first question here on Stack Overflow, so I apologize if I did something stupid or missed something.

I am trying to make asynchronous aiohttp GET requests to many API endpoints at a time to check the status of these pages: the result should be a triple of the form (url, True, "200") for a working link and (url, False, response_status) for a problematic link. This is the atomic function for each call:

async def ping_url(url, session, headers, endpoint):
    try:
        # Per-request timeout of 5 seconds
        async with session.get(url + endpoint, timeout=5, headers=headers) as response:
            return url, (response.status == 200), str(response.status)
    except Exception as e:
        test_logger.info(url + ": " + e.__class__.__name__)
        return url, False, repr(e)

These calls are wrapped in a function using asyncio.gather(), which also creates the aiohttp ClientSession:

async def ping_urls(urllist, endpoint):
    headers = ...  # not relevant

    async with ClientSession() as session:
        try:
            results = await asyncio.gather(
                *[ping_url(url, session, headers, endpoint) for url in urllist],
                return_exceptions=True
            )
        except Exception as e:
            print(repr(e))
    return results
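
(Note on return_exceptions=True: if one of the gathered coroutines does raise, gather() collects the exception object into the results list instead of raising it from the await, so the list can mix triples and exceptions. A tiny illustration of that behavior:)

    import asyncio

    async def boom():
        raise ValueError("kaboom")

    async def ok():
        return "fine"

    # With return_exceptions=True the exception object is collected into the
    # results list instead of being raised from the await
    results = asyncio.run(asyncio.gather(boom(), ok(), return_exceptions=True))
    print(results)  # [ValueError('kaboom'), 'fine']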

The whole thing is called from a main that looks like this:

    urls = ... # not relevant
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(ping_urls(urls, endpoint))

    except Exception as e:
        pass
    finally:
        loop.close()
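
(For context: on Python 3.7+, the same entry point can be written with asyncio.run(), which creates and closes the event loop automatically; a minimal sketch, assuming urls and endpoint are defined as above:)

    import asyncio

    # asyncio.run() creates the loop, runs the coroutine to completion,
    # and closes the loop afterwards
    results = asyncio.run(ping_urls(urls, endpoint))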

This works most of the time, but if the list is pretty long, I noticed that as soon as I get one TimeoutError, the execution stops and I get TimeoutError for all the other URLs after the first one that timed out. If I omit the timeout in the innermost function I get somewhat better results, but then it is not that fast anymore. Is there a way to control the timeout for each single API call instead of one big general timeout for the whole list of URLs?

Any kind of help would be greatly appreciated; I'm stuck on my bachelor's thesis because of this issue.


Solution

  • You may want to try setting a session-level timeout for your ClientSession. This can be done like this:

    from aiohttp import ClientSession, ClientTimeout
    import asyncio

    TIMEOUT_SECONDS = 5  # choose a value appropriate for your endpoints

    async def ping_urls(urllist, endpoint):
        headers = ...  # not relevant

        # This timeout applies to every request made through the session
        timeout = ClientTimeout(total=TIMEOUT_SECONDS)
        async with ClientSession(timeout=timeout) as session:
            try:
                results = await asyncio.gather(
                    *[
                        ping_url(url, session, headers, endpoint)
                        for url in urllist
                    ],
                    return_exceptions=True
                )
            except Exception as e:
                print(repr(e))

            return results
    

    This sets the ClientSession instance to use TIMEOUT_SECONDS as the total timeout for every request made through it. Obviously you will need to set that value to something appropriate!
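
    If you also need to control the timeout for a single API call rather than for the whole session, aiohttp accepts a timeout on an individual request as well, which overrides the session default for that request only. A minimal sketch of ping_url rewritten this way, assuming aiohttp 3.x (the 3-second value is just an illustration):

        import asyncio
        from aiohttp import ClientTimeout

        async def ping_url(url, session, headers, endpoint):
            try:
                # Overrides the session-wide ClientTimeout for this request only
                async with session.get(url + endpoint,
                                       timeout=ClientTimeout(total=3),
                                       headers=headers) as response:
                    return url, (response.status == 200), str(response.status)
            except asyncio.TimeoutError:
                # aiohttp timeouts surface as asyncio.TimeoutError
                return url, False, "timeout"
            except Exception as e:
                return url, False, repr(e)

    Because each coroutine catches its own exception and returns a triple, one URL timing out no longer turns the results for the remaining URLs into errors.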