Tags: python, python-3.x, python-asyncio, aiohttp

aiohttp with asyncio and Semaphores returning a list filled with Nones


I have a script that checks the status code for a couple hundred thousand supplied websites, and I was trying to integrate a Semaphore into the flow to speed up processing. The problem is that whenever I integrate a Semaphore, I just get a list populated with None objects, and I'm not entirely sure why.

I have been mostly copying code from other sources as I don't fully grok asynchronous programming yet. When I debug, it looks like I should be getting results out of the function, but something goes wrong when I gather them. I've tried juggling around my looping, my gathering, ensuring futures, etc., but nothing returns a list of usable results.

import asyncio

import aiohttp


async def fetch(session, url):
    try:
        async with session.head(url, allow_redirects=True) as resp:
            return url, resp.real_url, resp.status, resp.reason
    except Exception as e:
        return url, None, e, 'Error'


async def bound_fetch(sem, session, url):
    async with sem:
        await fetch(session, url)


async def run(urls):
    timeout = 15
    tasks = []

    sem = asyncio.Semaphore(100)
    conn = aiohttp.TCPConnector(limit=64, ssl=False)

    async with aiohttp.ClientSession(connector=conn) as session:
        for url in urls:
            task = asyncio.wait_for(bound_fetch(sem, session, url), timeout)
            tasks.append(task)

        responses = await asyncio.gather(*tasks)

    # responses = [await f for f in tqdm.tqdm(asyncio.as_completed(tasks), total=len(tasks))]
    return responses

urls = ['https://google.com', 'https://yahoo.com']

loop = asyncio.ProactorEventLoop()
data = loop.run_until_complete(run(urls))

I've commented out the progress bar component, but that implementation returns the desired results when there is no semaphore.

Any help would be greatly appreciated. I am furiously reading up on asynchronous programming, but I can't wrap my mind around it yet.


Solution

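  • You should explicitly return the result of the awaited coroutine. As written, bound_fetch awaits fetch but discards its result, so it implicitly returns None, and asyncio.gather collects one None per URL.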

    Replace this code...

    async def bound_fetch(sem, session, url):
        async with sem:
            await fetch(session, url)
    

    ... with this:

    async def bound_fetch(sem, session, url):
        async with sem:
            return await fetch(session, url)
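
To see the difference without touching the network, here is a minimal sketch that runs both versions side by side. Note that fake_fetch, bound_without_return, bound_with_return and demo are hypothetical names made up for this illustration, and the status code 200 is a placeholder rather than a real response; fake_fetch only stands in for the fetch(session, url) coroutine from the question.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for fetch(session, url); makes no network request.
    await asyncio.sleep(0)
    return url, 200  # placeholder status, not a real response


async def bound_without_return(sem, url):
    async with sem:
        await fake_fetch(url)  # result is discarded, so the coroutine returns None


async def bound_with_return(sem, url):
    async with sem:
        return await fake_fetch(url)  # result is handed back to the caller


async def demo(urls):
    sem = asyncio.Semaphore(2)
    # gather returns results in the same order as the awaitables passed in.
    broken = await asyncio.gather(*(bound_without_return(sem, u) for u in urls))
    fixed = await asyncio.gather(*(bound_with_return(sem, u) for u in urls))
    return broken, fixed


broken, fixed = asyncio.run(demo(["https://google.com", "https://yahoo.com"]))
print(broken)  # [None, None]
print(fixed)   # [('https://google.com', 200), ('https://yahoo.com', 200)]
```

The semaphore itself is not the cause: it only limits how many coroutines are inside the async with block at once. The Nones come from the wrapper coroutine ending without a return statement.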