python web-scraping aiohttp

Error when making requests with aiohttp: AttributeError: 'NoneType' object has no attribute 'connect'


I am writing an asynchronous scraper for http://freelance.habr.com that iterates through all the tasks for a specific query. I use aiohttp for the requests. During execution, only some of the tasks raise an error.

Code:

async def get_data_from_habr() -> None:
    """Main function, adding info from all pages to database."""
    p = 1
    async with aiohttp.ClientSession() as session:
        while True:
            url = f"https://freelance.habr.com/tasks?categories=development_bots&page={p}"
            headers = {"user-agent": UserAgent().random}

            # async with aiohttp.ClientSession() as session:
            async with session.get(url=url, headers=headers) as res:
                src = await res.text()

            soup = BeautifulSoup(src, "lxml")

            if soup.find(class_="empty-block__title"):
                break

            orders = soup.find_all(class_="task__title")
            async_tasks = []
            for order in orders:
                order_url = "https://freelance.habr.com" + order.find("a")["href"]
                async_tasks.append(
                    asyncio.create_task(get_data_from_habr_order_page(order_url, session))
                )

            p += 1

    await asyncio.gather(*async_tasks)


async def get_data_from_habr_order_page(order_url: str, session: aiohttp.ClientSession) -> None:
    """Functions for getting info from one page"""
    headers = {"user-agent": UserAgent().random}

    async with session.get(url=order_url, headers=headers, timeout=1000) as res:
        src = await res.text()

Error:

Traceback (most recent call last):
  File "\OrderScraper\habr_scraper.py", line 92, in <module>
    asyncio.run(get_data_from_habr())
  File "\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "\OrderScraper\habr_scraper.py", line 39, in get_data_from_habr
    await asyncio.gather(*async_tasks)
  File "\OrderScraper\habr_scraper.py", line 46, in get_data_from_habr_order_page
    async with session.get(url=order_url, headers=headers, timeout=1000) as res:
  File "\OrderScraper\order_scraper\Lib\site-packages\aiohttp\client.py", line 1353, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "\OrderScraper\order_scraper\Lib\site-packages\aiohttp\client.py", line 657, in _request
    conn = await self._connector.connect(
                 ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'connect'

With each subsequent run, the number of these errors increases. It also seems strange to me that the traceback shows paths into the Python installation rather than the virtual environment. Please explain the cause of this error. If you need additional information, please ask in the comments.


Solution

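  • The async with block closes the client session as soon as it exits, and only after that do you await the order-page tasks. Any task that has not finished by then tries to send its request through a session that no longer has a connector, which is why the error says 'NoneType' object has no attribute 'connect'. Tasks that happened to complete while later pages were still being fetched succeed, so only some of them fail.

    Move the asyncio.gather call inside the session context so that every task finishes before the session is closed: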

    async with aiohttp.ClientSession() as session:
        ...
    
        await asyncio.gather(*async_tasks)
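
To see why the position of the gather call matters, here is a minimal sketch that runs without aiohttp or network access. FakeSession and FakeConnector are hypothetical stand-ins written for this illustration, not aiohttp's real classes. They only imitate the behaviour the traceback suggests, namely that the session has no connector once it has been closed.

```python
import asyncio


class FakeConnector:
    """Hypothetical stand-in for a connection pool."""

    async def connect(self, url: str) -> str:
        await asyncio.sleep(0)  # pretend to do network I/O
        return f"response for {url}"


class FakeSession:
    """Hypothetical stand-in for a client session; NOT aiohttp's real code."""

    def __init__(self) -> None:
        self._connector = FakeConnector()

    async def __aenter__(self) -> "FakeSession":
        return self

    async def __aexit__(self, *exc_info) -> None:
        # Assumption for this demo: closing the session drops the connector.
        self._connector = None

    async def get(self, url: str) -> str:
        await asyncio.sleep(0)  # yield, so the task cannot finish instantly
        return await self._connector.connect(url)


async def gather_outside(urls: list[str]) -> list:
    """Buggy shape: tasks are awaited after the session has been closed."""
    async with FakeSession() as session:
        tasks = [asyncio.create_task(session.get(u)) for u in urls]
    # The session is already closed here, so pending tasks fail.
    return await asyncio.gather(*tasks, return_exceptions=True)


async def gather_inside(urls: list[str]) -> list:
    """Fixed shape: tasks are awaited while the session is still open."""
    async with FakeSession() as session:
        tasks = [asyncio.create_task(session.get(u)) for u in urls]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(gather_outside(["a", "b"])))
    print(asyncio.run(gather_inside(["a", "b"])))
```

In the question's code there is a second thing worth checking: async_tasks = [] sits inside the while loop, so the list is rebuilt for every page and the final gather presumably sees only the tasks from the last page that had orders. Creating the list once before the loop, together with gathering inside the session block, should cover every page.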