python, python-3.x, aiohttp, python-asyncio

socket.gaierror while using aiohttp


I am trying to fetch page titles from multiple domains, so I wrote this code:

import aiohttp
import asyncio
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
    'Accept-Encoding': ', '.join(('gzip', 'deflate', 'br')),
    'Accept': '*/*',
    'Connection': 'keep-alive'
}


async def fetch(url, session):
    async with session.get(f'http://{url}') as response:
        text = await response.text()
        return url, BeautifulSoup(text, 'lxml').title.string


async def main():
    async with asyncio.Semaphore(50):
        async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False), timeout=aiohttp.ClientTimeout(10),
                                         headers=headers) as session:
            titles = await asyncio.gather(*[fetch(domain, session) for domain in domains[0:500]],
                                          return_exceptions=True)
            for title in titles:
                print(title)


if __name__ == '__main__':
    domains = []
    with open('input', 'r') as f:
        for line in f:
            domains.append(line.rstrip())

    asyncio.run(main())

It works, but sometimes it throws errors like this:

Task exception was never retrieved
future: <Task finished name='Task-1635' coro=<TCPConnector._resolve_host() done, defined at /venv/lib/python3.8/site-packages/aiohttp/connector.py:774> exception=gaierror(8, 'nodename nor servname provided, or not known')>
Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 829, in _resolve_host
    addrs = await \
  File "/venv/lib/python3.8/site-packages/aiohttp/resolver.py", line 29, in resolve
    infos = await self._loop.getaddrinfo(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 817, in getaddrinfo
    return await self.run_in_executor(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 914, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
Task exception was never retrieved
future: <Task finished name='Task-1617' coro=<TCPConnector._resolve_host() done, defined at /venv/lib/python3.8/site-packages/aiohttp/connector.py:774> exception=gaierror(8, 'nodename nor servname provided, or not known')>
Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 829, in _resolve_host
    addrs = await \
  File "/venv/lib/python3.8/site-packages/aiohttp/resolver.py", line 29, in resolve
    infos = await self._loop.getaddrinfo(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 817, in getaddrinfo
    return await self.run_in_executor(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 914, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

Sometimes it throws more errors, sometimes fewer. Can anyone explain why this happens? I tried wrapping the get call in a try/except block like this, but the errors still appear:

async def fetch(url, session):
    async with session.get(f'http://{url}') as response:
        try:
            text = await response.text()
            return url, BeautifulSoup(text, 'lxml').title.string
        except BaseException as e:
            return e

Solution

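  • As a temporary fix, try raising the shell's open-file limit before running the script (see https://github.com/aio-libs/aiohttp/issues/3549#issuecomment-1359137365):

    ulimit -n unlimited

    `ulimit -n` sets the maximum number of open file descriptors for the current shell, which suggests the lookups fail because the process runs out of descriptors when hundreds of DNS queries and connections are opened at once. The setting can be made persistent in other ways, but this should work as a temporary, per-shell fix.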
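If changing the shell limit is not an option, the same idea can be approached from inside Python. The sketch below is only an illustration, not part of the original answer: `raise_open_file_limit` and `run_bounded` are hypothetical helper names, and the default target of 4096 is an arbitrary assumption. The first helper uses the Unix-only standard-library `resource` module to lift the soft descriptor limit, roughly what `ulimit -n` does for the shell. The second acquires a semaphore per call, so only a bounded number of requests are in flight at a time. In the question's `main()`, by contrast, the semaphore is acquired once around the whole batch, so it does not appear to limit concurrency at all.

```python
import asyncio
import resource  # Unix-only standard-library module


def raise_open_file_limit(target=4096):
    """Try to lift the soft RLIMIT_NOFILE to `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit may not exceed the hard limit
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # the OS refused the change; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


async def run_bounded(worker, items, limit=50):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:  # acquired per call, not once around the batch
            try:
                return await worker(item)
            except Exception as exc:  # hand failures back instead of raising
                return item, exc

    return await asyncio.gather(*(guarded(item) for item in items))
```

With the question's code, one possible use would be to call `raise_open_file_limit()` before `asyncio.run(main())` and to replace the `asyncio.gather(...)` line in `main()` with `await run_bounded(lambda d: fetch(d, session), domains[0:500])`. Whether this removes every "Task exception was never retrieved" message is not guaranteed, since those are logged for aiohttp's internal resolver tasks, but keeping fewer sockets open at once should make descriptor exhaustion less likely.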