Note to future readers: this question is old and was formatted and programmed in a rush. The answer given may be useful, but the question and its code probably are not.
Hello everyone,
I'm having trouble understanding asyncio and aiohttp and making both work together. Because I don't understand what I'm doing I've run into a problem that I have no idea how to solve.
I'm using 64-bit Windows 10.
The following code returns a list of pages that do not contain "html" in the Content-Type header. It's implemented using asyncio.
import asyncio
import aiohttp

MAXitems = 30


async def getHeaders(url, session, sema):
    async with session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except Exception:
                        # Content-Type header missing or unreadable
                        return url, False
            except Exception:
                # The request itself failed
                return url, False


def check_urls_without_html(list_of_urls):
    headers_without_html = set()
    while(len(list_of_urls) != 0):
        # Move up to MAXitems URLs from the front of the list into this block
        blockurls = []
        items = 0
        for num in range(0, len(list_of_urls)):
            if num < MAXitems:
                blockurls.append(list_of_urls[num - items])
                list_of_urls.remove(list_of_urls[num - items])
                items += 1
        loop = asyncio.get_event_loop()
        semaphoreHeaders = asyncio.Semaphore(50)
        session = aiohttp.ClientSession()
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if not header[1]:
                headers_without_html.add(header)
    return headers_without_html


list_of_urls = ['', '']
headers_without_html = check_urls_without_html(list_of_urls)

for header in headers_without_html:
    print(header[0])
When I run it with too many URLs (e.g. 2000), it sometimes returns an error like this one:
    data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\", line 454, in run_until_complete
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\", line 421, in run_forever
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\", line 1390, in _run_once
    event_list = self._selector.select(timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()
I've read that this problem arises from a Windows restriction. I've also read that not much can be done about it, other than trying to use fewer file descriptors.
I've seen people push thousands of requests with asyncio and aiohttp, but even with my chunking I can't push 30-50 without getting this error.
Is there something fundamentally wrong with my code or is it an inherent problem with Windows? Can it be fixed? Can one increase the limit on the maximum number of allowed file descriptors in select?
By default, the asyncio event loop on Windows can watch only a limited number of sockets (64 in the Windows API's default configuration). This is a limitation of the underlying select() API call.
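If you are not sure which kind of loop your interpreter gives you, a small diagnostic like the one below can help. It is only a sketch: `describe_event_loop` is a made-up helper name, and it reports the loop class that `asyncio.new_event_loop()` creates together with the standard library's default selector class.

```python
import asyncio
import selectors
import sys


def describe_event_loop():
    """Return (platform, event loop class name, default selector class name)."""
    loop = asyncio.new_event_loop()
    try:
        return sys.platform, type(loop).__name__, selectors.DefaultSelector.__name__
    finally:
        # The loop was created only for inspection, so close it again.
        loop.close()


if __name__ == "__main__":
    print(describe_event_loop())
```

A loop class with "Selector" in its name, combined with a select()-based selector, is the setup that can hit the descriptor limit.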
To raise the limit, use ProactorEventLoop; you can use the code below. See the official asyncio docs for the full details.
import sys
if sys.platform == 'win32':
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)  # so asyncio.get_event_loop() returns this loop
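Here is a slightly fuller sketch of the same idea as a complete script. The names `make_event_loop` and `main` are made up for illustration, and `main` is only a placeholder for the real request code. Note that on Python 3.8 and later the proactor loop is already the default on Windows, so the explicit switch mostly matters on older versions such as the 3.6 install in the question.

```python
import asyncio
import sys


def make_event_loop():
    """Return a ProactorEventLoop on Windows, the platform default elsewhere."""
    if sys.platform == 'win32':
        # Proactor loop uses I/O completion ports instead of select().
        loop = asyncio.ProactorEventLoop()
    else:
        loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    return loop


async def main():
    # Placeholder for the real work (e.g. the aiohttp requests).
    await asyncio.sleep(0)
    return "done"


if __name__ == "__main__":
    loop = make_event_loop()
    try:
        print(loop.run_until_complete(main()))
    finally:
        loop.close()
```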
Another solution is to limit the overall concurrency using a semaphore, as described in a related answer. For example, when making 2000 API calls you might not want too many requests open in parallel (they might time out, and the individual call times become harder to read). This will give you
await gather_with_concurrency(100, *my_coroutines)
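`gather_with_concurrency` is not part of asyncio, so it has to be defined somewhere. A minimal sketch of how such a helper might look is below; it assumes the signature used in the call above, and `fake_request` is a stand-in for a real HTTP call.

```python
import asyncio


async def gather_with_concurrency(n, *coros):
    """Like asyncio.gather, but lets at most n coroutines run at once."""
    semaphore = asyncio.Semaphore(n)

    async def run_limited(coro):
        # Each coroutine waits here until one of the n slots is free.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run_limited(c) for c in coros))


async def fake_request(i):
    # Stand-in for a real HTTP call.
    await asyncio.sleep(0.01)
    return i


async def main():
    my_coroutines = [fake_request(i) for i in range(20)]
    return await gather_with_concurrency(5, *my_coroutines)


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(main()))
    finally:
        loop.close()
```

Results come back in the same order as the coroutines were passed in, just like with plain asyncio.gather. With aiohttp you would typically also share one ClientSession across all requests rather than opening one per block.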