I'm attempting to write some asynchronous GET requests with the aiohttp package, and have most of the pieces figured out, but am wondering what the standard approach is when handling the failures (returned as exceptions).
A general idea of my code so far (after some trial and error, I am following the approach here):
import asyncio
import aiofiles
import aiohttp
from pathlib import Path

with open('urls.txt', 'r') as f:
    urls = [s.rstrip() for s in f.readlines()]

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        data = await response.text()
    # (Omitted: some more URL processing goes on here)
    out_path = Path('out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        results = await asyncio.gather(*[fetch(session, url) for url in urls],
                                       return_exceptions=True)
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(urls, loop))
Now this runs fine: the results variable is populated with None entries where the corresponding URL [i.e. at the same index in the urls list variable, i.e. at the same line number in the input file urls.txt] was successfully requested, and the corresponding file is written to disk. The failures are returned as exceptions in results (i.e. the entries not equal to None), and it's these I'm unsure how to handle.

I have looked at a few different guides to using the various asynchronous Python packages (aiohttp, aiofiles, and asyncio) but I haven't seen the standard way to handle this final step.
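For illustration, here is a minimal sketch (my own addition, not part of the original code) of how the failed entries could be picked out of the urls and results from the code above, using only the fact that with return_exceptions=True a failed fetch() leaves the exception instance at that URL's index:

# With return_exceptions=True, a failed fetch() leaves the exception
# instance at that URL's index, while a success leaves None.
failures = [(url, res) for url, res in zip(urls, results)
            if isinstance(res, Exception)]
for url, exc in failures:
    print(f'{url} failed with {exc!r}')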
Should the retrying to send a GET request be done after the await statement has 'finished'/'completed'? Or should the retrying to send a GET request be initiated by some sort of callback upon failure? The errors I am getting look like ClientConnectorError(111, "Connect call failed ('000.XXX.XXX.XXX', 443)"), i.e. the request to IP address 000.XXX.XXX.XXX at port 443 failed, probably because there's some limit from the server which I should respect by waiting with a time out before retrying.

Naively, I was expecting run_until_complete to handle this in such a way that it would finish upon succeeding at requesting all URLs, but this isn't the case.
I haven't worked with asynchronous Python and sessions/loops before, so I would appreciate any help to find out how to get the results for all URLs. Please let me know if I can give any more information to improve this question, thank you!
Should the retrying to send a GET request be done after the await statement has 'finished'/'completed'? ...or should the retrying to send a GET request be initiated by some sort of callback upon failure
You can do the former. You don't need any special callback, since you are executing inside a coroutine, so a simple while loop will suffice, and won't interfere with execution of other coroutines. For example:
async def fetch(session, url):
    data = None
    while data is None:
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                data = await response.text()
        except aiohttp.ClientError:
            # sleep a little and try again
            await asyncio.sleep(1)
    # (Omitted: some more URL processing goes on here)
    out_path = Path('out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)
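One refinement worth noting (my addition, not part of the original answer): the while loop above retries forever. A bounded variant with exponential backoff might look like the hypothetical helper below, which re-raises the last error so that gather(..., return_exceptions=True) records it:

import asyncio
import aiohttp

async def fetch_with_retries(session, url, max_attempts=5):
    """Hypothetical helper: retry a GET up to max_attempts times,
    doubling the delay between attempts (1s, 2s, 4s, ...)."""
    delay = 1
    for attempt in range(max_attempts):
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()
        except aiohttp.ClientError:
            if attempt == max_attempts - 1:
                raise  # out of retries; gather() will record the exception
            await asyncio.sleep(delay)
            delay *= 2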
Naively, I was expecting run_until_complete to handle this in such a way that it would finish upon succeeding at requesting all URLs

The term "complete" is meant in the technical sense of a coroutine completing (running its course), which is achieved either by the coroutine returning or raising an exception.