I want to create a program that can fetch hundreds of webpages and return their content. I can do this now with a simple Python script:
import requests

urls = [...]
data = []
for url in urls:
    # blocks until this download finishes before the next request is made
    content = requests.get(url).content
    data.append(content)
However, the drawback of this implementation is that, inside the for loop, the content of the current response must finish loading before the next request can be made. What I want is to issue a request for each url without waiting for the current download to finish. How can I do this? I have read up on aiohttp and threading, but I am not sure which is the best approach.
asyncio + aiohttp is a good combination here and will give a significant speedup, because the downloads overlap instead of running one after another.
Sample implementation:
import asyncio
import aiohttp

async def fetch(session, url):
    # each await lets other downloads run while this one waits on the network
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    urls = [...]
    # one shared ClientSession reuses connections across all requests
    async with aiohttp.ClientSession() as session:
        webpages = await asyncio.gather(*[fetch(session, url) for url in urls])
    # use webpages for further processing

asyncio.run(main())
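With hundreds of URLs you may also want to cap how many downloads run at once, so you do not open too many connections at the same time. Here is a minimal sketch of one way to do that with asyncio.Semaphore, assuming the same fetch/main structure as above; the limit of 20 and the semaphore pattern are my own choices for illustration, not something aiohttp requires:

import asyncio
import aiohttp

async def fetch(session, semaphore, url):
    # at most 20 coroutines pass this point at the same time
    async with semaphore:
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    urls = [...]
    semaphore = asyncio.Semaphore(20)  # hypothetical limit; tune for your workload
    async with aiohttp.ClientSession() as session:
        # return_exceptions=True keeps one failed download from cancelling the rest
        results = await asyncio.gather(
            *[fetch(session, semaphore, url) for url in urls],
            return_exceptions=True,
        )
    # results holds page text or an exception per url, in the same order as urls

asyncio.run(main())

aiohttp can also limit concurrent connections through its connector (aiohttp.TCPConnector(limit=...)), which is another way to get a similar effect.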