
Python3 threads or aiohttp?


I want to create a program that can fetch hundreds of webpages and return their content. I can do this with a simple Python script now:

import requests

urls = [...]
data = []
for url in urls:
    # Blocks until the full response body for this URL has downloaded
    content = requests.get(url).content
    data.append(content)

However, the drawback of the above implementation is that inside the for loop, each response must finish downloading before the request for the next URL can be made. What I want is to avoid this: issue a request for each URL without waiting for the current URL's content to finish loading. How can I do this? I have read up on aiohttp and threading, but I am not sure which is the best approach.


Solution

  • asyncio + aiohttp is a good combination and will provide a significant performance improvement, since all requests are issued concurrently instead of one at a time:

    Sample implementation:

    import asyncio
    import aiohttp


    async def fetch(session, url):
        # Reuse the shared session so connections are pooled
        # instead of opening a new one per request.
        async with session.get(url) as resp:
            return await resp.text()


    async def main():
        urls = [...]
        async with aiohttp.ClientSession() as session:
            # Schedule all fetches concurrently and wait for them all.
            webpages = await asyncio.gather(*(fetch(session, url) for url in urls))
            # use webpages for further processing


    asyncio.run(main())
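
    Note that asyncio.gather schedules every request at once (aiohttp's default TCPConnector caps the number of open connections at 100). For hundreds of URLs it can help to bound concurrency explicitly. Below is a minimal sketch using asyncio.Semaphore; the limit of 20 is an arbitrary assumption, so tune it to your target servers:

    import asyncio
    import aiohttp


    async def fetch(session, sem, url):
        async with sem:  # wait for a free slot before sending the request
            async with session.get(url) as resp:
                return await resp.text()


    async def main():
        urls = [...]
        sem = asyncio.Semaphore(20)  # arbitrary limit; create it inside the running loop
        async with aiohttp.ClientSession() as session:
            webpages = await asyncio.gather(*(fetch(session, sem, url) for url in urls))
            # use webpages for further processing


    asyncio.run(main())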
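
  • If you prefer to keep using requests, a thread pool from the standard library's concurrent.futures module is a reasonable alternative: each worker thread blocks on its own request, so the downloads still overlap. A minimal sketch; the pool size of 20 is an arbitrary assumption:

    import requests
    from concurrent.futures import ThreadPoolExecutor


    def fetch(url):
        # Each call runs in its own worker thread, so requests overlap.
        return requests.get(url).content


    urls = [...]
    with ThreadPoolExecutor(max_workers=20) as executor:
        # map() returns results in the same order as the input urls.
        data = list(executor.map(fetch, urls))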