python google-cloud-platform python-asyncio

No speedup using asyncio for uploading a list of files using resumable upload links


I have a list of file paths and their resumable upload links for uploading to GCS buckets. I implemented the upload synchronously first and then with asyncio, but found no improvement in execution speed. I would appreciate any input, thanks.

from asyncio import run, gather
import requests 

async def uploadFile(UPLOAD_URL, LOCAL_PATH):
    with open(LOCAL_PATH, 'rb') as f:
        data = f.read()
    requests.put(UPLOAD_URL, data=data)

async def uploadFiles(path_and_url):
    uploads = [uploadFile(dic['upload_url'], dic['local_path']) for dic in path_and_url]
    await gather(*uploads)

run(uploadFiles(path_and_url))

Switching to async libraries (aiofiles and aiohttp) inside uploadFile speeds up my code by about 15%. Is there anything further I can do? Thanks!

import aiohttp, aiofiles

async def uploadFile(UPLOAD_URL, LOCAL_PATH):
    async with aiofiles.open(LOCAL_PATH, 'rb') as f:
        data = await f.read()
    # httpx.put(UPLOAD_URL, data=data)
    async with aiohttp.ClientSession() as session:
        async with session.put(UPLOAD_URL, data=data) as resp:
            print(f"{LOCAL_PATH} -> {resp}")

async def uploadFiles(path_and_url):
    uploads = [uploadFile(dic['upload_url'], dic['local_path']) for dic in path_and_url]
    await gather(*uploads)

Solution

  • Nothing you did is wrong, but a few optimizations can speed up your code further. You were creating a new session for every file upload, which is not optimal: it can cost more time and resources than creating one session and reusing it for all requests.

    import asyncio
    
    import aiofiles
    import aiohttp
    
    
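    async def upload_file(session, upload_url, local_path):
        # Read the file without blocking the event loop.
        async with aiofiles.open(local_path, 'rb') as fp:
            file_content = await fp.read()
        # The request is only sent once it is awaited (here via "async with").
        async with session.put(upload_url, data=file_content) as response:
            response.raise_for_status()
            return response.status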
    
    
    async def upload_files(paths):
        async with aiohttp.ClientSession() as session:
            await asyncio.gather(*[upload_file(session, **path) for path in paths])
    
    
    async def main():
        await upload_files([
            {'upload_url': 'https://.../', 'local_path': '/home/suraj/Downloads/1.png'},
            {'upload_url': 'https://.../', 'local_path': '/home/suraj/Downloads/2.png'},
            {'upload_url': 'https://.../', 'local_path': '/home/suraj/Downloads/3.png'},
            {'upload_url': 'https://.../', 'local_path': '/home/suraj/Downloads/4.png'},
        ])
    
    
    if __name__ == "__main__":
        # asyncio.run() creates the event loop, runs main() and closes the loop.
        asyncio.run(main())
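
As a rough illustration of why the first version in the question (with requests.put inside an async function) showed no speedup: a blocking call never hands control back to the event loop, so the coroutines passed to gather still run one after another. The sketch below is not upload code. It uses time.sleep as a stand-in for the blocking requests.put and asyncio.sleep as a stand-in for an awaited aiohttp request, and the function names (blocking_upload, awaiting_upload, run_all, demo) are hypothetical, chosen only for this example.

```python
import asyncio
import time


async def blocking_upload(name, log):
    # Stand-in for requests.put(): time.sleep() never yields to the event loop.
    log.append(f"start {name}")
    time.sleep(0.05)
    log.append(f"end {name}")


async def awaiting_upload(name, log):
    # Stand-in for an awaited aiohttp request: asyncio.sleep() yields while waiting.
    log.append(f"start {name}")
    await asyncio.sleep(0.05)
    log.append(f"end {name}")


async def run_all(upload, names):
    # Run one coroutine per name with gather() and return the event log.
    log = []
    await asyncio.gather(*(upload(name, log) for name in names))
    return log


def demo():
    names = ["a", "b", "c"]
    blocking_log = asyncio.run(run_all(blocking_upload, names))
    awaiting_log = asyncio.run(run_all(awaiting_upload, names))
    return blocking_log, awaiting_log


if __name__ == "__main__":
    blocking_log, awaiting_log = demo()
    print(blocking_log)   # each "start" is followed directly by its own "end"
    print(awaiting_log)   # all three "start" entries come before any "end"
```

In the blocking variant every upload finishes before the next one starts, which matches the "no improvement" observation in the question. In the awaiting variant all uploads are started before any of them finishes, so the waiting time overlaps.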