Tags: python, amazon-s3, aiohttp, botocore

aiobotocore-aiohttp - Get S3 file content and stream it in the response


I want to get the content of a file uploaded to S3, using botocore from an aiohttp service. Because the files may be very large:

  • I don't want to store the whole file content in memory,
  • I want to be able to handle other requests while downloading files from S3 (aiobotocore, aiohttp),
  • I want to be able to modify the files I download, so I want to process them line by line and stream the response to the client.

For now, I have the following code in my aiohttp handler:

import asyncio
import aiobotocore

from aiohttp import web

@asyncio.coroutine
def handle_get_file(loop):

    session = aiobotocore.get_session(loop=loop)

    client = session.create_client(
        service_name="s3",
        region_name="",
        aws_secret_access_key="",
        aws_access_key_id="",
        endpoint_url="http://s3:5000"
    )

    response = yield from client.get_object(
        Bucket="mybucket",
        Key="key",
    )

Each time I read a line from the file, I want to send it in the response. get_object() returns a dict containing a Body (a ClientResponseContentProxy object). Using its read() method, how can I get a chunk of the content and stream it to the client?

When I do:

for content in response['Body'].read(10):
    print("----")
    print(content)

The code inside the loop is never executed.

But when I do:

result = yield from response['Body'].read(10)

I get the content of the file in result. I am a little bit confused about how to use read() here.
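
One likely explanation for the difference, sketched with plain generators and no S3 involved: if read() is a generator-based coroutine whose data is already buffered, it returns its result without ever yielding, so a for loop has nothing to iterate over, while yield from receives the return value. The names read_without_suspending and caller are hypothetical stand-ins, not part of aiobotocore.

```python
def read_without_suspending():
    # Stand-in for a generator-based coroutine whose data is already
    # available: it returns its result without ever reaching a yield.
    return b"file content"
    yield  # unreachable; its presence only makes this function a generator


def caller():
    # yield from drives the generator and evaluates to its return value.
    data = yield from read_without_suspending()
    return data


# Iterating only sees yielded values, and there are none,
# so the loop body would run zero times.
loop_runs = sum(1 for _ in read_without_suspending())

# Drive caller() by hand, roughly as an event loop would, to get its result.
try:
    next(caller())
    returned = None
except StopIteration as stop:
    returned = stop.value

print(loop_runs, returned)
```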

Thanks


Solution

  • This happens because the aiobotocore API differs from botocore's: here read() returns a FlowControlStreamReader.read generator, which you need to consume with yield from.

    It looks something like this (taken from https://github.com/aio-libs/aiobotocore/pull/19):

    resp = yield from s3.get_object(Bucket='mybucket', Key='k')
    stream = resp['Body']
    try:
        chunk = yield from stream.read(10)
        while len(chunk) > 0:
            ...
            chunk = yield from stream.read(10)
    finally:
        stream.close()
    

    In your case you can even use readline() to read the body one line at a time:

    https://github.com/KeepSafe/aiohttp/blob/c39355bef6c08ded5c80e4b1887e9b922bdda6ef/aiohttp/streams.py#L587
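
A minimal sketch of the line-by-line approach, written with async/await (the newer spelling of yield from inside coroutines). It assumes only that the body exposes an awaitable readline() that returns b"" at end of stream. asyncio.StreamReader stands in for resp['Body'] so the example runs without a bucket, and transform_line, stream_lines and send are hypothetical names for the per-line modification, the read loop, and the write to the client.

```python
import asyncio


def transform_line(line: bytes) -> bytes:
    # Hypothetical per-line modification; here it just upper-cases the line.
    return line.upper()


async def stream_lines(stream, send):
    # Read one line at a time so only a single line is held in memory.
    while True:
        line = await stream.readline()
        if not line:  # b"" signals end of stream
            break
        await send(transform_line(line))


async def demo():
    # Stand-in for resp['Body']: an in-memory reader fed with fake content.
    body = asyncio.StreamReader()
    body.feed_data(b"first\nsecond\nthird")
    body.feed_eof()

    sent = []

    async def send(chunk: bytes):
        # Stand-in for writing a chunk to the streamed HTTP response.
        sent.append(chunk)

    await stream_lines(body, send)
    return sent


result = asyncio.run(demo())
print(result)
```

In an aiohttp handler, send would typically be the write method of a prepared web.StreamResponse, so each transformed line goes to the client as soon as it is read and other requests can be served while the loop awaits.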