
Is Amazon S3's ListObjectsV2 self-consistent over multiple pages?


ListObjectsV2 returns at most 1000 keys per call, at which point you have to go back for another page.

Since Amazon S3 is now strongly consistent, and other updates can be happening to the bucket while I am listing its contents, is the second page going to be more results from the same point in time as the first page? Or is it going to reflect the state of the bucket at the point in time when the second page was requested?

For example, if I list a bucket, get the first page, delete a key which would have appeared on the second page, and then get the second page, will I still see the key that is now deleted?


Solution

  • Indeed, Amazon S3 is now strongly consistent. This means that once you upload an object, anyone who reads that object afterwards is guaranteed to get the updated version. It does not mean that two different API calls are guaranteed to see the same "state". Notably, a download split across several ranged GET requests can end up with parts of two versions of the object if the object is overwritten while it is being downloaded. More details are available in this answer.

    As for your question, the same basic rules apply. S3 is strongly consistent from one call to the next: once a change to the bucket or its objects completes, any call made after that update is guaranteed to see the updated data. This means that as you page through the list of objects, each API call reflects the latest state of the bucket, so you will see changes made between pages:

    import boto3
    
    BUCKET = 'example-bucket'
    PREFIX = 'so_question'
    
    s3 = boto3.client('s3')
    
    # Create 3000 empty objects, so the listing needs three pages
    for i in range(3000):
        s3.put_object(Bucket=BUCKET, Key=f"{PREFIX}/obj_{i:04d}", Body=b'')
    
    args = {'Bucket': BUCKET, 'Prefix': PREFIX + '/'}
    result = s3.list_objects_v2(**args)
    # First page: objects 0000 to 0999
    print([x['Key'] for x in result['Contents']])
    
    # Delete an object that would have appeared on the second page
    s3.delete_object(Bucket=BUCKET, Key=f"{PREFIX}/obj_1100")
    
    # Request the next page of results
    args['ContinuationToken'] = result['NextContinuationToken']
    result = s3.list_objects_v2(**args)
    # obj_1100 is missing because it was deleted before this call,
    # so the page holds objects 1000 to 2000 (still 1000 keys)
    print([x['Key'] for x in result['Contents']])
    

    The upshot of this, combined with the fact that there's no way to list all objects in a bucket in a single API call once it holds more than 1000 objects, is that there's no way I'm aware of to get a complete "snapshot" of the bucket at a single point in time, unless, of course, you can ensure the bucket doesn't change while you list its objects.
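To experiment with this paging behaviour without touching a real bucket, here is a minimal sketch. `FakeS3Client` and `list_pages` are hypothetical names: the fake only mimics the call shapes of the boto3 methods used in the answer's code, and it assumes a continuation token simply means "resume after this key", whereas real S3 tokens are opaque. What it does model is the answer's point: every page is computed from the bucket's contents at the moment that page is requested.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional


class FakeS3Client:
    """Hypothetical in-memory stand-in for the few boto3 S3 calls used here.

    Assumption: a continuation token means "resume after this key". Real S3
    tokens are opaque, but each page is likewise built from the bucket's
    contents at the time of the request.
    """

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def put_object(self, Bucket: str, Key: str, Body: bytes = b"") -> dict:
        self.buckets[Bucket][Key] = Body
        return {}

    def delete_object(self, Bucket: str, Key: str) -> dict:
        self.buckets[Bucket].pop(Key, None)
        return {}

    def list_objects_v2(self, Bucket: str, Prefix: str = "", MaxKeys: int = 1000,
                        ContinuationToken: Optional[str] = None) -> dict:
        # Read the bucket as it is *now*; Python's string order matches
        # S3's UTF-8 binary key order for these keys.
        keys = sorted(k for k in self.buckets[Bucket] if k.startswith(Prefix))
        if ContinuationToken is not None:
            keys = [k for k in keys if k > ContinuationToken]
        page = keys[:MaxKeys]
        result = {"Contents": [{"Key": k} for k in page],
                  "KeyCount": len(page), "IsTruncated": len(keys) > MaxKeys}
        if result["IsTruncated"]:
            result["NextContinuationToken"] = page[-1]
        return result


def list_pages(client, bucket: str, prefix: str) -> Iterator[List[str]]:
    """Yield the keys of one ListObjectsV2 page at a time."""
    args = {"Bucket": bucket, "Prefix": prefix}
    while True:
        result = client.list_objects_v2(**args)
        # Real S3 omits "Contents" when a page is empty
        yield [obj["Key"] for obj in result.get("Contents", [])]
        if not result.get("IsTruncated"):
            return
        args["ContinuationToken"] = result["NextContinuationToken"]


s3 = FakeS3Client()
for i in range(3000):
    s3.put_object(Bucket="example-bucket", Key=f"so_question/obj_{i:04d}")

pages = list_pages(s3, "example-bucket", "so_question/")
first = next(pages)   # so_question/obj_0000 .. so_question/obj_0999
s3.delete_object(Bucket="example-bucket", Key="so_question/obj_1100")
second = next(pages)  # obj_1000 .. obj_2000, with obj_1100 missing
rest = list(pages)    # one more page: obj_2001 .. obj_2999
```

`list_pages` only passes `Bucket`, `Prefix` and `ContinuationToken`, so it should also work with a real `boto3.client('s3')`. Because it is a generator, pages are fetched lazily, which makes the gap between API calls explicit: a change made in that gap to a key that sorts after the last key already returned (here, deleting `obj_1100`) shows up on later pages, while changes to keys before it are never seen. boto3's `s3.get_paginator('list_objects_v2')` handles the tokens for you, but it issues the same per-page calls, so the same applies.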
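The download caveat mentioned at the start of the answer (a download made of several ranged GETs picking up parts of two versions) can be illustrated the same way. This is a sketch under stated assumptions: `ranged_download`, `FakeOverwritingClient` and `PreconditionFailed` are hypothetical names, and the fake only imitates the `head_object`/`get_object` call shapes of a boto3 S3 client. Sending `IfMatch` with the ETag returned by `head_object` is one way to make a later part fail instead of silently returning bytes from a newer version; with a real boto3 client the failure would surface as a `botocore.exceptions.ClientError` (HTTP 412) rather than the stand-in exception used here.

```python
import io
from typing import Optional


def ranged_download(client, bucket: str, key: str, part_size: int,
                    pin_to_etag: bool = True) -> bytes:
    """Fetch an object as several ranged GETs and join the parts.

    With pin_to_etag=True every request carries IfMatch=<ETag from HEAD>,
    so an overwrite mid-download makes a later request fail (412) instead
    of mixing bytes from two versions.
    """
    head = client.head_object(Bucket=bucket, Key=key)
    size = head["ContentLength"]
    # Only send IfMatch when pinning; boto3 rejects IfMatch=None
    extra = {"IfMatch": head["ETag"]} if pin_to_etag else {}
    parts = []
    for start in range(0, size, part_size):
        end = min(start + part_size, size) - 1  # HTTP Range end is inclusive
        resp = client.get_object(Bucket=bucket, Key=key,
                                 Range=f"bytes={start}-{end}", **extra)
        parts.append(resp["Body"].read())
    return b"".join(parts)


class PreconditionFailed(Exception):
    """Hypothetical stand-in for the 412 error a real client would raise."""


class FakeOverwritingClient:
    """Hypothetical in-memory client: the object is overwritten (same size,
    new ETag) right after the first ranged GET, mimicking a concurrent PUT."""

    def __init__(self, old: bytes, new: bytes) -> None:
        self.data, self.etag = old, '"old"'
        self.new = new
        self.gets = 0

    def head_object(self, Bucket: str, Key: str) -> dict:
        return {"ContentLength": len(self.data), "ETag": self.etag}

    def get_object(self, Bucket: str, Key: str, Range: str,
                   IfMatch: Optional[str] = None) -> dict:
        if IfMatch is not None and IfMatch != self.etag:
            raise PreconditionFailed("412 Precondition Failed")
        start, end = (int(n) for n in Range[len("bytes="):].split("-"))
        body = io.BytesIO(self.data[start:end + 1])
        self.gets += 1
        if self.gets == 1:  # a concurrent writer replaces the object now
            self.data, self.etag = self.new, '"new"'
        return {"Body": body}


# Unpinned: the two halves come from different versions of the object
mixed = ranged_download(FakeOverwritingClient(b"AAAA", b"BBBB"),
                        "example-bucket", "obj", part_size=2, pin_to_etag=False)
# mixed == b"AABB"

# Pinned: the second ranged GET is rejected instead of returning new bytes
try:
    ranged_download(FakeOverwritingClient(b"AAAA", b"BBBB"),
                    "example-bucket", "obj", part_size=2)
    pinned_failed = False
except PreconditionFailed:
    pinned_failed = True
```

Pinning to the ETag does not make the download succeed; it turns a silent mix of versions into an error you can retry, which mirrors the listing situation: S3 gives you per-call consistency, and any cross-call consistency has to be arranged by the caller.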