python amazon-web-services amazon-s3 boto3 paginator

Paginator in Python


The code below downloads the most recently modified object under a prefix in an S3 bucket to the current directory.

import boto3

s3_client = boto3.client('s3')

response = s3_client.list_objects_v2(Bucket='MY-BUCKET', Prefix='foo/')
objects = sorted(response['Contents'], key=lambda obj: obj['LastModified'])

# Latest object
latest_object = objects[-1]['Key']
filename = latest_object[latest_object.rfind('/')+1:] # Remove path

# Download it to current directory
s3_client.download_file('MY-BUCKET', latest_object, filename)

A single list_objects_v2 call returns at most 1000 objects. I know a paginator could solve this, since my bucket has more objects than that. How can I add one to the code above?


Solution

  • boto3 has a built-in paginator class you can use for this: S3.Paginator.ListObjectsV2

    https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Paginator.ListObjectsV2

    Here is how you can add a paginator to your current code.

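    import boto3

    s3_client = boto3.client('s3')
    # Create a paginator for list_objects_v2
    paginator = s3_client.get_paginator('list_objects_v2')
    # paginate() returns an iterator that fetches each page (up to 1000 keys) on demand
    pages = paginator.paginate(Bucket='MY-BUCKET', Prefix='foo/')

    data = []
    for page in pages:
        # 'Contents' is absent when a page has no matching objects
        data.extend(page.get('Contents', []))

    print(data)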
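To connect the paginator back to the original goal (download the newest object), here is a minimal sketch. The helper names `latest_key` and `download_latest` are hypothetical and not part of boto3. Skipping keys that end in `/` is an assumption, made so that "folder" placeholder objects are never picked as the latest file. The client is passed in as a parameter, so the logic can be tried without touching AWS.

```python
import os


def latest_key(pages):
    """Return the key of the most recently modified object across all pages.

    `pages` is any iterable of list_objects_v2 response dicts, such as the
    iterator returned by paginator.paginate(). Returns None if nothing matches.
    """
    latest = None
    for page in pages:
        # 'Contents' is missing from a page with no matching objects
        for obj in page.get('Contents', []):
            # Assumption: skip "folder" placeholder keys such as 'foo/'
            if obj['Key'].endswith('/'):
                continue
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj
    return latest['Key'] if latest else None


def download_latest(s3_client, bucket, prefix, dest_dir='.'):
    """Download the newest object under `prefix`; return the local path or None."""
    paginator = s3_client.get_paginator('list_objects_v2')
    key = latest_key(paginator.paginate(Bucket=bucket, Prefix=prefix))
    if key is None:
        return None
    # Strip the path portion of the key, as the original code does with rfind('/')
    local_path = os.path.join(dest_dir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path
```

With a real client this would be called as `download_latest(boto3.client('s3'), 'MY-BUCKET', 'foo/')`. Tracking the newest object while iterating avoids building and sorting a list of every object in the bucket.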