python amazon-web-services amazon-s3 boto3 paginator

Paginator in Python

The code below downloads the most recently modified file under a prefix in an S3 bucket to the current directory.

import boto3

s3_client = boto3.client('s3')

response = s3_client.list_objects_v2(Bucket='MY-BUCKET', Prefix='foo/')
objects = sorted(response['Contents'], key=lambda obj: obj['LastModified'])

# Key of the most recently modified object
latest_object = objects[-1]['Key']
filename = latest_object[latest_object.rfind('/')+1:] # Remove path

# Download it to current directory
s3_client.download_file('MY-BUCKET', latest_object, filename)

The list_objects_v2 call returns at most 1,000 objects per request. I understand a paginator could solve this, since the bucket in use holds more objects than that. How can it be added to the code above?

Solution

boto3 has a built-in paginator class for this operation, S3.Paginator.ListObjectsV2:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Paginator.ListObjectsV2

Here is how you can add a paginator to your current code:

import boto3

s3_client = boto3.client('s3')
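# Create a paginator for the list_objects_v2 operation
paginator = s3_client.get_paginator('list_objects_v2')
# Each page holds up to 1,000 objects; the paginator requests the next page as needed
pages = paginator.paginate(Bucket='MY-BUCKET', Prefix='foo/')

data = []
for page in pages:
    # 'Contents' is missing from a page that has no matching objects
    data.extend(page.get('Contents', []))

print(data)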
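
The snippet above collects every object but stops short of the download step from the question. A minimal sketch of the full flow might look like the following. The helper names latest_key and download_latest are hypothetical, not part of boto3, and skipping keys that end in '/' is an assumption about how "folder" placeholders appear in the bucket. The client is passed in as an argument so the selection logic can be exercised without AWS access.

```python
from typing import Iterable, Optional


def latest_key(pages: Iterable[dict]) -> Optional[str]:
    """Return the key of the most recently modified object across all pages.

    Hypothetical helper: tracks only the newest object seen so far instead
    of building and sorting a list of every object in the bucket.
    """
    latest = None
    for page in pages:
        # 'Contents' is missing from a page that has no matching objects.
        for obj in page.get('Contents', []):
            # Assumption: keys ending in '/' are folder placeholders, not files.
            if obj['Key'].endswith('/'):
                continue
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj
    return latest['Key'] if latest is not None else None


def download_latest(s3_client, bucket: str, prefix: str) -> Optional[str]:
    """Download the newest object under prefix to the current directory.

    Returns the local filename, or None if nothing matched the prefix.
    """
    paginator = s3_client.get_paginator('list_objects_v2')
    key = latest_key(paginator.paginate(Bucket=bucket, Prefix=prefix))
    if key is None:
        return None
    filename = key.rsplit('/', 1)[-1]  # Remove path, as in the question
    s3_client.download_file(bucket, key, filename)
    return filename
```

With a real client this would be called as download_latest(boto3.client('s3'), 'MY-BUCKET', 'foo/'). Keeping a running maximum avoids holding every object's metadata in memory, which may matter for very large buckets.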