I need to write Python code to copy a file from one S3 bucket to another. The source bucket is in a different AWS account, and we read from it using IAM user credentials. The code runs in the same account as the destination bucket, so it has write access through an IAM role. One approach I can think of is to create an S3 client with the source account's credentials, read the whole file into memory (get_object?), then create another S3 client for the destination bucket and write the previously read contents (put_object?). But that gets very inefficient as the file size grows, so I'm wondering if there is a better way, preferably one where boto3 provides an AWS-managed transfer that doesn't read the contents into memory.
PS: I cannot add or modify roles or policies in the source account to give the destination account direct read access. The source account is owned by someone else, and they only provide an IAM user that can read from the bucket.
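For reference, the in-memory approach I describe above would look roughly like this (just a sketch; the bucket names, keys, and credential variables are placeholders):

import boto3

src = boto3.client('s3', aws_access_key_id=src_access_key, aws_secret_access_key=src_secret_key)
dst = boto3.client('s3')  # destination account, creds from the IAM role
# the whole object is read into memory before being written out again
data = src.get_object(Bucket=src_bucket, Key=src_key)['Body'].read()
dst.put_object(Bucket=dst_bucket, Key=dst_key, Body=data)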
Streaming is the standard solution for this kind of problem: you open the source object as a stream and hand that stream straight to the destination client, so the file is never held in memory in its entirety. The boto3 get_object() and upload_fileobj() methods both support this: get_object() returns the object's Body as a streaming, file-like object, and upload_fileobj() accepts any file-like object and uploads it in chunks.
Your code is going to look something like this:
import boto3

# source client, authenticated with the IAM user credentials for the other account
src = boto3.client('s3', aws_access_key_id=src_access_key, aws_secret_access_key=src_secret_key)
# destination client; creds implicit through the IAM role
dst = boto3.client('s3')
# get_object returns the body as a stream, so nothing is read into memory here
src_response = src.get_object(Bucket=src_bucket, Key=src_key)
# upload_fileobj reads that stream in chunks and uses multipart upload for large objects
dst.upload_fileobj(src_response['Body'], dst_bucket, dst_key)
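If the objects can get large, you can also control how upload_fileobj buffers and parallelizes the transfer with a TransferConfig. A minimal sketch, assuming the same placeholder clients and names as above; the chunk size and concurrency values are just example numbers:

from boto3.s3.transfer import TransferConfig

# tune the managed transfer: 16 MB parts, up to 4 parts in flight at once
config = TransferConfig(multipart_chunksize=16 * 1024 * 1024, max_concurrency=4)

src_response = src.get_object(Bucket=src_bucket, Key=src_key)
dst.upload_fileobj(src_response['Body'], dst_bucket, dst_key, Config=config)

upload_fileobj only buffers a bounded number of parts at a time, so memory use stays limited regardless of the object size.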