Tags: python, amazon-s3, boto3

Copy a large number of files in S3 within the same bucket


I have a "directory" in an S3 bucket with roughly 80 TB of data, and I need to copy everything to another directory in the same bucket:

source = s3://mybucket/abc/process/

destination = s3://mybucket/cde/process/

I already tried aws s3 sync, but it only worked for the big files and there are still 50 TB left to copy. I'm thinking about using boto3 code like the example below, but I don't know how to do this for multiple files/directories recursively.

import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')

How can I do this using boto3?


Solution

  • While there may be better ways of doing this using bucket policies, it can be done with boto3. First you will need to get a list of the contents of the bucket:

    s3 = boto3.resource('s3')
    s3_client = s3.meta.client

    bucket_items = s3_client.list_objects_v2(Bucket=source_bucket, Prefix=source_prefix)
    bucket_contents = bucket_items.get('Contents', [])
    

    Here source_bucket is the name of the bucket and source_prefix is the "folder" you are copying from.
    Next, iterate over the contents and call the s3.meta.client.copy method for each item, like so:

    for content in bucket_contents:
        copy_source = {
            'Bucket': source_bucket,
            'Key': content['Key']
        }
        s3.meta.client.copy(copy_source, source_bucket,
                            destination_prefix + '/' + content['Key'].split('/')[-1])
    

    Each item in the contents list is a dictionary, so you use 'Key' to get the full object key and split it to pull out the file name, which is then placed under the destination prefix. A fuller sketch that also handles more than 1,000 objects is given below.
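
    A minimal end-to-end sketch, assuming the same source_bucket, source_prefix and destination_prefix names as above; everything else here (the paginator, the prefix rewriting) is an addition, not part of the original answer. It uses a paginator because a single list_objects_v2 call returns at most 1,000 keys, which matters for a prefix holding 80 TB of objects, and it rewrites the source prefix to the destination prefix so nested "subdirectories" keep their structure instead of being flattened to the file name alone.

    import boto3

    s3 = boto3.resource('s3')
    s3_client = s3.meta.client

    source_bucket = 'mybucket'
    source_prefix = 'abc/process/'
    destination_prefix = 'cde/process/'

    # Paginate: list_objects_v2 returns at most 1,000 keys per call.
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=source_bucket, Prefix=source_prefix):
        for content in page.get('Contents', []):
            source_key = content['Key']
            # Keep everything after the source prefix so nested folders
            # are preserved under the destination prefix.
            destination_key = destination_prefix + source_key[len(source_prefix):]
            copy_source = {'Bucket': source_bucket, 'Key': source_key}
            # The managed copy switches to multipart copy for large objects,
            # so it also works for files above the 5 GB copy_object limit.
            s3.meta.client.copy(copy_source, source_bucket, destination_key)

    As written this copies one object at a time; for a transfer this size you would likely want to run several copies in parallel (for example with concurrent.futures), since each copy is an independent server-side operation.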