python, amazon-web-services, amazon-s3, boto3, parquet

Rename files while copying files between cross account s3 buckets


I am copying multiple Parquet files between S3 buckets in different AWS accounts, and I want to rename the files as they are copied to the destination bucket.

import boto3
s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')

bucket = 'sourcebucket'
folder_path = 'source_folder/'

resp = s3_client.list_objects(Bucket=bucket, Prefix=folder_path)
keys = []
for obj in resp['Contents']:
    keys.append(obj['Key'])


for key in keys:
    copy_source ={
        'Bucket': 'sourcebucket',
        'Key': key
    }
    file_name = key.split('/')[-1]
    s3_file = 'dest_folder/' + 'xyz' + file_name
    bucketdest = s3_resource.Bucket('destinationbucket')
    bucketdest.copy(copy_source,s3_file,ExtraArgs={'GrantFullControl':'id = " "'})

This is what I have tried. I can see the files in my destination bucket with the new names, but they contain no data.
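One way to check whether a copy really is empty is to compare the stored size of the source object with the size of its copy. This is a minimal sketch, not part of the original post: `compare_sizes` is a hypothetical helper, and it assumes the boto3 S3 client passed in has credentials that can read both buckets. It uses `head_object`, which returns an object's metadata (including `ContentLength`, its size in bytes) without downloading the body.

```python
def compare_sizes(s3_client, source_bucket, source_key, target_bucket, target_key):
    """Return (source_size, target_size) in bytes for an object and its copy.

    Hypothetical helper: s3_client is assumed to be a boto3 S3 client whose
    credentials can read both buckets.
    """
    # head_object fetches metadata only, so no object data is downloaded
    source = s3_client.head_object(Bucket=source_bucket, Key=source_key)
    target = s3_client.head_object(Bucket=target_bucket, Key=target_key)
    return source['ContentLength'], target['ContentLength']
```

Sizes are compared rather than ETags because a multipart (managed) copy can give the copy a different ETag even when the data is identical. A target size of 0 next to a non-zero source size means the data did not arrive.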

Thanks!


Solution

  • Your code works fine for me! (However, I ran it without the ExtraArgs, since I didn't have a canonical user ID to grant.)

    When I copy objects between buckets, the rules I use are:

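    • If possible, 'pull' the files from the other account, i.e. run the copy with credentials from the destination account, so that account writes (and owns) the copies
    • If 'pushing' files to another account, I set ExtraArgs={'ACL':'bucket-owner-full-control'} so the destination bucket's owner gets full control of the copied objects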

    I doubt this small change would affect the contents of your objects, though.
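The two rules above differ only in whose credentials run the copy and whether the canned ACL is needed. A minimal sketch, assuming a boto3 S3 client is passed in; `copy_with_new_name` and its `pushing` flag are hypothetical names used for illustration:

```python
def copy_with_new_name(s3_client, source_bucket, source_key,
                       target_bucket, target_key, pushing):
    """Server-side copy of one object under a new key (hypothetical helper).

    pushing=True: s3_client uses the *source* account's credentials, so the
    object lands in another account's bucket and gets the
    bucket-owner-full-control canned ACL.
    pushing=False ("pull"): s3_client uses the *destination* account's
    credentials, so no ACL is added.
    """
    kwargs = {
        'Bucket': target_bucket,
        'Key': target_key,
        'CopySource': {'Bucket': source_bucket, 'Key': source_key},
    }
    if pushing:
        # Give the destination bucket owner full control of the new object
        kwargs['ACL'] = 'bucket-owner-full-control'
    return s3_client.copy_object(**kwargs)
```

To pull, the client would be created from the destination account's credentials, for example `boto3.Session(profile_name='dest-account').client('s3')`, where `dest-account` is a hypothetical profile name; the source bucket's policy then has to allow that account to read the objects.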

    By the way, it is a good idea to use either client methods or resource methods, not both. Mixing them makes the code harder to follow and invites mistakes.

    So, you could use something like:

    Client method:

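    import boto3

    s3_client = boto3.client('s3')
    source_bucket = 'sourcebucket'
    source_prefix = 'source_folder/'
    target_bucket = 'destinationbucket'

    # A single list call returns at most 1,000 keys
    response = s3_client.list_objects_v2(Bucket=source_bucket, Prefix=source_prefix)

    for obj in response.get('Contents', []):
        key = obj['Key']
        if key.endswith('/'):  # skip zero-byte "folder" placeholder objects
            continue
        # copy_object copies objects up to 5 GB in a single request
        s3_client.copy_object(
            Bucket=target_bucket,
            Key='dest_folder/' + 'xyz' + key.split('/')[-1],
            CopySource={'Bucket': source_bucket, 'Key': key},
            ACL='bucket-owner-full-control'
        )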
    

    or you could use:

    Resource method:

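    import boto3

    s3_resource = boto3.resource('s3')
    source_bucket = 'sourcebucket'
    source_prefix = 'source_folder/'
    target_bucket = 'destinationbucket'

    # objects.filter() pages through all matching keys automatically
    for obj in s3_resource.Bucket(source_bucket).objects.filter(Prefix=source_prefix):
        if obj.key.endswith('/'):  # skip zero-byte "folder" placeholder objects
            continue
        # Bucket.copy() is a managed transfer (multipart for large objects)
        s3_resource.Bucket(target_bucket).copy(
            CopySource={'Bucket': source_bucket, 'Key': obj.key},
            Key='dest_folder/' + 'xyz' + obj.key.split('/')[-1],
            ExtraArgs={'ACL': 'bucket-owner-full-control'}
        )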
    

    (Warning: I didn't test those snippets.)
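The client snippet makes a single list call, which returns at most 1,000 keys, so a folder with more Parquet files than that would be only partly copied (the resource version's `objects.filter()` pages automatically). A sketch that follows every page with a boto3 paginator and builds the (source key, renamed key) pairs first, so the renaming can be checked before anything is copied. `plan_renamed_copies` is a hypothetical helper; `dest_folder/` and `xyz` are the prefix and name tag from the question.

```python
def plan_renamed_copies(s3_client, source_bucket, source_prefix,
                        dest_prefix='dest_folder/', name_prefix='xyz'):
    """Return (source_key, target_key) pairs for every object under
    source_prefix, following all pages of list_objects_v2.

    Hypothetical helper; s3_client is assumed to be a boto3 S3 client.
    """
    plan = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=source_bucket, Prefix=source_prefix):
        # A page with no matching keys has no 'Contents' entry
        for obj in page.get('Contents', []):
            key = obj['Key']
            if key.endswith('/'):
                continue  # zero-byte "folder" placeholder, nothing to copy
            # Keep only the file name, as the question's code does
            plan.append((key, dest_prefix + name_prefix + key.split('/')[-1]))
    return plan
```

Each pair can then be passed to `s3_client.copy_object(Bucket=target_bucket, Key=target_key, CopySource={'Bucket': source_bucket, 'Key': source_key})`. Like the question's code, this keeps only the last path segment, so two files with the same name in different subfolders would map to the same target key and overwrite each other.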