I need to copy all files from one prefix in S3 to another prefix within the same bucket. My solution is something like:
```python
file_list = [...]  # list of keys under the first prefix
for file in file_list:
    copy_source = {'Bucket': my_bucket, 'Key': file}
    # copy each object under the new prefix, keeping its file name
    s3_client.copy(copy_source, my_bucket, new_prefix + file.split('/')[-1])
```
However, I am only moving 200 tiny files (1 KB each) and this procedure takes up to 30 seconds. It must be possible to do it faster?
I would do it in parallel. For example:
```python
import boto3
from multiprocessing import Pool

s3_client = boto3.client('s3')
file_list = [...]  # list of keys under the first prefix

def s3_copier(s3_file):
    copy_source = {'Bucket': my_bucket, 'Key': s3_file}
    # keep the original file name under the new prefix
    s3_client.copy(copy_source, my_bucket, new_prefix + s3_file.split('/')[-1])

if __name__ == '__main__':
    # copy 5 objects at the same time
    with Pool(5) as p:
        p.map(s3_copier, file_list)
```
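Since the copies are I/O-bound (you spend most of the time waiting on S3, not on the CPU), threads work just as well as processes and avoid the fork/pickling overhead. Here is a minimal sketch using `concurrent.futures` from the standard library; the bucket and prefix names are placeholders, and the listing step assumes you still need to build `file_list`:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3_client = boto3.client('s3')
my_bucket = 'my-bucket'      # placeholder bucket name
old_prefix = 'source/'       # placeholder source prefix
new_prefix = 'destination/'  # placeholder destination prefix

# list every key under the source prefix (paginated, so it works past 1000 objects)
paginator = s3_client.get_paginator('list_objects_v2')
file_list = [obj['Key']
             for page in paginator.paginate(Bucket=my_bucket, Prefix=old_prefix)
             for obj in page.get('Contents', [])]

def s3_copier(s3_file):
    copy_source = {'Bucket': my_bucket, 'Key': s3_file}
    # rewrite the prefix, keeping the rest of the key unchanged
    s3_client.copy(copy_source, my_bucket, s3_file.replace(old_prefix, new_prefix, 1))

# boto3 clients are thread-safe, so one shared client is fine here
with ThreadPoolExecutor(max_workers=20) as executor:
    executor.map(s3_copier, file_list)
```

The per-object request latency is the bottleneck, not bandwidth, so the speedup comes entirely from overlapping the requests; with 20 workers, 200 small objects should copy in a second or two rather than 30.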