I have more than 300,000 objects in a bucket.
They are stored in the Glacier Deep Archive storage class, and I would like to restore them for analysis.
When I benchmarked the speed of the restore API, it seemed too slow to process all of these objects.
Below is my Python code.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# bucket and prefix are defined elsewhere
paginator = s3.get_paginator('list_objects')
operation_parameters = {'Bucket': bucket,
                        'Prefix': prefix}
page_iterator = paginator.paginate(**operation_parameters)

cnt = 0
for page in page_iterator:
    for content in page['Contents']:
        try:
            print(content['Key'])
            s3.restore_object(
                Bucket=bucket,
                Key=content['Key'],
                RestoreRequest={
                    'Days': 1,
                    'GlacierJobParameters': {
                        'Tier': 'Standard',
                    },
                },
            )
        except ClientError:
            # typically RestoreAlreadyInProgress for objects already requested
            print("already restored..")
I found that S3 Batch Operations is aimed at speeding up this kind of scenario, but I don't want to use the AWS console.
Rather than the console GUI, I want to do it in Python, as in the code above.
Is there a more efficient way to restore a large number of objects with boto3 (not using big data tools such as Spark)?
Thanks.
Your choice is either to loop through the objects and call restore_object() for each of them, or to use S3 Batch Operations to 'bulk restore' them.
Anything you can do in the console can also be done through code.
To create an S3 Batch Operations job, use create_job() - Boto3 documentation.
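
As a rough sketch of what that looks like: the create_job() call below assumes you have already uploaded a CSV manifest (one bucket,key pair per line) to S3 and created an IAM role that S3 Batch Operations can assume. The account ID, ARNs, ETag, and bucket names are placeholders, not real values.

import boto3

s3control = boto3.client('s3control')

# All IDs, ARNs and the ETag below are placeholders -- substitute your own.
response = s3control.create_job(
    AccountId='111122223333',
    ConfirmationRequired=False,
    Operation={
        'S3InitiateRestoreObject': {
            'ExpirationInDays': 1,
            'GlacierJobParameters': {'Tier': 'STANDARD'},  # or 'BULK'
        },
    },
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            'Fields': ['Bucket', 'Key'],
        },
        'Location': {
            'ObjectArn': 'arn:aws:s3:::my-manifest-bucket/manifest.csv',
            'ETag': 'etag-of-the-manifest-object',
        },
    },
    Report={
        'Bucket': 'arn:aws:s3:::my-report-bucket',
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'batch-restore-reports',
        'ReportScope': 'FailedTasksOnly',
    },
    Priority=10,
    RoleArn='arn:aws:iam::111122223333:role/my-batch-operations-role',
    Description='Bulk restore from Glacier Deep Archive',
)
print(response['JobId'])

The job then runs entirely on the S3 side; you can poll its progress with describe_job() instead of keeping 300,000 API calls alive in your own script.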
If you wish to use your existing code, you could make it run faster by calling restore_object() in parallel (eg using asyncio) without waiting for a response, but this requires some advanced Python skills.
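
As a rough illustration of that idea: boto3 itself is not async-aware, so a thread pool (concurrent.futures) is often the simpler way to fire many restore_object() calls concurrently. This is only a sketch; bucket and prefix are assumed to be defined as in the question, and MAX_WORKERS is an arbitrary starting value you would tune against API throttling.

import boto3
from botocore.exceptions import ClientError
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 50  # arbitrary; tune against throttling and your own limits

s3 = boto3.client('s3')  # boto3 low-level clients are thread-safe

def restore(key):
    """Request a restore for one key; report errors instead of raising."""
    try:
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={
                'Days': 1,
                'GlacierJobParameters': {'Tier': 'Standard'},
            },
        )
    except ClientError as e:
        # typically 'RestoreAlreadyInProgress' for objects already requested
        return key, e.response['Error']['Code']
    return key, 'requested'

# bucket and prefix are assumed to be defined as in the question
paginator = s3.get_paginator('list_objects')
keys = (content['Key']
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for content in page.get('Contents', []))

with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    for key, status in pool.map(restore, keys):
        print(key, status)

Each restore_object() call is a small, independent HTTP request, so threads give a near-linear speedup until you hit S3 request limits; for 300,000+ objects, S3 Batch Operations is still the less fiddly option.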