Let's say I have the following files in an S3 bucket -
Each archive contains a txt file with the same base name. I want to unzip the .zip and .gz files and move all the txt files to a different location in the same S3 bucket (say newloc/). Each file should only be moved once.
So the files in the destination should look like -
In the above example, abcd.txt was only moved once to newloc/ even though loc/ had both abcd.zip and abcd.txt present.
I'm fairly new to the AWS CLI and AWS in general, and not sure how to achieve this. There are about 800 txt files, each roughly 500 MB to 1 GB.
There is no built-in capability in Amazon S3 to unzip files.
You should write a script that lists the objects, then loops through them to:

- download each .zip or .gz file
- unzip it
- upload the extracted txt file to newloc/, skipping any file that has already been copied
Using Python and the boto3 library would be easier than writing a shell script around the AWS CLI.
You can check whether an object already exists in S3 by calling the head_object() method.