My goal is to convert all files that are currently in the GLACIER storage class to STANDARD using the AWS CLI (s3api). To do this, I first need to get a list of all these files, then fire a restore command, and finally a copy command to change them all to STANDARD.
The problem is that the number of files is too large (~5 million). The command crashes with a segmentation fault (core dumped) once --max-items goes above roughly 600k to 700k, and I get the same error if I omit --max-items entirely, so I cannot list more than about 700k files. Here's the command I used:
aws s3api list-objects --bucket my-bucket --query 'Contents[?StorageClass==`GLACIER`]' --max-items 700000 > glacier.txt
Is there any workaround?
I then discovered the --starting-token option of the list-objects command and wrote a script that scans all objects in batches of 100k. The script writes the S3 key of every GLACIER object to a file:
#!/bin/bash
BUCKET="s3-bucket-name"
PREFIX="foldername"
PROFILE="awscliprofile"
MAX_ITEM=100000

var=0
NEXT_TOKEN=0
while true; do
    var=$((var+1))
    echo "Iteration #$var - Next token: $NEXT_TOKEN"

    # List the next batch of objects and dump the raw output to a temp file.
    aws s3api list-objects \
        --bucket "$BUCKET" \
        --prefix "$PREFIX" \
        --profile "$PROFILE" \
        --max-items "$MAX_ITEM" \
        --starting-token "$NEXT_TOKEN" > temp

    # Append the line that follows each GLACIER match (the object key in this
    # output format) to glacier.txt.
    awk '/GLACIER/{getline; print}' temp >> glacier.txt

    # Pull the pagination token for the next batch, then discard the temp file.
    NEXT_TOKEN=$(grep NextToken temp | awk '{print $2}' | sed 's/\("\|",\)//g')
    rm temp

    # A missing or empty NextToken means the last page has been reached.
    if [ ${#NEXT_TOKEN} -le 5 ]; then
        echo "No more files..."
        echo "Next token: $NEXT_TOKEN"
        break
    fi
done
echo "Exiting."
After that I can use restore-object and finally copy-object to change the storage class of all these files to STANDARD. See more scripts here. Hope this helps anyone who needs to achieve the same thing.
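For completeness, here is a minimal, untested sketch of those last two steps. It assumes glacier.txt holds one key per line and reuses the BUCKET/PROFILE values from above; the restore tier and number of days are placeholders, and the copy step only works after the restore has actually completed (Bulk restores can take hours):

#!/bin/bash
BUCKET="s3-bucket-name"
PROFILE="awscliprofile"

# Step 1: request a temporary restore of every GLACIER object.
while IFS= read -r KEY; do
    aws s3api restore-object \
        --bucket "$BUCKET" \
        --profile "$PROFILE" \
        --key "$KEY" \
        --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'
done < glacier.txt

# Step 2 (run only after the restores have finished): copy each object onto
# itself with a new storage class, which rewrites it as STANDARD.
while IFS= read -r KEY; do
    aws s3api copy-object \
        --bucket "$BUCKET" \
        --profile "$PROFILE" \
        --copy-source "$BUCKET/$KEY" \
        --key "$KEY" \
        --storage-class STANDARD \
        --metadata-directive COPY
done < glacier.txt

Note that copy-object only handles objects up to 5 GB in a single call; anything larger needs a multipart copy.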