AWS S3 - How to get all files in the GLACIER storage class

My goal is to convert all files that are currently GLACIER storage class to STANDARD using aws cli s3api. In order to do this, I need to first get a list of all these files, then fire a restore command, and eventually a copy command to change them all to STANDARD.

The problem is that the number of files is too large (~5 million): the command ends in a segmentation fault (core dump) once --max-items exceeds roughly 600k to 700k. If I don't supply the --max-items parameter, I get the same error. So I can't list anything beyond the first ~700k files. Here's the command I used:

aws s3api list-objects --bucket my-bucket --query 'Contents[?StorageClass==`GLACIER`]' --max-items 700000 > glacier.txt

Is there any workaround?


  • I discovered the --starting-token option of the list-objects command, so I wrote a script that scans all items in batches of 100k objects. The script writes the S3 keys of all GLACIER objects to a file.

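    # BUCKET, PREFIX, PROFILE and MAX_ITEM must be set before running this.
    var=0
    NEXT_TOKEN=""
    while true; do
        var=$((var + 1))
        echo "Iteration #$var - Next token: $NEXT_TOKEN"
        # Pass --starting-token only once a previous page has returned one.
        aws s3api list-objects \
            --bucket "$BUCKET" \
            --prefix "$PREFIX" \
            --profile "$PROFILE" \
            --max-items "$MAX_ITEM" \
            ${NEXT_TOKEN:+--starting-token "$NEXT_TOKEN"} > temp
        # Print the line after each GLACIER match; this relies on the
        # "Key" line directly following "StorageClass" in the JSON output.
        awk '/GLACIER/{getline; print}' temp >> glacier.txt
        NEXT_TOKEN=$(grep NextToken temp | awk '{print $2}' | sed 's/\("\|",\)//g')
        # A missing or very short token means the last page was reached.
        if [ ${#NEXT_TOKEN} -le 5 ]; then
            echo "No more files..."
            echo "Next token: $NEXT_TOKEN"
            break
        fi
    done
    rm temp
    echo "Exiting."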
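
Because the awk step prints the whole line that follows each GLACIER match, glacier.txt probably holds raw JSON lines such as `"Key": "path/to/object",` rather than bare keys. A minimal sketch for stripping them down to one key per line is below. The function name extract_keys and the output file name are made up for illustration, and JSON escape sequences inside keys are not decoded.

```shell
# Hypothetical helper: turn lines like
#     "Key": "path/to/object",
# into bare keys, one per line. Lines that do not match are skipped.
extract_keys() {
    sed -n 's/^[[:space:]]*"Key":[[:space:]]*"\(.*\)",\{0,1\}[[:space:]]*$/\1/p' "$1"
}
```

For example, `extract_keys glacier.txt > glacier-keys.txt` would leave a plain key list for the later steps.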

    After that I can use restore-object and finally copy-object to change the storage class of all these files to STANDARD. See more scripts here. Hope this helps anyone who needs to achieve the same thing.
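
The restore and copy steps could look roughly like the sketch below. It assumes a file with one plain S3 key per line; the function names, the retention days and the retrieval tier defaults are illustrative choices, not values from the post. The copy only succeeds after the restore has finished, which can take hours for the slower tiers, so the two functions are meant to be run separately.

```shell
# Hypothetical helper: request a temporary restore for every key in a file.
# Usage: restore_keys BUCKET KEYFILE [DAYS] [TIER]
restore_keys() {
    local bucket="$1" keyfile="$2" days="${3:-7}" tier="${4:-Bulk}"
    local key
    while IFS= read -r key; do
        [ -n "$key" ] || continue   # skip blank lines
        aws s3api restore-object \
            --bucket "$bucket" \
            --key "$key" \
            --restore-request "Days=$days,GlacierJobParameters={Tier=$tier}" \
            || echo "restore failed: $key" >&2
    done < "$keyfile"
}

# Hypothetical helper: copy each restored object onto itself with the
# STANDARD storage class. Run this only after the restores have completed.
# Usage: copy_to_standard BUCKET KEYFILE
copy_to_standard() {
    local bucket="$1" keyfile="$2"
    local key
    while IFS= read -r key; do
        [ -n "$key" ] || continue   # skip blank lines
        aws s3api copy-object \
            --bucket "$bucket" \
            --key "$key" \
            --copy-source "$bucket/$key" \
            --storage-class STANDARD \
            --metadata-directive COPY \
            || echo "copy failed: $key" >&2
    done < "$keyfile"
}
```

Two caveats: keys containing special characters need to be URL-encoded in --copy-source, and copy-object handles objects up to 5 GB in a single call, so larger objects would need a multipart copy instead.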