My goal is to convert all files that are currently in the GLACIER storage class to STANDARD using the AWS CLI (s3api). To do this, I first need to get a list of all these files, then fire a restore command, and finally a copy command to change them all to STANDARD.
The problem is that the number of files is too large (~5 million). The command crashes with a segmentation fault (core dumped) once --max-items goes above roughly 600k to 700k, and I get the same error if I omit --max-items entirely, so I cannot list more than about 700k files. Here's the command I used:
aws s3api list-objects --bucket my-bucket --query 'Contents[?StorageClass==`GLACIER`]' --max-items 700000 > glacier.txt
Is there any workaround?
I then discovered the --starting-token option of the list-objects command and wrote a script that scans all objects in batches of 100k. The script writes the S3 key of every GLACIER object to a file:
#!/bin/bash
BUCKET="s3-bucket-name"
PREFIX="foldername"
PROFILE="awscliprofile"
MAX_ITEM=100000

var=0
NEXT_TOKEN=0
while true; do
    var=$((var+1))
    echo "Iteration #$var - Next token: $NEXT_TOKEN"

    # List the next batch of objects and dump the raw output to a temp file.
    aws s3api list-objects \
        --bucket "$BUCKET" \
        --prefix "$PREFIX" \
        --profile "$PROFILE" \
        --max-items "$MAX_ITEM" \
        --starting-token "$NEXT_TOKEN" > temp

    # Append the line that follows each GLACIER match (the object key in this
    # output format) to glacier.txt.
    awk '/GLACIER/{getline; print}' temp >> glacier.txt

    # Pull the pagination token for the next batch, then discard the temp file.
    NEXT_TOKEN=$(grep NextToken temp | awk '{print $2}' | sed 's/\("\|",\)//g')
    rm temp

    # A missing or empty NextToken means the last page has been reached.
    if [ ${#NEXT_TOKEN} -le 5 ]; then
        echo "No more files..."
        echo "Next token: $NEXT_TOKEN"
        break
    fi
done
echo "Exiting."
After that I can use restore-object and finally copy-object to change the storage class of all these files to STANDARD. See more scripts here. Hope this helps anyone who needs to achieve the same thing.
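For completeness, here is a minimal, untested sketch of those last two steps. It assumes glacier.txt holds one key per line and reuses the BUCKET/PROFILE values from above; the restore tier and number of days are placeholders, and the copy step only works after the restore has actually completed (Bulk restores can take hours):

#!/bin/bash
BUCKET="s3-bucket-name"
PROFILE="awscliprofile"

# Step 1: request a temporary restore of every GLACIER object.
while IFS= read -r KEY; do
    aws s3api restore-object \
        --bucket "$BUCKET" \
        --profile "$PROFILE" \
        --key "$KEY" \
        --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'
done < glacier.txt

# Step 2 (run only after the restores have finished): copy each object onto
# itself with a new storage class, which rewrites it as STANDARD.
while IFS= read -r KEY; do
    aws s3api copy-object \
        --bucket "$BUCKET" \
        --profile "$PROFILE" \
        --copy-source "$BUCKET/$KEY" \
        --key "$KEY" \
        --storage-class STANDARD \
        --metadata-directive COPY
done < glacier.txt

Note that copy-object only handles objects up to 5 GB in a single call; anything larger needs a multipart copy.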