shell amazon-s3 pagination xmlstarlet

AWS S3 ListBucketResult pagination without authentication?


I'm looking to get a simple listing of all the objects in a public S3 bucket.

I know how to fetch a listing of up to 1000 results with curl, but I don't understand how to paginate the results to get a full listing. I think the marker parameter is a clue.

I do not want to use an SDK or library, or to authenticate. I'm looking for a couple of lines of shell to do this.


Solution

    #!/bin/sh
    
    # S3 returns at most 1000 keys per request; a larger max-keys has no effect
    s3url='http://mr2011.s3-ap-southeast-1.amazonaws.com?max-keys=1000'
    s3ns=http://s3.amazonaws.com/doc/2006-03-01/
    
    i=0
    s3get=$s3url
    
    while :; do
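        # quote the URL so the shell does not glob or split on ? and &
        curl -s "$s3get" > "listing$i.xml"
        # the XMLStarlet binary is "xml" here; some distributions install it as "xmlstarlet"
        nextkey=$(xml sel -T -N "w=$s3ns" -t \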
            --if '/w:ListBucketResult/w:IsTruncated="true"' \
            -v 'str:encode-uri(/w:ListBucketResult/w:Contents[last()]/w:Key, true())' \
            -b -n "listing$i.xml")
        # -b -n appends a newline to the output unconditionally, which avoids
        # the "no XPaths matched" message; $() strips the trailing newline.
    
        if [ -n "$nextkey" ] ; then
            s3get="$s3url&marker=$nextkey"
            i=$((i+1))
        else
            break
        fi
    done
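
The loop above leaves the pages on disk as listing0.xml, listing1.xml, and so on, so one more step is needed to get the plain listing the question asks for. A minimal sketch of that step follows, using only tr and sed rather than xmlstarlet. The function name list_keys is hypothetical, and the sketch assumes that keys contain no newlines and that the Key element carries no namespace prefix; XML entities such as &amp; are left encoded.

```shell
#!/bin/sh

# list_keys FILE... : print one object key per line from saved
# ListBucketResult pages (hypothetical helper, not part of the answer above).
list_keys() {
    # Split the XML at every '<' so each tag starts its own line,
    # then keep only the text that follows an opening Key tag.
    cat "$@" | tr '<' '\n' | sed -n 's/^Key>//p'
}
```

It could be called as `list_keys listing*.xml > all-keys.txt`. Note that the glob sorts listing10.xml before listing2.xml, so the keys are not necessarily in bucket order once there are more than ten pages.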