Tags: java, amazon-web-services, amazon-s3, aws-java-sdk

How to change the AWS S3 Java V2 API's limit of 1000 when listing objects [for a bucket with more than 1 billion objects]?


I am working on a project where I need to download the keys from an Amazon S3 bucket that holds more than 1 billion objects. I wrote code using the Java V2 API, but it doesn't help because it retrieves only 1000 keys at a time. It takes days to get the list of all keys from this bucket. Is there a faster way to get the full list of keys?

I have checked other answers related to this topic, and they didn't help.


Solution

  • We had the same issue with a large number of objects.

    We used a naming pattern in which each object name is prefixed with a timestamp at fixed increments. It looks like this:

    s3://bucket-name/timestamp/actualobject.extension
    
    For example:
    s3://mys3bucket/1506237300/datafile001.json
    

    To list the keys, I ran parallel threads, one for each timestamp prefix in 15-minute increments, and everything was read very quickly.

    The key to solving this is to work out the pattern that was used when storing the objects and to list the object names by prefix based on that pattern.

    Hope it helps.
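
A minimal sketch of the approach described above, not the answerer's actual code. It assumes keys start with a Unix-timestamp prefix on 15-minute (900-second) boundaries, as in the example key. The class name `PrefixParallelLister`, its methods, and the stand-in lister in `main` are hypothetical. The per-prefix listing is passed in as a function so the sketch runs without AWS credentials; with the AWS SDK for Java 2.x, that function would typically wrap `S3Client.listObjectsV2Paginator` with the request's `prefix` set. Each call still returns at most 1000 keys per page, but the prefixes are paged through concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: one listing task per timestamp prefix, run in parallel.
class PrefixParallelLister {

    // Builds prefixes such as "1506237300/" from startEpoch (inclusive) to
    // endEpoch (exclusive), stepping by stepSeconds (900 = 15 minutes).
    static List<String> timestampPrefixes(long startEpoch, long endEpoch, long stepSeconds) {
        List<String> prefixes = new ArrayList<>();
        for (long t = startEpoch; t < endEpoch; t += stepSeconds) {
            prefixes.add(t + "/");
        }
        return prefixes;
    }

    // Calls listKeysForPrefix once per prefix on a fixed-size thread pool and
    // returns all keys, grouped in the same order as the prefixes.
    static List<String> listAll(List<String> prefixes,
                                Function<String, List<String>> listKeysForPrefix,
                                int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String prefix : prefixes) {
                tasks.add(() -> listKeysForPrefix.apply(prefix));
            }
            List<String> keys = new ArrayList<>();
            // invokeAll blocks until every task finishes and keeps task order.
            for (Future<List<String>> result : pool.invokeAll(tasks)) {
                keys.addAll(result.get());
            }
            return keys;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Listing was interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Listing a prefix failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Three 15-minute prefixes starting at the timestamp from the example key.
        List<String> prefixes = timestampPrefixes(1506237300L, 1506240000L, 900L);

        // Stand-in lister (assumption). A real one would page through
        // s3.listObjectsV2Paginator(request with bucket and prefix).contents()
        // and collect each object's key().
        Function<String, List<String>> fakeLister =
                prefix -> List.of(prefix + "datafile001.json");

        listAll(prefixes, fakeLister, 4).forEach(System.out::println);
    }
}
```

The thread count is a tuning knob: more threads mean more prefixes paged concurrently, at the cost of more simultaneous requests against the bucket.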