Tags: database, amazon-web-services, amazon-s3, get, put

Do Amazon S3 GET Object and PUT Object commands slow down at high object counts?


I am considering using S3 for back-end persistent storage.

However, depending on architecture choices, I predict some buckets may need to store billions of small objects.

How will GET Object and PUT Object perform under these conditions, assuming I am using UUIDs as keys? Can I expect O(1), O(log N), or O(N) performance?

Will I need to rethink my architecture and subdivide bigger buckets in some way to maintain performance? I need object lookups (GET) in particular to be as fast as possible.


Solution

  • Though it is probably meant for S3 customers with truly outrageous request volume, Amazon does have some tips for getting the most out of S3, based on the internal architecture of S3:

    • Performing PUTs against a particular bucket in alphanumerically increasing order by key name can reduce the total response time of each individual call. Performing GETs in any sorted order can have a similar effect. The smaller the objects, the more significantly this will likely impact overall throughput.

    • When executing many requests from a single client, use multi-threading to enable concurrent request execution (see the threading sketch after the source link below).

    • Consider prefacing keys with a hash utilizing a small set of characters. Decimal hashes work nicely (see the key-prefix sketch further down).

    • Consider utilizing multiple buckets that start with different alphanumeric characters. This will ensure a degree of partitioning from the start. The higher your volume of concurrent PUT and GET requests, the more impact this will likely have (a bucket-routing sketch appears near the end of this answer).

    • If you'll be making GET requests against Amazon S3 from within Amazon EC2 instances, you can minimize network latency on these calls by performing the PUT for these objects from within Amazon EC2 as well.

    Source: http://aws.amazon.com/articles/1904
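
    To make the multi-threading tip concrete, here is a minimal sketch in Python using boto3; the bucket name, key pattern, and worker count are placeholder assumptions, and working AWS credentials are assumed to be configured:

    ```python
    # Minimal sketch (assumptions: boto3 installed, credentials configured,
    # placeholder bucket and key names).
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")

    BUCKET = "example-bucket"                      # hypothetical bucket name
    keys = [f"{i:06d}.json" for i in range(1000)]  # hypothetical object keys

    def fetch(key):
        # boto3 clients are thread-safe, so all workers can share this one.
        response = s3.get_object(Bucket=BUCKET, Key=key)
        return key, response["Body"].read()

    # Keep many GET requests in flight at once instead of issuing them serially.
    with ThreadPoolExecutor(max_workers=32) as pool:
        for key, body in pool.map(fetch, keys):
            pass  # process body here
    ```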

    Here's a great article from AWS that goes into depth about the hash prefix strategy and explains when it is and isn't necessary:

    http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
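
    For illustration, the hash-prefix idea might look like the following sketch; the two-character hex prefix, the MD5 digest, and the UUID keys are assumptions made for the example, not anything the articles prescribe:

    ```python
    # Minimal sketch of a hash prefix derived from the key itself
    # (the 2-character hex prefix and MD5 are illustrative choices).
    import hashlib
    import uuid

    def prefixed_key(object_id: str, prefix_len: int = 2) -> str:
        # Two hex characters give 256 evenly distributed prefixes.
        digest = hashlib.md5(object_id.encode("utf-8")).hexdigest()
        return f"{digest[:prefix_len]}/{object_id}"

    object_id = str(uuid.uuid4())
    print(prefixed_key(object_id))  # e.g. "3f/0f8b3a1c-..."
    ```

    Because the prefix is computed from the key itself, readers can recompute it deterministically at GET time. And since random UUID keys already begin with well-distributed characters, this step matters most when keys would otherwise be sequential (timestamps, incrementing IDs).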

    Bottom line: Your plan to put billions of objects in a single bucket using UUIDs for the keys should be fine. If you have outrageous request volume, you might split the keyspace across multiple buckets with different leading characters for even better partitioning.
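
    If you ever do split across buckets, the routing can be as simple as keying off the first character of the (already random) UUID; a minimal sketch, with purely illustrative bucket names:

    ```python
    # Minimal sketch of routing objects to one of sixteen buckets by the
    # first hex character of a UUID key (bucket names are illustrative).
    import uuid

    BUCKETS = {c: f"example-store-{c}" for c in "0123456789abcdef"}

    def bucket_for(key: str) -> str:
        return BUCKETS[key[0].lower()]

    key = str(uuid.uuid4())
    print(bucket_for(key), key)
    ```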

    If you are going to be spending a lot of money with AWS, consider getting in touch with Amazon and talking through the approach with them.