amazon-web-services, amazon-s3, aws-sdk

How to retrieve usage on Amazon S3 based on tag?


We currently use a single S3 bucket for all of our data, which comes from several organizations. Can we retrieve usage data for the 'foo' or 'bar' organization based on its 'foo' or 'bar' tag?

Use case:

We deploy a mobile app for our clients, called 'foo' and 'bar'. This app is used to upload files (images and videos) tagged with 'foo' and 'bar' based on the organization. We use the same API for the app and the same bucket in S3. I want to get metrics like the total storage used by the 'foo' or 'bar' tags, so I can monitor that 'foo' or 'bar' is using N GB of storage.


Solution

  • Object-level tagging for Amazon S3 was announced in late 2016. It enables:

    • Lifecycle management by tag (e.g., moving objects to Amazon Glacier)
    • Access control policies

    It does not provide storage metrics broken down by object tag.

    You would need to write your own script that lists the objects and totals their sizes per tag. However, it appears that the only way to retrieve an object's tags is to make a separate request for each object (GetObjectTagging). For a large bucket, that means a very large number of API calls.
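
The script described above might look like the following minimal sketch. It assumes the organization is stored under a tag key named `organization` (a hypothetical name; the question does not say which key is used) and that `s3` is a boto3 S3 client. The function name and the `untagged` bucket for objects without that tag are also illustrative.

```python
from collections import defaultdict


def storage_by_tag(s3, bucket, tag_key="organization"):
    """Sum object sizes (bytes) per value of `tag_key`.

    `s3` is expected to behave like a boto3 S3 client. The tag key
    "organization" is an assumption, not something the question states.
    """
    totals = defaultdict(int)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket yields a page without a "Contents" entry.
        for obj in page.get("Contents", []):
            # Tags are not part of the listing: one extra call per object.
            tagging = s3.get_object_tagging(Bucket=bucket, Key=obj["Key"])
            tags = {t["Key"]: t["Value"] for t in tagging["TagSet"]}
            totals[tags.get(tag_key, "untagged")] += obj["Size"]
    return dict(totals)
```

With boto3 installed and credentials configured, it could be called as `storage_by_tag(boto3.client("s3"), "my-bucket")`, where `my-bucket` is a placeholder. The cost is one listing call per 1,000 objects plus one tagging call per object.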

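    An alternative is to rely only on information that the list-objects API call returns. User-defined object metadata is not included in that response (it needs a HEAD request per object, so it is no cheaper than tags), but every object's key and size are. If the organization is encoded in the key, for example as a prefix such as foo/ or bar/, usage can be totaled with one API call per 1,000 objects (the maximum page size).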
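
A listing page carries each object's key and size, so usage can be totaled from the listing alone when the organization can be read from the key. The sketch below assumes a hypothetical key layout in which every key starts with the organization name followed by `/` (for example `foo/photo.jpg`), and that `s3` is a boto3 S3 client. The function name and the `(no prefix)` label are illustrative.

```python
from collections import defaultdict


def storage_by_prefix(s3, bucket, delimiter="/"):
    """Sum object sizes (bytes) per leading key segment.

    Assumes keys look like "<organization>/<rest>"; this layout is an
    assumption for illustration. Only listing calls are made.
    """
    totals = defaultdict(int)
    paginator = s3.get_paginator("list_objects_v2")
    # No Delimiter is passed, so every object is returned in "Contents".
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            org, sep, _rest = obj["Key"].partition(delimiter)
            # Keys without the delimiter are grouped separately.
            totals[org if sep else "(no prefix)"] += obj["Size"]
    return dict(totals)
```

Dividing a total by `1024 ** 3` gives the figure in GiB for monitoring purposes.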

    Finally, you could store each organization's objects in a separate bucket, which would make it possible to use Amazon CloudWatch metrics. Amazon S3 reports per-bucket metrics to Amazon CloudWatch once a day for the number of objects (NumberOfObjects) and the amount of storage used (BucketSizeBytes).
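
With one bucket per organization, the size can be read from CloudWatch instead of being computed. The following is a minimal sketch, assuming `cloudwatch` is a boto3 CloudWatch client and the objects are in the Standard storage class; the function name and the three-day lookback window are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def bucket_size_bytes(cloudwatch, bucket, storage_type="StandardStorage"):
    """Return the latest daily BucketSizeBytes value, or None if absent.

    `cloudwatch` is expected to behave like a boto3 CloudWatch client.
    """
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        # Storage metrics are daily, so look back a few days.
        StartTime=now - timedelta(days=3),
        EndTime=now,
        Period=86400,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to be ordered; sort by timestamp.
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else None
```

It could be called once per organization bucket, for example `bucket_size_bytes(boto3.client("cloudwatch"), "foo-uploads")`, where `foo-uploads` is a placeholder bucket name.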