amazon-web-services amazon-s3 amazon-cloudfront

Do we need directory structure logic for storing millions of images on Amazon S3/CloudFront?


To support millions of potential images, we have previously used a directory structure like this:

/profile/avatars/44/f2/47/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg

The filename is an MD5 hash. We take the first 6 characters of that hash and build the folder structure from them, two characters per level.

So in the example above, the filename:

44f247d4e3f646c66d4d0337c6d415eb.jpg

produces a directory structure of:

/44/f2/47/

We did this to limit the number of photos in any single directory, ultimately to help filesystem performance.
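A minimal Python sketch of the sharding scheme described above, assuming the 32-character hex MD5 digest is already known. The helper names `hashed_avatar_path` and `digest_for`, and the idea of hashing some upload identifier, are hypothetical; the question does not say exactly what string gets hashed.

```python
import hashlib


def digest_for(source: str) -> str:
    # Hypothetical input: whatever string the app hashes (e.g. an upload ID).
    return hashlib.md5(source.encode("utf-8")).hexdigest()


def hashed_avatar_path(digest: str, size: str = "48px",
                       base: str = "/profile/avatars") -> str:
    """Build a sharded path from a hex MD5 digest.

    The first six hex characters become three two-character directory
    levels, e.g. '44f247...' -> '44/f2/47'.
    """
    shards = [digest[i:i + 2] for i in range(0, 6, 2)]
    return "/".join([base, *shards, size, f"{digest}.jpg"])


print(hashed_avatar_path("44f247d4e3f646c66d4d0337c6d415eb"))
# -> /profile/avatars/44/f2/47/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg
```

The flat layout asked about further down would simply drop the `shards` part of the path.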

However, our new app uses Amazon S3 with CloudFront.

My understanding is that any folders you create on Amazon S3 are really just references, not actual directories on a filesystem.
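To illustrate what "just references" means in practice: S3 stores each object under one flat key string, and a listing only appears as folders because keys are grouped by a `Prefix` and a `Delimiter`, with the groups returned as `CommonPrefixes` in a ListObjectsV2 response. The function below is a hypothetical local emulation of that grouping for illustration only; it does not call S3 or boto3.

```python
def list_like_s3(keys, prefix="", delimiter="/"):
    """Emulate how S3 groups flat keys into 'folders' in a listing.

    Returns (contents, common_prefixes), mirroring the Contents and
    CommonPrefixes fields of a ListObjectsV2 response.
    """
    contents, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # With no delimiter, nothing is grouped: every match is a plain key.
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            contents.append(key)
        else:
            # Everything up to and including the first delimiter is a "folder".
            common.add(prefix + rest[:idx + len(delimiter)])
    return contents, sorted(common)


keys = [
    "profile/avatars/44/f2/47/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg",
    "profile/avatars/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg",
]
print(list_like_s3(keys, prefix="profile/avatars/"))
# -> ([], ['profile/avatars/44/', 'profile/avatars/48px/'])
```

Both layouts are stored the same way; only the key strings differ.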

If that is correct, is it still recommended to split images into folders using the method above, or something similar? Or can we simply remove this complexity from our application code and serve image links like this:

/profile/avatars/48px/filename.jpg

Bear in mind that this app is intended to serve tens of millions of photos.

Any guidance would be greatly appreciated.


Solution

  • Although S3 folders are essentially just another way of writing the key name (as @E.J.Brennan already said in his answer), there are still reasons to think about the naming structure of your "folders".

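    Given the number of photos you expect, and probably your access patterns, it might make sense to think about how to speed up S3 key-name lookups by making sure that operations on photos get spread out over multiple partitions. S3 stores its key index across partitions in lexicographic order, so keys that share a common leading prefix (for example, sequentially named files) can end up on the same partition, whereas a hashed prefix spreads them out. There is a great article on the AWS blog explaining all the details.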
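A rough Python sketch of why a hashed prefix helps spread keys out. It only counts how keys fall into leading two-character prefixes; it does not model how S3 actually partitions its index. The sequential names (`user-000000`, ...) are hypothetical, and placing the MD5 shard at the very start of the key is an assumption for illustration.

```python
import hashlib
from collections import Counter


def leading_prefix_counts(keys, width=2):
    """Count how many keys share each leading `width`-character prefix."""
    return Counter(key[:width] for key in keys)


# Hypothetical, sequentially named uploads.
names = [f"user-{i:06d}" for i in range(10_000)]

# Keys that start with the sequential name all share the prefix "us".
sequential = [f"{n}.jpg" for n in names]

# Keys that start with an MD5-derived shard, in the style of 44/f2/47/...
hashed = []
for n in names:
    d = hashlib.md5(n.encode("utf-8")).hexdigest()
    hashed.append(f"{d[0:2]}/{d[2:4]}/{d[4:6]}/{d}.jpg")

seq_counts = leading_prefix_counts(sequential)
hash_counts = leading_prefix_counts(hashed)
print(len(seq_counts), max(seq_counts.values()))  # -> 1 10000
print(len(hash_counts), max(hash_counts.values()))
```

The sequential keys all pile into one leading prefix, while the hashed keys spread across up to 256 two-character prefixes.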