I am having challenges architecting my S3 bucket structure.
My Application
I have several types (roles) of users and each user has different types of PDF documents that will be uploaded to S3. The user will see each document in their dashboard and should be able to view the PDF from the application (ideally by opening in a new tab instead of downloading it). Below is an example:
User Roles
- role_a
- role_b
User Documents (for role_a)
- document_type_a (filename: 0888a5ce)
- document_type_b (filename: c00630fr)
- document_type_c (filename: 2349d1c)
User Documents (for role_b)
- document_type_x (filename: fe294090)
- document_type_y (filename: cad2d3dc)
Each user can have zero or more documents.
My questions:
- What is the best way to design a nested S3 bucket structure?
- The filename will be saved in the database for each user. Beyond that, which parts of the S3 bucket structure should be stored in the database, and which should be derived by the application, to make uploading and downloading these PDF documents efficient?
- In the above nested structure, what would be the bucket name and what would be the key of the document?
The simplest structure would be a totally flat storage structure:
- Generate a Unique ID for each object (e.g. using a GUID function)
- Save the object in S3 with a Key equal to the Unique ID
- Store the Unique ID in a database that maps the object to your user together with metadata such as original filename, dates, permissions, etc.
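The three steps above can be sketched roughly as follows. This is only an illustration under some assumptions: `sqlite3` stands in for whatever database you use, the `documents` table and its columns are made up for the example, and `s3_client` is expected to be a boto3 S3 client (or anything else exposing a compatible `put_object` method).

```python
import sqlite3
import uuid


def create_schema(db):
    # Hypothetical table: one row per uploaded object, keyed by the S3 key.
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " s3_key TEXT PRIMARY KEY,"
        " user_id TEXT NOT NULL,"
        " doc_type TEXT NOT NULL,"
        " original_name TEXT NOT NULL,"
        " uploaded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    db.commit()


def save_document(s3_client, db, bucket, user_id, doc_type, original_name, data):
    # Step 1: generate a unique ID; in a flat layout it is the entire key.
    key = uuid.uuid4().hex
    # Step 2: store the object in S3 under that key.
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=data, ContentType="application/pdf"
    )
    # Step 3: record which user owns the key, plus metadata.
    db.execute(
        "INSERT INTO documents (s3_key, user_id, doc_type, original_name)"
        " VALUES (?, ?, ?, ?)",
        (key, user_id, doc_type, original_name),
    )
    db.commit()
    return key


def list_documents(db, user_id):
    # The dashboard reads the user's files from the database, not from S3.
    cur = db.execute(
        "SELECT s3_key, doc_type, original_name FROM documents"
        " WHERE user_id = ? ORDER BY uploaded_at, s3_key",
        (user_id,),
    )
    return cur.fetchall()
```

With this layout the bucket name is a single application-level setting, the key is the generated ID, and everything else (owner, document type, original filename) lives in the database.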
You could choose to prefix each object's key with a user identifier. That is useful for debugging or for reconstructing content after a database failure, but it brings no particular performance benefit as long as you are correctly referencing the database for the list of a user's files.
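If you do go with a user prefix, the key could be built and taken apart along these lines. The helper names `build_key` and `split_key` and the `user_id/document_id` layout are assumptions made for this sketch, not an S3 requirement.

```python
import uuid


def build_key(user_id, doc_id=None):
    # Key layout assumed here: "<user_id>/<unique document id>".
    if doc_id is None:
        doc_id = uuid.uuid4().hex
    return f"{user_id}/{doc_id}"


def split_key(key):
    # Split on the last "/" so the document ID is always the final segment.
    user_id, _, doc_id = key.rpartition("/")
    return user_id, doc_id
```

For example, `build_key("user-42", "0888a5ce")` gives `"user-42/0888a5ce"`. If the database were ever lost, listing the bucket with that user's prefix would show which objects belong to them, although the document type and original filename would still have to come from somewhere else.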