Search code examples
amazon-web-servicesdatabase-designamazon-dynamodbnosqldynamodb-queries

DynamoDB Index Design for Finding Items Where an Attribute Contains a Specific Value


Problem

I am currently in the process of designing the database for a software application which will be used by multiple organizations to each manage multiple projects. I am planning on using DynamoDB for the database. The question I have centres around one query I need to perform on the "Projects" table. The queries I will need to perform are as follows:

  1. Find Projects by organization_id.
  2. Find Project by id.
  3. Find Projects by status.
  4. Find Projects by members.

Note: Each Project entity will have an organization_id, id, status, and array of members (user id's).

The problem I am trying to solve is with query #4. Each project can have multiple members, and when a user logs into their dashboard, I want to fetch and display only the projects which that user is a part of. The issue is that the "members" field in a project entity is an array with user id's, which means the "members" of one project could be the same as another project, meaning it won't guarantee uniqueness.

One of the indexes I will use for this table will use the organization_id as the Partition Key and the id as the Sort Key. I would like to do something similar for the index I will use for query #4, but since "members" is not unique within its partition, I cannot use it as the Sort Key.

Possible Solutions

1 - User to Project Table

This solution would require me to create an extra table. This table would relate each user id to an array of project id's. When a user loads their dashboard, I could fetch their user entry for this table and do a Batch Item request and get all the projects by their id's in the array.

Pros:

  • Lower overhead than a Query with Filter Expression (I think).

Cons:

  • Requires an entire extra table.
  • Added complexity to update multiple tables when a user is added/removed from a project.
  • Will require a batch item request to get multiple projects by id, increasing required processing.

2 - Query with Filter Expression

For this solution, I would do a Query with organization_id as the Partition Key with a Filter set to find items where "members" CONTAINS the user id I am looking for.

Pros:

  • No extra table needed.

Cons:

  • Filter expressions apply only after the base query is made, meaning the query will have the same overhead whether or not the filter is applied. For every request made to this endpoint, I will essentially be getting all projects with a specific organization_id, and then filtering over them to find the right ones (not efficient).
  • Following the previous point, Queries only return up to a maximum of 1MB of data. Since the filter is applied after the original Query, this means that if the total number of Projects in an organization exceeds the 1MB limit per request, I could receive nothing back from a Query after my filter expression even if there exists a matching Project in the database, meaning I would have to do another request with pagination until I find it.

3 - Other

I am hoping there exists another solution where I can perform a single request and get all the data I need. I see a Z-index as a possible solution but haven't fully figured out the best way to make use of it.

Any Help Appreciated

Does anyone have experience with a problem like this? I know this is a common data model so I imagine there must be a good way to do this, but I have not been unable to find a way that seems optimal. All comments are welcome. Thanks in advance!


Solution

  • Typically you'd map that like the following:

    PK SK Data OrgId
    project123 project123 all project data org0001
    project123 user001 any specific user data for this project
    project123 user002 any specific user data for this project
    project123 user099 any specific user data for this project
    project999 project999 all project data org0346

    Then you can create a GSI with SK as the partition key and that would give you all projects related to a user. Then you can do a BatchGetItem to obtain all the project metadata from the base table.

    An alternative approach is to overload your GSI so that you only have to make one index to supply you both with all projects for a given user or org.

    PK SK Data GSI1PK
    project123 project123 all project data org0001
    project123 user001 any specific user data for this project user001
    project123 user002 any specific user data for this project user002
    project123 user099 any specific user data for this project user099
    project999 project999 all project data org0346

    With this data model you can use GSI1PK as your index partition key. And you can supply either a userId or OrgId as partition key value depending on your lookup.