Search code examples
database-designamazon-dynamodbdynamodb-queriesamazon-dynamodb-index

DynamoDB single table design


I'm building an app using DynamoDB as the database. My app has a concept of users, organizations, projects and scripts.

  • An Organization has many projects
  • A project or organization has many scripts
  • A user can be part of a single organization so an organization can have many users.
  • A script can either belong to a project or an organization. Each script has revisions which needs to be store so that users can rollback to previous version.

Use cases the app needs to query:

  • Get all users belonging to a organization
  • Get all projects belonging to a organization
  • Get all scripts belonging to a project
  • Get script by id
  • Get all versions of the script by script id
  • Get latest version of the script by script id
  • Get organization by org id
  • Get project by project id
  • Get script by script id

I was thinking of creating a single table Entities:

pk sk type (gsi1-pk) parent (gsi1-sk)
root#org-1 info org root
org-1#proj-1 info project org-1
proj-1#script-1 info script proj-1
proj-1#script-1 rev_latest script_revision script-1
proj-1#script-1 rev_1 script_revision script-1
proj-1#script-1 rev_2 script_revision script-1
org-1#user-1 info user org-1
org-1#user-2 info user org-1

PK and SK would allow me to get the data of specific items providing the ID. For the queries where I have the id of a parent object and need to fetch the children, I would create a GSI and query against that.

For PK, I could drop the first token before # which is supposed to represent the parent object which is redundant since it's defined in the parent column but I left it there to prevent accidental cross-object updates.

Am I better off with multiple DynamoDB tables to store the info?

I would prefer not using scan queries for economic reasons. Ideally I can get all the data via query and scan operation.


Solution

  • Here's one single-table option. The design satisfies your access patterns using query operations. No scan operations are required.

    pk sk gsi1pk gsi1sk
    org#org-1 xInfo
    proj#proj-1 xInfo org#org-1 proj#proj-1#xInfo
    script#script-1 xInfo proj#proj-1 script#script-1#xInfo
    script#script-1 rev#2023-10-05T22:00:00 proj#proj-1 script#script-1#rev#2023-10-05T22:00:00
    script#script-1 rev#2023-03-13T10:00:00 proj#proj-1 script#script-1#rev#2023-03-13T10:00:00
    script#script-1 rev#2022-12-20T08:00:00 proj#proj-1 script#script-1#rev#2022-12-20T08:00:00
    user#user-1 xInfo org#org-1 user#user-1#xInfo
    user#user-2 xInfo org#org-1 user#user-1#xInfo

    The table primary key models the records. The partition key pk has the format <entity-type>#<entity-id>. I find the entity-type prefix helpful to visually identify a record's entity type, but it is optional here. The sort key sk defines the record type. Scripts have two record types, info and revisions. Giving script revisions a timestamp "id" makes it easy to retrieve all revisions or only the latest revision1. I renamed info to xInfo to move the "info" record after all revisions in lexical ordering2. Why do this? So we can query a script by pk in descending order (ScanIndexForward: false) with a limit of 2 to get a script's info record and latest revision in a single operation. No limit returns all revisions and the info record.

    Use the table primary key to answer all the "by id" queries. We need an index to handle the "belongs to" queries.

    The secondary index partition key (gsi1pk) models the "belongs to" relation. Organizations appear not to belong to anything, so the index is blank. The index sort key gsi1sk takes the form <entity-type>#<entity-id>#<record-type>. The record-type suffix isn't needed except for scripts, but seems like a good idea to apply everywhere for consistency and future-proofing.

    "Get all users belonging to a organization" is answered with a query of gsi1pk = org#org-1 and gsi1sk > "user#". The same pattern satisfies the other "belongs to" queries.

    A note on your closing question "Am I better off with multiple dynamodb tables...?": Including multiple tables in your design doesn't give you more modelling options. You are still working with partition and sort keys. Multiple tables may be called for in certain cases, but it doesn't really help with modelling as such.


    1. The timestamp component also gives you the option to query for a subset of revisions by time period (e.g. current year's revisions).

    2. You could use ~info or z instead of xInfo, as long as the prefix character comes after rev in UTF-8 sort order. We do this to engineer a useful script sort key (descending) order: info record, latest revision, latest-1 revision, etc.