database-design amazon-dynamodb dynamodb-queries amazon-dynamodb-index

DynamoDB single table design

I'm building an app using DynamoDB as the database. My app has a concept of users, organizations, projects and scripts.

An Organization has many projects
A project or organization has many scripts
A user can be part of a single organization so an organization can have many users.
A script can either belong to a project or an organization. Each script has revisions which needs to be store so that users can rollback to previous version.

Use cases the app needs to query:

Get all users belonging to a organization
Get all projects belonging to a organization
Get all scripts belonging to a project
Get script by id
Get all versions of the script by script id
Get latest version of the script by script id
Get organization by org id
Get project by project id
Get script by script id

I was thinking of creating a single table Entities:

pk	sk	type (gsi1-pk)	parent (gsi1-sk)
root#org-1	info	org	root
org-1#proj-1	info	project	org-1
proj-1#script-1	info	script	proj-1
proj-1#script-1	rev_latest	script_revision	script-1
proj-1#script-1	rev_1	script_revision	script-1
proj-1#script-1	rev_2	script_revision	script-1
org-1#user-1	info	user	org-1
org-1#user-2	info	user	org-1

PK and SK would allow me to get the data of specific items providing the ID. For the queries where I have the id of a parent object and need to fetch the children, I would create a GSI and query against that.

For PK, I could drop the first token before # which is supposed to represent the parent object which is redundant since it's defined in the parent column but I left it there to prevent accidental cross-object updates.

Am I better off with multiple DynamoDB tables to store the info?

I would prefer not using scan queries for economic reasons. Ideally I can get all the data via query and scan operation.

Solution

Here's one single-table option. The design satisfies your access patterns using query operations. No scan operations are required.

pk	sk	gsi1pk	gsi1sk
org#org-1	xInfo
proj#proj-1	xInfo	org#org-1	proj#proj-1#xInfo
script#script-1	xInfo	proj#proj-1	script#script-1#xInfo
script#script-1	rev#2023-10-05T22:00:00	proj#proj-1	script#script-1#rev#2023-10-05T22:00:00
script#script-1	rev#2023-03-13T10:00:00	proj#proj-1	script#script-1#rev#2023-03-13T10:00:00
script#script-1	rev#2022-12-20T08:00:00	proj#proj-1	script#script-1#rev#2022-12-20T08:00:00
user#user-1	xInfo	org#org-1	user#user-1#xInfo
user#user-2	xInfo	org#org-1	user#user-1#xInfo

The table primary key models the records. The partition key pk has the format <entity-type>#<entity-id>. I find the entity-type prefix helpful to visually identify a record's entity type, but it is optional here. The sort key sk defines the record type. Scripts have two record types, info and revisions. Giving script revisions a timestamp "id" makes it easy to retrieve all revisions or only the latest revision¹. I renamed info to xInfo to move the "info" record after all revisions in lexical ordering². Why do this? So we can query a script by pk in descending order (ScanIndexForward: false) with a limit of 2 to get a script's info record and latest revision in a single operation. No limit returns all revisions and the info record.

Use the table primary key to answer all the "by id" queries. We need an index to handle the "belongs to" queries.

The secondary index partition key (gsi1pk) models the "belongs to" relation. Organizations appear not to belong to anything, so the index is blank. The index sort key gsi1sk takes the form <entity-type>#<entity-id>#<record-type>. The record-type suffix isn't needed except for scripts, but seems like a good idea to apply everywhere for consistency and future-proofing.

"Get all users belonging to a organization" is answered with a query of gsi1pk = org#org-1 and gsi1sk > "user#". The same pattern satisfies the other "belongs to" queries.

A note on your closing question "Am I better off with multiple dynamodb tables...?": Including multiple tables in your design doesn't give you more modelling options. You are still working with partition and sort keys. Multiple tables may be called for in certain cases, but it doesn't really help with modelling as such.

The timestamp component also gives you the option to query for a subset of revisions by time period (e.g. current year's revisions).
You could use ~info or z instead of xInfo, as long as the prefix character comes after rev in UTF-8 sort order. We do this to engineer a useful script sort key (descending) order: info record, latest revision, latest-1 revision, etc.