I'm looking into DynamoDB and I think I understand the core concepts for composite keys (partition-key + sort-key) and secondary indexes.
Now, I'm curious about design best-practices for a rather simple case.
Let's say I have 50 million books with a unique book-id and an author-id (plus additional fields):
{
  "book_id": "ghjk-5678-kj78-98kl",
  "author_id": 1234567,
  "release_date": "1970-01-01",
  "genre": "Fiction"
}
I only ever have these two use-cases:
- Find a book by book_id (20 million times per day)
- Find all books by author_id (once per week)

I do not want to find books by any other attribute.
What would be the "better" approach (and why)?
- A composite key (author_id + book_id)
- A primary key (book_id) in combination with a secondary index on author_id
Composite key (author_id + book_id):

It depends. If you know the authorId of the book when you look up a book by bookId, then yes, that would make perfect sense.

Primary key (book_id) in combination with a secondary index on author_id:

If you don't know the authorId when you look up a book, then this is a good option.
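
The two access patterns under this second option can be sketched as request parameters. This is a minimal sketch, assuming Python and the attribute-value format used by boto3's low-level DynamoDB client. The table name `Books` and the index name `author_id-index` are hypothetical, and the functions only build keyword arguments; they do not call AWS.

```python
# Sketch only: builds request parameters, does not call AWS.
# "Books" and "author_id-index" are hypothetical names.
TABLE_NAME = "Books"
AUTHOR_INDEX = "author_id-index"


def get_book_params(book_id: str) -> dict:
    """Parameters for a GetItem on the base table (book_id is the partition key)."""
    return {
        "TableName": TABLE_NAME,
        "Key": {"book_id": {"S": book_id}},
    }


def books_by_author_params(author_id: int) -> dict:
    """Parameters for a Query on the secondary index keyed by author_id."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": AUTHOR_INDEX,
        "KeyConditionExpression": "author_id = :a",
        # Numbers are sent as strings in the low-level attribute-value format.
        "ExpressionAttributeValues": {":a": {"N": str(author_id)}},
    }
```

With a client created via `boto3.client("dynamodb")`, these would be passed as `client.get_item(**get_book_params("ghjk-5678-kj78-98kl"))` and `client.query(**books_by_author_params(1234567))`. A Query returns results in pages, so listing every book for an author may require following `LastEvaluatedKey`.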
This is a significant design decision, and now is the time to make it, because a table's key schema cannot be changed after the table is created.
PK | SK | Data |
---|---|---|
Author1 | Book1 | Some info |
Author1 | Book2 | Some info |
Author1 | Book3 | Some info |
Author4 | Book1 | Some info |
Author2 | Book1 | Some info |
Author2 | Book2 | Some info |
With this model, not only can you get a book by bookId (as long as you know the authorId), you can also get all books related to an author.
You may not need an index in that case. But if you do not know the authorId for the most frequent request, your base table should use bookId as the PK, and your index will look like the table above.
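
To make the trade-off concrete, here is a minimal in-memory sketch in Python that mimics the composite-key table above. It is not DynamoDB code: `CompositeTable`, `put_item`, `get_item`, and `query` are hypothetical names chosen for illustration, and the dictionary only models the key behaviour, namely that a point lookup needs both PK and SK while a query needs only the PK.

```python
from collections import defaultdict


class CompositeTable:
    """Toy model of a table with a partition key (PK) and a sort key (SK)."""

    def __init__(self):
        # PK -> {SK -> item}; items sharing a PK are grouped together.
        self._partitions = defaultdict(dict)

    def put_item(self, pk, sk, data):
        self._partitions[pk][sk] = data

    def get_item(self, pk, sk):
        """Point lookup: requires BOTH keys, like GetItem on a composite key."""
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk):
        """All items for one PK, ordered by SK, like Query on the partition key."""
        partition = self._partitions.get(pk, {})
        return [(sk, partition[sk]) for sk in sorted(partition)]


table = CompositeTable()
rows = [("Author1", "Book1"), ("Author1", "Book2"), ("Author1", "Book3"),
        ("Author4", "Book1"), ("Author2", "Book1"), ("Author2", "Book2")]
for author, book in rows:
    table.put_item(author, book, "Some info")

one_book = table.get_item("Author1", "Book2")  # needs the authorId as well
author_books = table.query("Author1")          # all books by one author
```

Finding `Book2` without knowing `Author1` has no efficient path in this layout, which is why the base table should be keyed on bookId when the authorId is not known at lookup time.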