DynamoDB Composite key vs. Secondary Index


I'm looking into DynamoDB and I think I understand the core concepts for composite keys (partition-key + sort-key) and secondary indexes.

Now, I'm curious about design best-practices for a rather simple case.

Let's say I have 50 million books with a unique book-id and an author-id (plus additional fields):

{
  "book_id": "ghjk-5678-kj78-98kl",
  "author_id": 1234567,
  "release_date": "1970-01-01",
  "genre": "Fiction"
}

I only ever have these two use-cases:

  • Find book by book_id (20 million times per day)
  • Find all books by author_id (once per week)

I do not want to find books by any other attribute.

What would be the "better" approach (and why)?

  1. Use a composite key (author_id + book_id)
  2. Use a single-attribute key (book_id) in combination with a secondary index on author_id
  3. Stop overthinking it - that's premature optimization

Solution

    • Use a composite key (author_id + book_id)

    That depends. If you already know the authorId whenever you look up a book by bookId, then yes, that makes perfect sense.

    • Use a single-attribute key (book_id) in combination with a secondary index on author_id

    If you don't know the authorId when you look up a book, then this is the better option.

    • Stop overthinking it - that's premature optimization

    This is a significant optimization, and now is the time to decide, because a table's key schema cannot be changed after the table is created.

    PK       SK     Data
    Author1  Book1  Some info
    Author1  Book2  Some info
    Author1  Book3  Some info
    Author4  Book1  Some info
    Author2  Book1  Some info
    Author2  Book2  Some info

    With this model, not only can you get a book by bookId (as long as you know the authorId), you can also get all books by a given author.
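
As a rough illustration of those two access patterns, here is a minimal sketch that only builds the request parameters for a composite-key table. The table name `books` and the helper names are made up for this example; the dicts follow the shape that boto3's low-level client accepts for `get_item` and `query`, but nothing here talks to AWS.

```python
# Sketch only: the table name and helper names are assumptions, not from the question.
TABLE_NAME = "books"


def get_book_request(author_id: int, book_id: str) -> dict:
    """Parameters for a single-item read. Both key parts are required."""
    return {
        "TableName": TABLE_NAME,
        "Key": {
            "author_id": {"N": str(author_id)},  # partition key (PK)
            "book_id": {"S": book_id},           # sort key (SK)
        },
    }


def books_by_author_request(author_id: int) -> dict:
    """Parameters for a query that returns every item in one author's partition."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "author_id = :a",
        "ExpressionAttributeValues": {":a": {"N": str(author_id)}},
    }


if __name__ == "__main__":
    print(get_book_request(1234567, "ghjk-5678-kj78-98kl"))
    print(books_by_author_request(1234567))
```

With a real client these would be passed as `client.get_item(**request)` and `client.query(**request)`. Note that the first helper cannot be called without an author_id, which is exactly the constraint described above.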

    You may not need an index in that case. But if you do not know the authorId for the most frequent request (finding a book by bookId), your base table should use bookId as the PK, and a secondary index on authorId will look like the table above.
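
For the second layout (bookId as the base-table key, authorId in an index), a table definition might look roughly like the sketch below. The table name `books`, the index name `author_id-index`, and the on-demand billing mode are assumptions chosen for illustration; the dict keys follow the parameters of boto3's `create_table` and `query`, and the code only builds the dicts without calling AWS.

```python
# Sketch only: table name, index name and billing mode are assumptions.
TABLE_NAME = "books"
AUTHOR_INDEX = "author_id-index"


def books_table_definition() -> dict:
    """create_table parameters: book_id is the only base-table key."""
    return {
        "TableName": TABLE_NAME,
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "book_id", "AttributeType": "S"},
            {"AttributeName": "author_id", "AttributeType": "N"},
        ],
        "KeySchema": [{"AttributeName": "book_id", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": AUTHOR_INDEX,
                # The index mirrors the PK/SK layout of the composite-key table.
                "KeySchema": [
                    {"AttributeName": "author_id", "KeyType": "HASH"},
                    {"AttributeName": "book_id", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def books_by_author_request(author_id: int) -> dict:
    """query parameters for the weekly 'all books by author' lookup via the index."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": AUTHOR_INDEX,
        "KeyConditionExpression": "author_id = :a",
        "ExpressionAttributeValues": {":a": {"N": str(author_id)}},
    }


if __name__ == "__main__":
    print(books_table_definition())
    print(books_by_author_request(1234567))
```

The frequent lookup then needs only `Key={"book_id": {"S": ...}}` on the base table, while the rare per-author lookup goes through the index. Reads from a global secondary index are eventually consistent, which is usually acceptable for a once-a-week query.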