amazon-dynamodb

How does the DynamoDB partition key work?


I'm trying to understand how partitions are created for DynamoDB tables.

According to this blog, "All items with the same partition key are stored together", so if I have a table with user IDs from 1 to 1000, does that mean I will have 1000 partitions? Or is it up to the "internal hash function"? If so, how do we know how many partitions there will be?

It later suggests using a random suffix from 1 to 10 to distribute data evenly across partitions, but how does it know to query 10 times for a given invoice number? Is that only when you have 10 partitions? In this case you could have thousands of invoice numbers. Does that mean the same number of partitions will be created, and the same number of queries made to look up one invoice number?


Solution

  • When an Amazon DynamoDB table is created, you can specify the desired throughput in Reads per second and Writes per second. The table will then be provisioned across multiple servers (partitions) sufficient to provide the requested throughput.

    You do not have visibility into the number of partitions created -- it is fully managed by DynamoDB. Additional partitions will be created as the quantity of data increases or when the provisioned throughput is increased.

    Let's say you have requested 1000 Reads per second and the data has been internally partitioned across 10 servers (10 partitions). Each partition will provide 100 Reads per second. If all Read requests are for the same partition key, the throughput will be limited to 100 Reads per second. If the requests are spread over a range of different values, the throughput can be the full 1000 Reads per second.

    If many queries are made for the same Partition Key, it can result in a Hot Partition that limits the total available throughput.

    Think of it like a bank with lines in front of teller windows. If everybody lines up at one teller, fewer customers can be served. It is more efficient to distribute customers across many different teller windows. A good partition key for distributing customers might be the customer number, since it is different for each customer. A poor partition key might be their zip code, because they all live in the same area near the bank.

    The simple rule is that you should choose a Partition Key that has a range of different values.

    See: Partitions and Data Distribution
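
To make the numbers above concrete, here is a rough Python sketch of the same scenario: 1000 reads per second spread over 10 partitions, with each partition capped at 100. It is a simplified model only. The partition count, the use of MD5 as the hash, the hard per-partition cap, and the key names (`customer-42`, `zip-98101`, and so on) are all assumptions made for illustration; DynamoDB's real hash function and partition count are internal and not visible to you.

```python
import hashlib
from collections import Counter
from typing import Iterable

NUM_PARTITIONS = 10          # assumed; DynamoDB does not expose the real count
TABLE_READS_PER_SEC = 1000   # provisioned throughput from the example above
PER_PARTITION_READS = TABLE_READS_PER_SEC // NUM_PARTITIONS  # 100 per partition


def partition_for(key: str) -> int:
    """Map a partition key to a partition index using MD5 as a stand-in hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def served_reads(requested_keys: Iterable[str]) -> int:
    """Reads served in one second if every partition is capped at its share."""
    load = Counter(partition_for(k) for k in requested_keys)
    return sum(min(count, PER_PARTITION_READS) for count in load.values())


# 1000 reads in one second, all for the same key: one partition gets them all.
hot = ["customer-42"] * 1000

# 1000 reads keyed by zip code, where nearly everyone shares two zip codes.
by_zip = ["zip-98101"] * 900 + ["zip-98102"] * 100

# 1000 reads spread over 1000 distinct customer numbers.
by_customer = [f"customer-{n}" for n in range(1000)]

print("same key:        ", served_reads(hot))          # 100, capped by one partition
print("zip code key:    ", served_reads(by_zip))       # at most 200 (two partitions)
print("customer number: ", served_reads(by_customer))  # far higher, approaching 1000
```

Note that the number of partitions here has nothing to do with the number of distinct keys: many keys hash into the same partition, which is why 1000 user IDs do not imply 1000 partitions.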