
Some questions about Cosmos DB Physical and Logical Partitions


I am trying to understand the relationship between Physical/Logical partitions and throughput availability in Azure Cosmos DB and have a few questions.

Reference documentation: https://learn.microsoft.com/en-us/azure/cosmos-db/partitioning-overview.

Based on the documentation, here's my understanding:

  1. Each physical partition can hold 50GB of data, whereas each logical partition can hold 20GB.
  2. The total provisioned throughput is distributed evenly among all physical partitions.
  3. Each physical partition can have a maximum of 10000 RU/s.
  4. The Cosmos DB engine automatically creates physical partitions as and when they are needed and moves the logical partitions accordingly.
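To make these limits concrete, here is a minimal Python sketch that computes the *minimum* number of physical partitions they imply for a given provisioned throughput and data size. It assumes the 10000 RU/s and 50GB per-partition figures quoted in the list; `min_physical_partitions` is a hypothetical helper, not a Cosmos DB SDK call, and the service is free to create more partitions than this lower bound.

```python
import math

# Per-physical-partition limits as quoted in the question (assumed; they may change).
MAX_RU_PER_PHYSICAL_PARTITION = 10_000
MAX_GB_PER_PHYSICAL_PARTITION = 50


def min_physical_partitions(provisioned_ru: int, storage_gb: float) -> int:
    """Hypothetical helper: lower bound on physical partitions implied by the
    per-partition throughput and storage limits. Cosmos DB may create more."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    # A container always has at least one physical partition.
    return max(1, by_throughput, by_storage)


if __name__ == "__main__":
    print(min_physical_partitions(20_000, 0))   # 2: throughput-bound
    print(min_physical_partitions(5_000, 0))    # 1: a new, empty container
    print(min_physical_partitions(5_000, 120))  # 3: storage-bound
```

Whichever limit needs more partitions wins, which is why the function takes the maximum of the throughput-based and storage-based counts.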

Now my questions are:

  • What is the logic behind the creation of additional physical partitions?

Is it based on the space occupied by logical partitions, on the throughput consumed by all logical partitions in a physical partition, or on something else entirely? For example:

  1. Will the Cosmos DB engine automatically create 2 physical partitions if I provision a throughput of 20000 RU/s (regardless of whether I use it or not)?
  2. Will the Cosmos DB engine create a single physical partition to begin with (I have just created a container with no data inside it, and the provisioned throughput is less than 10000 RU/s)?
  3. Will the Cosmos DB engine automatically remove physical partitions if the total provisioned throughput drops below 10000 RU/s and/or the total size of the logical partitions falls below 50 GB?

Any insights into this will be highly appreciated.

UPDATE

Based on the comments, I have split the original question into two parts. The second part can be found here: How is the throughput available for a physical partition split amongst its logical partition in Cosmos DB?


Solution

  • Some answers:

    1. Cosmos DB will actually create 3 partitions if you provision a new container with 20K RU/s. However, if you start with less, say 5K RU/s, and then scale up, it will create 1 partition and then increase to 2 partitions. The reason for the difference is that we try to reduce the number of partition splits early on, because users tend to ingest data during initial provisioning, often accompanied by a further increase in throughput. To reduce partition splits, we provision each physical partition at approximately 60% of 10K RU/s (about 6K RU/s). However, we don't apply this 60% universally because it would be wasteful. It's just an optimization we make during initial provisioning, based on observed user patterns. It's also one of many reasons why you should not care about physical partitions and should instead focus on your logical partition key. The 60% here is an implementation detail and can change at any time.

    2. Yes.

    3. Not yet, but it is coming; no ETA. (Update: this is now in preview; you can learn more here: Merge Preview.)

    Throughput is always distributed equally, so yes: with 18K RU/s spread across 3 partitions, each would get 6K RU/s.
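The scale-up path and the even split described above can be sketched as a small Python model. It is a simplification under stated assumptions: `partitions_after_scaling` and `ru_per_partition` are hypothetical helpers, the model assumes a split happens only when the new RU/s exceeds what the current partitions can serve at 10K RU/s each, and it never reduces the partition count (no automatic merge). It deliberately ignores the ~60% initial-provisioning optimization, which the answer describes as an implementation detail.

```python
import math

MAX_RU_PER_PHYSICAL_PARTITION = 10_000  # per-partition cap quoted in the question


def partitions_after_scaling(current_partitions: int, new_ru: int) -> int:
    """Hypothetical model: split only when new_ru exceeds what the current
    partitions can serve; never reduce the count (no automatic merge)."""
    needed = math.ceil(new_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    return max(current_partitions, needed)


def ru_per_partition(total_ru: int, partitions: int) -> float:
    """Provisioned throughput is split evenly across physical partitions."""
    return total_ru / partitions


if __name__ == "__main__":
    p = 1                                   # container created at 5K RU/s
    p = partitions_after_scaling(p, 20_000)
    print(p)                                # 2
    p = partitions_after_scaling(p, 5_000)
    print(p)                                # 2: scaling down does not merge
    print(ru_per_partition(5_000, 2))       # 2500.0
    print(ru_per_partition(18_000, 3))      # 6000.0
```

The scale-down step shows why the lack of automatic merging matters: after dropping back to 5K RU/s, each of the two remaining partitions gets only 2500 RU/s.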