Suppose we start with an empty Cosmos DB container with 4000 RU/s provisioned throughput (i.e. there will only be one physical partition initially), using /id as the partition key (so each logical partition contains exactly one document). As the container grows and approaches or exceeds the maximum physical partition size of 50 GB, it will eventually split into multiple physical partitions, and the throughput will be divided between these partitions accordingly.
Is it then necessary to provision n * 4000 RU/s (where n is the number of physical partitions) to maintain the performance the collection had when it was still a single physical partition?
More generally, how does the number of physical partitions (i.e., splitting the throughput) impact the performance on point reads, writes and (cross-partition) queries?
The RU/sec allocation will indeed be split across physical partitions. This is definitely something to be aware of, as partition-splits can result in an unexpected RU/sec drop (or throttling). Keep in mind though:
Monitor the 429 (throttle) frequency to determine when/if scaling is needed.
But yes, if you allocated 4000 RU/sec and then, due to capacity, the container was split into two physical partitions, the RU/sec would be divided across the two physical partitions, (potentially) requiring you to increase the overall RU/sec to compensate.
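To make the split arithmetic concrete, here is a minimal sketch that assumes the provisioned RU/sec is divided evenly across physical partitions. The function names are hypothetical and nothing here calls the Cosmos DB service; it only reproduces the division described above.

```python
def per_partition_rus(total_rus: float, physical_partitions: int) -> float:
    """RU/sec available to each physical partition, assuming an even split."""
    if physical_partitions < 1:
        raise ValueError("physical_partitions must be at least 1")
    return total_rus / physical_partitions


def total_rus_for_per_partition_target(target_rus: float, physical_partitions: int) -> float:
    """Total RU/sec needed so that every physical partition gets target_rus."""
    if physical_partitions < 1:
        raise ValueError("physical_partitions must be at least 1")
    return target_rus * physical_partitions


# 4000 RU/sec provisioned on the container, before and after splits.
for n in (1, 2, 4):
    print(f"{n} partition(s): {per_partition_rus(4000, n):.0f} RU/sec each")
```

The n * 4000 figure from the question corresponds to `total_rus_for_per_partition_target(4000, n)`. Whether that much is actually needed depends on how evenly requests spread across partitions, which is why the increase is only "potentially" required.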
As for point-reads: given that you have to provide a partition-key value (in your case /id), there is no performance impact: still 1 RU for a 1K doc. For cross-partition queries, your RU cost will likely increase when the query has to go beyond the first partition searched.
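One way to compare the two access patterns is to read the request charge after each call. The sketch below assumes the azure-cosmos Python SDK and a `container` object obtained elsewhere (for example via `get_container_client`); the helper names are hypothetical, and the charge is read from the `x-ms-request-charge` response header.

```python
def read_with_charge(container, doc_id):
    """Point read: with /id as partition key, the id is also the partition key value."""
    item = container.read_item(item=doc_id, partition_key=doc_id)
    headers = container.client_connection.last_response_headers
    return item, float(headers["x-ms-request-charge"])


def query_with_charge(container, query):
    """Cross-partition query: may fan out to several physical partitions."""
    items = list(
        container.query_items(query=query, enable_cross_partition_query=True)
    )
    # Note: the headers reflect only the last page fetched, so this is a
    # lower bound on the total charge for a multi-page query.
    headers = container.client_connection.last_response_headers
    return items, float(headers["x-ms-request-charge"])
```

Running both helpers before and after a split should show the point-read charge unchanged, while the query charge may grow as more partitions are visited.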