
Why is RU consumption higher than the ratio of provisioned throughput to autoscale max throughput?

What I'm seeing:

  • Autoscale max throughput is 220k
  • Provisioned throughput is only 153k
  • But RU consumption is 100%!

How come RU consumption is 100% when the provisioned throughput is nowhere close to Autoscale max throughput? Is RU consumption based on provisioned throughput even in autoscale mode? If that's the case, then doesn't that mean that RU consumption can't be used to determine if the database/container's autoscale max throughput is set too low or too high?

How do I determine if I have over or under-provisioned when in autoscale mode?

[Screenshots of the metrics charts showing provisioned throughput and normalized RU consumption]


Solution

  • A Normalized RU Consumption metric of 100% just means that at least one physical partition used its entire RU budget (220k / number of physical partitions) in at least one second (see the per-partition sketch at the end of this answer).

    Autoscale only raises the provisioned throughput when the average Normalized RU consumption over a sliding window of a few seconds warrants it. Specifically, the documentation states:

    Azure Cosmos DB only scales the RU/s to the maximum throughput when the normalized RU consumption is 100% for a sustained, continuous period of time in a 5-second interval

    As the billing for autoscale is based on the peak provisioned RU/s reached in any one hour, using the Normalized RU consumption metric directly to drive scaling could have a fairly horrible multiplicative effect on the billing for that hour: a single busy partition in a single atypical second would then have an effect that is multiplied out by the number of partitions in the collection * the number of seconds in an hour * the number of regions the account is geo-replicated to.

    Regarding whether you are over- or under-provisioned: you do have some flat lines on your graph where you have reached the autoscale minimum, and you never reached the autoscale maximum in the time period covered (though you did come quite close to it on May 8th), so on that basis you are maybe somewhat over-provisioned. It really depends how much of a premium you are happy to pay to reduce the risk of seeing throttling.

    I don't find Normalized RU consumption a very useful metric on its own because it does not help distinguish between a collection performing periodic expensive operations and one under sustained pressure. For example, if you have a collection at 40,000 RU and 80 physical partitions, the "per partition per second" budget is 500 RU. If the documents are large and you have wildcard indexing, a single insert can consume that, so a consistent trickle of inserts can make the collection appear permanently maxed out on that metric. You can, however, split this metric by physical partition and check whether specific partitions are peaking whilst others are much more idle, and whether or not it is consistently the same "hot" partition (as is also shown in the heatmap in the "classic" metrics); see the metrics query sketch at the end of this answer.

    Another way of looking at it is that you have a max throughput of 220k RU per second, so in theory you can sustain 13,200,000 RU per minute. You can look at the metrics for Request Units used and see what your peak minute was to see how close you came to this theoretical max (see the headroom sketch at the end of this answer). If you never get anywhere near it then, as long as your work is evenly distributed across partitions, you might conclude that there is scope to scale down and simply retry any throttled requests, as any peaks are likely to be very transient.
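
As a rough illustration of the per-partition arithmetic above (the 20-partition count used for the question's container is an assumed figure, since the real partition count is not given):

```python
# Per-second RU budget of each physical partition: provisioned/autoscale max
# throughput is divided evenly across physical partitions, so a single expensive
# operation can briefly push one partition to 100% normalized consumption.
def per_partition_budget(max_throughput_ru: int, physical_partitions: int) -> float:
    """RU/s available to one physical partition before it is rate limited."""
    return max_throughput_ru / physical_partitions

# The question's container: 220k autoscale max; 20 partitions is an assumed count.
print(per_partition_budget(220_000, 20))   # 11000.0 RU/s per partition
# The answer's worked example: 40k RU with 80 physical partitions.
print(per_partition_budget(40_000, 80))    # 500.0 RU/s per partition
```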
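To check whether 100% is being driven by one hot physical partition, the NormalizedRUConsumption metric can be split on its PartitionKeyRangeId dimension. A minimal sketch using the azure-monitor-query package, assuming the resource ID, database name and container name placeholders are replaced with real values:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

COSMOS_ACCOUNT_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.DocumentDB/databaseAccounts/<account-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

result = client.query_resource(
    COSMOS_ACCOUNT_ID,
    metric_names=["NormalizedRUConsumption"],
    timespan=timedelta(days=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.MAXIMUM],
    # The PartitionKeyRangeId wildcard splits the series per physical partition.
    filter=(
        "DatabaseName eq '<database>' and CollectionName eq '<container>' "
        "and PartitionKeyRangeId eq '*'"
    ),
)

# One time series per physical partition; a single series pinned at 100% while the
# rest stay low points at a hot partition rather than collection-wide pressure.
for metric in result.metrics:
    for series in metric.timeseries:
        peak = max((point.maximum or 0 for point in series.data), default=0)
        print(series.metadata_values, f"peak normalized consumption: {peak:.0f}%")
```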
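And a minimal sketch of the per-minute headroom check from the last paragraph; the observed peak below is a placeholder you would read from the Total Request Units metric (1-minute grain, Sum aggregation), not a figure from the question:

```python
# Compare the theoretical per-minute ceiling at the autoscale max against the
# peak minute actually observed.
AUTOSCALE_MAX_RU_PER_SECOND = 220_000
theoretical_ceiling_per_minute = AUTOSCALE_MAX_RU_PER_SECOND * 60  # 13,200,000 RU/min

observed_peak_ru_per_minute = 4_000_000  # placeholder: peak minute from TotalRequestUnits

headroom = 1 - observed_peak_ru_per_minute / theoretical_ceiling_per_minute
print(f"Theoretical ceiling: {theoretical_ceiling_per_minute:,} RU per minute")
print(f"Observed peak:       {observed_peak_ru_per_minute:,} RU per minute")
print(f"Headroom:            {headroom:.0%}")
```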