I have a Cosmos DB collection with 4 million documents (~5GB). The following query reports a charge of 2.79 RUs:
SELECT * FROM c WHERE c.type='type1' and STRINGEQUALS(c.name,'abc',false)
But the same query with case-insensitive search (by replacing false
by true
) costs 1228 RUs.
Is there an explanation for why the case-insensitive query is more than 470 times more expensive than the case-sensitive query? I'm surprised by this because the documentation states
The RU charge for StartsWith and StringEquals is slightly higher with the case-insensitive option than without it.
Details:
type
.type1
contains 2 million documents.name
property has a different value for almost all 2 million documents."path": "/*"
)This is what I found out so far:
Unfortunately, the RU charge for case-insensitive STRINGEQUALS
seems to be linear in the cardinality of the property (i.e. the number of different values for that property). Which is really, really bad if you have lots of documents. The query above takes almost 1 s at a throughput of 10,000 RU/s. In contrast, case-sensitive string comparison is independent of the size of the collection. See also this discussion.
For small collections (< 10,000 docs) case-sensitivity doesn't make that much of a difference. (And also of course if the inclusion of the partition key restricted the size of the potential result set to a much smaller number.)
For larger collections you could store a duplicate of each property that should support efficient case-insensitive search in lower case and do a case-sensitive search on the lower-case property instead.
You can vote for the Feature Request to support efficient case-insensitive queries here.