Search code examples
azure-cosmosdbazure-cosmosdb-emulator

High RU charge with case-insensitive search


I have a Cosmos DB collection with 4 million documents (~5GB). The following query reports a charge of 2.79 RUs:

SELECT * FROM c WHERE c.type='type1' and STRINGEQUALS(c.name,'abc',false)

But the same query with case-insensitive search (by replacing false by true) costs 1228 RUs.

Is there an explanation for why the case-insensitive query is more than 470 times more expensive than the case-sensitive query? I'm surprised by this because the documentation states

The RU charge for StartsWith and StringEquals is slightly higher with the case-insensitive option than without it.


Details:

  • Both queries return 0 results.
  • The partition key is type.
  • The logical partition type1 contains 2 million documents.
  • The name property has a different value for almost all 2 million documents.
  • The default indexing strategy is used ("path": "/*")

Solution

  • This is what I found out so far:

    Does Cosmos DB support efficient case-insensitive string comparison?

    Unfortunately, the RU charge for case-insensitive STRINGEQUALS seems to be linear in the cardinality of the property (i.e. the number of different values for that property). Which is really, really bad if you have lots of documents. The query above takes almost 1 s at a throughput of 10,000 RU/s. In contrast, case-sensitive string comparison is independent of the size of the collection. See also this discussion.

    What if I need efficient case-insensitive string comparisons?

    For small collections (< 10,000 docs) case-sensitivity doesn't make that much of a difference. (And also of course if the inclusion of the partition key restricted the size of the potential result set to a much smaller number.)

    For larger collections you could store a duplicate of each property that should support efficient case-insensitive search in lower case and do a case-sensitive search on the lower-case property instead.

    You can vote for the Feature Request to support efficient case-insensitive queries here.