Search code examples
azureazure-cosmosdb-mongoapi

CosmosDb Mongo - collection with shardkey, slow query by shardkey?


I have a CosmosDb collection with Mongodb. This is a customer database, and the ShardKey is actually CustomerId. My collection has 200000 records, and has an combined index of both e-mail and customerid.

An example of a customer:

{
"CustomerId" : "6a0f4360-d722-4926-9751-9c7fe6a97cb3",
"FirstName" : "This is my company first name",
"LastName" : "This is my company last name",
"Email" : "[email protected]",
"Addresses" : [
    {
        "AddressId" : "54e34da9-55fb-4d60-8411-107985c7382e",
        "Door" : "11111",
        "Floor" : "99",
        "Side" : "B",
        "ZipCode" : "8888",
    }
]

}

What I find strange is if I query by Email it spends 7000RUs (which is too much - at least is what data explorer tells me...) but if I query by CustomerId, it spends more or less the same RUs...

My questions are:

  • Shoudn't both operations spend less RUs than this, specially by CustomerId?

An example of a query by E-mail:

{ "Email" : { $eq: "[email protected]" } }

An example of a query by CustomerId:

{ "CustomerId" : { $eq: "3f7da6c3-81bd-4b1d-bfa9-d325388079ab" } }

Another question, my index contains both Email and CustomerId. Is there any way for me to query by e-mail and return only CustomerId, for example?


Solution

  • Shoudn't both operations spend less RUs than this, specially by CustomerId?

    CustomerId is your shard key (aka partition key) which helps in grouping documents with same value of CustomerId to be stored in the same logical partition. This grouping is used during pin-point GET/SET calls to Cosmos but not during querying. So, you would need an index on CustomerId explicitly.

    Furthermore, since the index that you have is a composite index on CustomerId and Email, doing a query on only one of these fields at a time will lead to a scan being performed in order to get back the result. Hence the high RU charge and the similar amount of RU charge on each of these queries.

    Another question, my index contains both Email and CustomerId. Is there any way for me to query by e-mail and return only CustomerId, for example?

    Firstly, in order to query optimally on Email, you would need to create an index on Email separately. Thereafter, you may use the project feature of Mongo to include only certain fields in the response.

    Something like this-

    find({ "Email" : { $eq: "[email protected]" } }, { "CustomerId":1 })