
Cosmos DB: querying on an indexed property vs. the partition key


By default in Cosmos DB, every property in a document is indexed. So why should I care about querying on the partition key when queries on an indexed property work just as well and cost almost nothing?

I have a Cosmos DB container with one million documents like the one below, each containing an array. The partition key is "tankId", e.g.:

{
    "id": "67acdb16-80dd-4a6c-a5b0-118d5f5fdb97",
    "tankId": "67acdb16-80dd-4a6c-a5b0-118d5f5fdb97",
    "UserIds": [
        "905336a5-bf96-444f-bb11-3eedb65c3760",
        "432270f5-780f-401b-9772-72ec96166be1",
        "cfecdf7e-5067-46b1-ab4e-25ca7d597248"
    ]
}

If I query these million documents on "UserIds", which is an indexed property but not the partition key, it costs only 3.32 RU! Wow.

SELECT *
FROM c 
WHERE ARRAY_CONTAINS(c.UserIds, "905336a5-bf96-444f-bb11-3eedb65c3760")

Is this kind of query good practice? I am a little worried about my design.


Solution

  • It starts to matter once the number of physical partitions grows. Filtering on the partition key lets Cosmos DB route the query to a single logical partition, which resides in a single physical partition. The query is then not a so-called cross-partition query, so it does not have to check the index of every other physical partition (each of which would also consume RUs).

    In your case, a million documents likely take up far less than 50 GB (the maximum size of a physical partition), so they are most likely all stored in the same physical partition. That is why you see no noticeable effect on RU usage.

    So, to answer your underlying question of whether you should change anything: Is your workload mostly read-heavy? Is there a property that most of your queries filter on? Can you be sure that each logical partition would stay under the logical partition size limit (20 GB)? If so, you should consider partitioning on that property in your design. Even then, it will only matter once your data starts to split across multiple physical partitions.
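
To make the reasoning above concrete, here is a rough Python sketch. It does not call Cosmos DB at all. It only models two ideas from the answer: how many physical partitions a data set needs based on storage alone (using the 50 GB figure quoted above), and how many of them a query touches with and without a partition key filter. The 1 KB average document size, the function names, and the hash-modulo routing are assumptions made for illustration; the real service uses its own hashing scheme and also splits partitions based on throughput.

```python
import hashlib
import math
from typing import List, Optional

PHYSICAL_PARTITION_MAX_GB = 50  # storage limit quoted in the answer above


def estimated_physical_partitions(doc_count: int, avg_doc_bytes: int) -> int:
    """Lower bound on physical partitions from storage alone (ignores throughput)."""
    total_gb = doc_count * avg_doc_bytes / 1024**3
    return max(1, math.ceil(total_gb / PHYSICAL_PARTITION_MAX_GB))


def physical_partition_for(partition_key: str, partition_count: int) -> int:
    """Toy router: hash the key and take it modulo the partition count.

    Assumption: this is NOT Cosmos DB's real hashing scheme, only a stand-in.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partitions_touched(partition_count: int,
                       partition_key: Optional[str] = None) -> List[int]:
    """Return the physical partitions a query has to visit."""
    if partition_key is None:
        # No partition key filter: the query fans out to every partition.
        return list(range(partition_count))
    # Partition key filter: the query is routed to exactly one partition.
    return [physical_partition_for(partition_key, partition_count)]


if __name__ == "__main__":
    tank_id = "67acdb16-80dd-4a6c-a5b0-118d5f5fdb97"

    # One million documents at an assumed 1 KB each: about 1 GB in total.
    small = estimated_physical_partitions(1_000_000, 1024)
    print(small, len(partitions_touched(small)))  # 1 partition, 1 visited

    # One billion documents at the same size: about 954 GB in total.
    large = estimated_physical_partitions(1_000_000_000, 1024)
    print(large, len(partitions_touched(large)))           # 20 partitions, all visited
    print(large, len(partitions_touched(large, tank_id)))  # 20 partitions, 1 visited
```

With a single physical partition, the fan-out query and the routed query visit the same place, which matches the low RU charge in the question. The gap only appears once the data spans many physical partitions.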