Search code examples
azureindexingazure-cosmosdbazure-cosmosdb-sqlapi

Is cosmos index used when filtering on nested list through join?


Given that I have a NoSQL Cosmos DB container with items of the following shape:

[
    {
        "id": "SmithFamily",
        "memberCount": 3,
        "children": [{
            "name": "Foo",
            "pets": [{ "name": "Bar" }]
        }]
    },
    {
        "id": "DoeFamily",
        "memberCount": 2,
        "children": [{
            "name": "Baz",
            "pets": [{ "name": "Qux" }]
        }]
    }
]

and an indexing policy like:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/children/[]/pets/[]/name/?"
        },
        {
            "path": "/children/[]/name/?"
        }
    ],
    "excludedPaths": [
        {
            "path": "/*"
        }
    ]
}

I have found that I can find the family with a pet name "Qux" by querying like so:

SELECT f
FROM family f
JOIN child IN f.children
WHERE ARRAY_CONTAINS(child.pets, {name: "Qux"})

But will this query use the index I've intended for it to use, i.e. /children/[]/pets/[]/name/? or not? None of the examples or the descriptions in the index-policy docs speaks of joins. But from what I understand from the index policy with a nested list seems to be allowed, and I get no complaints when I apply the policy.

I'm think this would be important to know especially if I want to create a composite index, for example:

"compositeIndexes": [  
    [  
        {  
            "path": "/children/[]/pets/[]/name/?",
            "order":"ascending"
        },
        {  
            "path":"/memberCount",
            "order":"ascending"
        }
    ]
]

In order to be able to use that index, I believe I need to make sure to first filter by pet name and then order by member count so if I'd run:

SELECT f
FROM family f
JOIN child IN f.children
WHERE ARRAY_CONTAINS(child.pets, {name: "Qux"})
ORDER BY f.memberCount

And it doesn't use the first path in the composite index, from what I understand, the index won't be used at all and I'd loose performance (and it would use more RU).

This all seems more straight forward on the index /children/[]/name/? which, unless I've misunderstood something, should be used if I query like:

SELECT * FROM c WHERE ARRAY_CONTAINS(c.children, {name: "Baz"}, true)

Is there a way to verify that my index is used in a specific query?


Solution

  • My preference for this sort of question is just to try it.

    It is easy to bulk load hundreds of thousands of documents into a collection using the code here.

    I have done so for your Family model (250,000 documents) and applied your indexing policy.

    Running

    SELECT f
    FROM family f
    JOIN child IN f.children
    WHERE ARRAY_CONTAINS(child.pets, {name: "Qux"})
    

    Using the "New SQL Query" tab in the portal returns the following query stats (showing the index was used)

    Metric Value
    Request Charge 2.83 RUs
    Showing Results 1-1
    Retrieved document count 1
    Retrieved document size 298 bytes
    Output document count 1
    Output document size 353 bytes
    Index hit document count 1
    Index lookup time 0.1 ms
    Document load time 0.02 ms
    Query engine execution time 0.01 ms
    System function execution time 0 ms
    User defined function execution time 0 ms
    Document write time 0 ms