Search code examples
azure-cosmosdbhierarchical-data

Need help flattening a multi-array item in Cosmos


I'm still somewhat inexperienced with Cosmos. I have a multi-array item that I need to flatten. What we have are multiple Project items with a variable number of Resources. Each Resource also has a variable number of Source Systems they can be related to.

What I need to be able to do is filter the data to a specific Source System, display all the Resources related to that Source System and then some Project details. So for example, the output would look like this...

personName, sourceSystem, name (of the Project)

Jane Doe, Z, Test Project

John Smith, Z, Test Project

In addition to filtering on sourceSystem "Z", I would also want to filter on the startDate of the Project within the last year.

Here is a sample of what a single Project item looks:

{
"resources": [
    {
        "personName": "Jane Doe",
        "displayInfo": null,
        "primaryKeys": [
            {
                "sourceSystemId": "99",
                "sourceSystem": "Z",
                "name": "Id",
                "type": "ProjectResource",
                "isActive": true,
                "isGold": true
            }
        ],
        "primaryUrl": null
    },
    {
        "personName": "John Smith",
        "displayInfo": null,
        "primaryKeys": [
            {
                "sourceSystemId": "99",
                "sourceSystem": "Z",
                "name": "Id",
                "type": "ProjectResource",
                "isActive": true,
                "isGold": true
            },
            {
                "sourceSystemId": "88",
                "sourceSystem": "Y",
                "name": "Id",
                "type": "ProjectResource",
                "isActive": true,
                "isGold": true
            }
        ],
        "primaryUrl": null
    }
],
"name": "Test Project",
"startDate": "2016-01-01T00:00:00",
"endDate": "2019-12-31T00:00:00",
"primaryKeys": [
    {
        "sourceSystemId": "9999",
        "sourceSystem": "Something",
        "name": "Id",
        "type": "Project",
        "isActive": true,
        "isGold": true
    }
],
"primaryUrl": null,
"isActive": true,
"isDeleted": false

}


Solution

  • CosmosDB offers a JOIN feature which allows for joining to occur in a single item. For your scenario where you want to flatten items that have 'resources' as an array and then 'primaryKeys' as an array within 'resources' that has 'sourceSystem', you could use JOIN keyword to perform self-join within an item first on resources and then on primaryKeys.

    Going by your example you are looking for 3 fields

    personName, sourceSystem, name (of the Project)

    So the cosmosDB query which will give these 3 fields information in a flattened out form will be

    SELECT project.name, resource.personName, key.sourceSystem
    FROM project
    JOIN resource IN project.resources
    JOIN key IN resource.primaryKeys
    

    The structure of the flattened out item will be-

    {
        "name": "Test Project",
        "personName": "Jane Doe",
        "sourceSystem": "Z"
    }
    

    Next if we have to filter on something then we have to use the WHERE keyword followed by the condition to filter upon

    So considering your example where you talk about being able to filter upon sourceSystem as Z, the query will be-

    SELECT project.name, resource.personName, key.sourceSystem
    FROM project
    JOIN resource IN project.resources
    JOIN key IN resource.primaryKeys
    WHERE key.sourceSystem = 'Z'
    

    In addition if you would want to filter on StartDate of a project to get something within last year, you could use one of the date and time functions DateTimeAdd to achieve that. The query would look something like this-

    SELECT project.name, resource.personName, key.sourceSystem
    FROM project
    JOIN resource IN project.resources
    JOIN key IN resource.primaryKeys
    WHERE key.sourceSystem = 'Z' AND project.startDate > DateTimeAdd('yyyy', -1, CURRENT_TIMESTAMP)