Search code examples
azure-data-factoryazure-cosmosdb

How to convert SQL table row into the first element of an array of objects in CosmosDB NoSQL using Azure Data Factory


I have a table with 3 columns: ID, A, and B, and I need to transform the data into a CosmosDB document with a specific structure. The desired structure is as follows:

{
  "id": ID (value from the ID column),
  "values": [{"key": A (value from the A column), "value": B (value from the B column)}]
}

To achieve this, I utilized the Copy Activity in Azure Data Factory.

I attempted to accomplish this using path expressions like ['values']['key'] and ['values']['value'] in mappings, but each time I ended up with an object instead of an array containing objects and this syntax is not supported ['values'][0]['key'].

Error message: JSON path "$['predictedIncidenceRates'][0]['key']" is not correct. Array accessor like [0] is not supported in the schema mapping sink.

I tried using the Mapping Data Flow expression builder to create a derived column with this expression:

[
  {
    "id": key,
    "key": name,
    "value": sortOrder
  }
]

However, this resulted in an error.

Can anyone provide guidance on how to structure the CosmosDB document correctly with the given requirements? Additionally, if there's a better approach or any alternative solutions, I would appreciate any insights. Thank you!


Solution

  • Error message: JSON path "$['predictedIncidenceRates'][0]['key']" is not correct. Array accessor like [0] is not supported in the schema mapping sink.

    The above error is solved when we remove the array accessor [0], in the mapping activity as you can see below. enter image description here

    It got succeed when [0] is removed in the below output: enter image description here

    Sample data i stored in Azure SQL Database:

    Id  A   B
    1   a1  b1
    2   a2  b2
    3   a3  b3
    4   a4  b4
    5   a5  b5
    6   a6  b6
    7   a7  b7
    8   a8  b8
    9   a9  b9
    10  a10 b10
    

    Output in Azure Cosmos DB:

    As you can see in the below output the data which is in tabular format in Azure SQL DB has converted into hierarchical pattern

    {
        "id": "1",
        "values": {
            "key": "a1",
            "value": "b1"
        }
    },
    {
        "id": "2",
        "values": {
            "key": "a2",
            "value": "b2"
        }
    },
    .
    .
    .
    .,
    {
        "id": "10",
        "values": {
            "key": "a10",
            "value": "b10"
        }
    }
    

    To write array of objects in cosmos db, you can use dataflow in ADF. Below is the approach:

    • Take source as Azure SQL and do aggregation by doing Group By with Id and aggregate with Values.

    • Use collect(@(Key=A, Value=B)) in Expression to store an array of object as shown in the below:

    enter image description here

    • Sink with Azure Cosmos DB by allowing it to Insert into the cosmos container then it stores in an array format as you can see in the below:

    enter image description here

    Output in Azure Cosmos DB:

    As you can see in the below output the data which is in tabular format in Azure SQL DB has converted into hierarchical pattern by storing objects in an array format.

    {
        "Id": 1,
        "Values": [
            {
                "Key": "a1",
                "Value": "b1"
            }
        ]
    },
    {
        "Id": 2,
        "Values": [
            {
                "Key": "a2",
                "Value": "b2"
            }
        ]
    },
    .
    .
    .
    .
    .,
    {
        "Id": 10,
        "Values": [
            {
                "Key": "a10",
                "Value": "b10"
            }
        ]
    }