Search code examples
restazure-data-factory

Parsing an array from an API in azure data factory


I'm reading data from an API using azure data factory. The response has two arrays one called tags which contains an array of ids and names, and another called riskCauses which again holds an array of ids and names. I'm currently using a copy activity, but I want to split each of the arrays out into separate files in blob storage with rows for each data entry in of id and name together with the main id for the whole record so can relate it back later on when I build a table structure for in in our DW. Can anyone advise on how I might be able to do this? I've included an example of the API response below:

[
  {
    "totalCount": 1,
    "records": [
      {
        "id": 100106,
        "name": "Test",
        "status": 0,
        "description": "Test",
        "owner": {
          "id": 102758,
          "name": "Jules Gale"
        },
        "tags": [
          {
            "id": 10175,
            "name": "Test Tag 1"
          },
          {
            "id": 10160,
            "name": "Test Tag 2"
          },
          {
            "id": 10163,
            "name": "Test Tag 3"
          }
        ],
        "completed": true,
        "createDate": "2022-02-23 11:54:46.929",
        "createdBy": {
          "id": 2760,
          "name": "Test.Consulting"
        },
        "lastModifiedBy": {
          "id": 100760,
          "name": "Test User 1"
        },
        "lastModifiedDate": "2022-12-14 11:05:07.371",
        "riskCauses": [
          {
            "id": 100009,
            "name": "Test Cause 1",
            "status": 0
          },
          {
            "id": 100013,
            "name": "Test Cause 2",
            "status": 0
          },
          {
            "id": 100066,
            "name": "Test Cause 3",
            "status": 0
          },
          {
            "id": 100008,
            "name": "Test Cause 4",
            "status": 0
          }
        ]
      }
    ],
    "maxPage": 50
  }
]

Solution

  • As you said you want to parse the two nested arrays to fulfill this requirement, I would suggest you to use a dataflow.

    Follow below approach:

    • Create a pipeline and take a dataflow activity and create a new dataflow.
    • Take 2 sources with same Rest API and then go to projection and import projection so it will import the appropriate schema.enter image description here
    • Then take flatten transformation for both the Sources and flatten the required array with main id for the whole record.enter image description here

    Data preview for tags array with main Id:

    enter image description here

    • Take 2 sinks and store each flattened array value in sink file. enter image description here

    Output:

    enter image description here