Search code examples
jsonazureazure-data-factoryazure-logic-appscustom-data-type

Azure Data Factory JSON Output Formatting Issue


I'm encountering an issue with formatting the JSON output in Azure Data Factory. My desired JSON output structure is as follows:

{
  "AccessionNum": "12345",
  "test": [
    {
      "orderunit": "JMJ",
      "Testname": "TVK"
    },
    {
      "orderunit": "PJ",
      "Testname": "CVK"
    }
  ]
}

However, after performing all the necessary operations in Azure Data Factory from CSV input files, the resulting JSON output is as follows, where the "test" array elements are treated as strings rather than JSON objects:

{
  "AccessionNum": "12345",
  "test": [
    "{\"orderunit\":\"JMJ\",\"Testname\":\"TVK\"}",
    "{\"orderunit\":\"PJ\",\"Testname\":\"CVK\"}"
  ]
}

I've attempted various expressions and transformations within Azure Data Factory, but I haven't been able to achieve the desired JSON structure where the "test" array elements are proper JSON objects. Could someone please assist me in resolving this issue and achieving the correct JSON output format within Azure Data Factory?

Thank you in advance for your help!

In attempting to resolve the formatting issue with the JSON output in Azure Data Factory, I tried several approaches within the data flow pipeline:

Derived Column Transformation: Initially, I attempted to use the "Derived Column" transformation in Azure Data Factory to create the "test" array with proper JSON objects. I used expressions such as concat('{"orderunit":"',orderunit,'","Testname":"',testname,'"}') to generate the array. However, this resulted in the "test" array elements being treated as strings rather than JSON objects.

Mapping Functions: I explored other available functions within Azure Data Factory's expression language, such as createMap, struct, and array, to construct the JSON objects. Unfortunately, none of these methods are avialble in Azure Data factory.

External Scripting: As a workaround, I considered using external scripting or custom activities within Azure Data Factory to post-process the data and transform it into the correct JSON format. However, I was hoping to find a solution that could be entirely implemented within the Azure Data Factory pipeline.

In terms of expectations, I was aiming to achieve a JSON output format where the "test" array contains proper JSON objects, as illustrated in my initial post:

{
  "AccessionNum": "12345",
  "test": [
    {
      "orderunit": "JMJ",
      "Testname": "TVK"
    },
    {
      "orderunit": "PJ",
      "Testname": "CVK"
    }
  ]
}


Solution

  • Azure Data Factory JSON Output Formatting Issue

    In your case you are using concat function which is string function because of it you are getting string value.

    To achieve your requirement from CSV to Json you need to use Below expressions

    --In aggreagatea transformation
    collect(@(orderunit=orderunit,      Testname=Testname))
    
    --In derived column transformmation
    array(@(orderunit=orderunit,        Testname=Testname))
    

    My sample input dataset:

    enter image description here

    Aggregate transformation:

    enter image description here

    enter image description here

    Output:

    enter image description here