Search code examples
azure-data-factory

JSON file in ADF Data Flow is generating with Column Name and "\" symbol


I have a below data in csv file.

ID userId Name
719A070E-4874-E811-9CCE-02152146006A 123 Joe
5d7d0a74-f1b9-483a-8e9f-c7f45ccfeda0 456 Mike

Now, I need to generate below json using Azure Data Factory Data Flow,

{
    {
        "719A070E-4874-E811-9CCE-02152146006A":{
                 "userId":"123",
                  "Name":"Joe"
             }    
    },
    {
        "5d7d0a74-f1b9-483a-8e9f-c7f45ccfeda0":{
                 "userId":"456",
                  "Name":"Mike"
             }    
    }
 }

I have added Data Flow for this and made few transformation. After this transformations, I can able to bring the JSON data into a single column as JSON string. Now I need to generate a file with only that json string data in the derived column 'jsondata'. But that Column name is added as attribute while generating json in sink (with json dataset). How to avoid this?

Actual Result:

{"jsondata": {{\"719A070E-4874-E811-9CCE-02152146006A\":{
                 \"userId\":\"123\",
                  \"Name\":\"Joe\"
             }    
    },
    {
        \"5d7d0a74-f1b9-483a-8e9f-c7f45ccfeda0\":{
                 \"userId\":\"456\",
                  \"Name\":\"Mike\"
             }    
    }
 }}

Expected Result:

{
    {
        "719A070E-4874-E811-9CCE-02152146006A":{
                 "userId":"123",
                  "Name":"Joe"
             }    
    },
    {
        "5d7d0a74-f1b9-483a-8e9f-c7f45ccfeda0":{
                 "userId":"456",
                  "Name":"Mike"
             }    
    }
 }

Column name 'myjson' and escape character \ should not be added into my json output file. (I tried replace to remove \ but it didn't worked out)

Sink Settings: enter image description here enter image description here

Below is the Data Preview of SINK. I need to generate json with data in column (jsondata) only. I don't need the header (jsondata) to add as attribute in json file. enter image description here


Solution

  • Since the column is string type, the output data in not as expected. Since you have the jsondata column value as shown below:

    {
            "719A070E-4874-E811-9CCE-02152146006A":{
                     "userId":"123",
                      "Name":"Joe"
                 }    
        },
        {
            "5d7d0a74-f1b9-483a-8e9f-c7f45ccfeda0":{
                     "userId":"456",
                      "Name":"Mike"
                 }    
        }
    

    enter image description here

    • Ignore the transformations I made in the image. Now concat [ and ] to this value to make it a set of objects.

    enter image description here

    • Instead of creating a JSON dataset, create a DelimitedText dataset with output file name as required with .json extension (this would render the output file as JSON even though the dataset in delimited text).

    enter image description here

    • Use the sink dataset configurations as shown in the below image (quote character and escape character):

    enter image description here

    • This would generate a file which would be rendered as JSON with the following data:
    [
       {
          "719A070E-4874-E811-9CCE-02152146006A":{
             "userId":"123",
             "Name":"Joe"
          }
       },
       {
          "5d7d0a74-f1b9-483a-8e9f-c7f45ccfeda0":{
             "userId":"456",
             "Name":"Mike"
          }
       }
    ]
    

    enter image description here