Search code examples
azureazure-synapse-analyticsazure-synapse-pipeline

Azure Data Flow flatten and parse key/value column for complex JSON


I'm trying to transform a key/value data into column using Azure synapse Data Flow. Basically this:

"scoreIndeces": {
"66": 0.2,
"67": 0.3,
"68": 0.4,
"70": 0.5,
"71": 0.6,
"73": 0.7,
"77": 0.8,
"78": 0.9,
"80": 1,
"81": 1.1,
"82": 1.2,
"83": 1.3,
"84": 1.4,
"85": 1.5,
"86": 1.6,
"87": 1.7,
"88": 1.8,
"89": 1.9
}

into this:

scoreIndecesKey scoreIndecesValue
66 0.2
67 0.3

How can I resolve this problem?

I followed the steps provided in this question, but it does not match my use case.


Solution

  • You can use Unpivot transformation in the Dataflow to achieve your requirement.

    After getting the required scoreIndeces object as a column, extract the inner keys as columns using select transformation Rule-based mapping. In the select transformation select Add mapping-> Rules-based mapping and give the scoreIndeces object as the hierarchy level as shown below. This will extract all the inner keys as columns and rows.

    enter image description here

    To unpivot, it needs a grouped column. So, use Derived column transformation and create a new column key and give any string value. Also, convert all column values to strings using Derived column Column pattern option. Click on Add and select Column pattern and give below expressions.

    enter image description here

    Then, use the Unpivot transformation with below configurations.

    Ungroup by : key
    

    Unpivot key - Unpivot column name - scoreIndecesKey and type - string

    enter image description here

    Unpivoted columns - Column name - scoreIndecesValue and type - string

    enter image description here

    It will give the output like below.

    enter image description here

    Here, convert your columns to required data types using another Derived column.

    scoreIndecesKey - toInteger(scoreIndecesKey)
    scoreIndecesValue - toDouble(scoreIndecesValue)
    

    enter image description here

    Now, it will convert the columns to required data type. Remove the extra key column using another select transformation and it will give the desired output.

    enter image description here