python json pyspark azure-data-factory azure-synapse

ADF: Selecting from a json object that has attributes and values pivoted

One of the oracle REST APIs response is like below:

               `{
                    "attributeId": 300000000227671,
                    "attributeName": "BUSINESS_UNIT",
                    "attributeType": "Number",
                    "attributeValue": "300000207138371",
                    "timeBuildingBlockId": 300000300319699,
                    "timeBuildingBlockVersion": 1
                },
                {
                    "attributeId": 300000000227689,
                    "attributeName": "LDG_ID",
                    "attributeType": "Number",
                    "attributeValue": "300000001228038",
                    "timeBuildingBlockId": 300000300319699,
                    "timeBuildingBlockVersion": 1
                }`

Is it possible to 'select' the columns and values we want from those items? I would like to extract the attributevalue where attributeName is 'LOG_ID'. Unfortunately, I am tied to either Azure Data Factory or pyspark code.

Ideally, would like to have it pivoted like below:

LOG_ID	BUSINESS_UNIT
300000001228038	300000207138371

Solution

Using PySpark you can achieve your results.

First, create a DataFrame using the JSON response.

df = spark.read.json(sc.parallelize([json_data]))

Create the temporary view and get results via Spark SQL.


df.createOrReplaceTempView("myTable")

result = spark.sql("""
    SELECT
        MAX(CASE WHEN attributeName = 'LOG_ID' THEN attributeValue END) AS LOG_ID,
        MAX(CASE WHEN attributeName = 'BUSINESS_UNIT' THEN attributeValue END) AS BUSINESS_UNIT
    FROM myTable
""")

result.show()

Output:

LOG_ID	BUSINESS_UNIT
300000001228038	300000207138371