Search code examples
pythonjsonpysparkazure-data-factoryazure-synapse

ADF: Selecting from a json object that has attributes and values pivoted


One of the oracle REST APIs response is like below:

               `{
                    "attributeId": 300000000227671,
                    "attributeName": "BUSINESS_UNIT",
                    "attributeType": "Number",
                    "attributeValue": "300000207138371",
                    "timeBuildingBlockId": 300000300319699,
                    "timeBuildingBlockVersion": 1
                },
                {
                    "attributeId": 300000000227689,
                    "attributeName": "LDG_ID",
                    "attributeType": "Number",
                    "attributeValue": "300000001228038",
                    "timeBuildingBlockId": 300000300319699,
                    "timeBuildingBlockVersion": 1
                }`

Is it possible to 'select' the columns and values we want from those items? I would like to extract the attributevalue where attributeName is 'LOG_ID'. Unfortunately, I am tied to either Azure Data Factory or pyspark code.

Ideally, would like to have it pivoted like below:

LOG_ID BUSINESS_UNIT
300000001228038 300000207138371

Solution

  • Using PySpark you can achieve your results.

    First, create a DataFrame using the JSON response.

    df = spark.read.json(sc.parallelize([json_data]))
    

    Create the temporary view and get results via Spark SQL.

    
    df.createOrReplaceTempView("myTable")
    
    result = spark.sql("""
        SELECT
            MAX(CASE WHEN attributeName = 'LOG_ID' THEN attributeValue END) AS LOG_ID,
            MAX(CASE WHEN attributeName = 'BUSINESS_UNIT' THEN attributeValue END) AS BUSINESS_UNIT
        FROM myTable
    """)
    
    result.show()
    
    

    Output:

    LOG_ID BUSINESS_UNIT
    300000001228038 300000207138371