One of the oracle REST APIs response is like below:
`{
"attributeId": 300000000227671,
"attributeName": "BUSINESS_UNIT",
"attributeType": "Number",
"attributeValue": "300000207138371",
"timeBuildingBlockId": 300000300319699,
"timeBuildingBlockVersion": 1
},
{
"attributeId": 300000000227689,
"attributeName": "LDG_ID",
"attributeType": "Number",
"attributeValue": "300000001228038",
"timeBuildingBlockId": 300000300319699,
"timeBuildingBlockVersion": 1
}`
Is it possible to 'select' the columns and values we want from those items? I would like to extract the attributevalue where attributeName is 'LOG_ID'. Unfortunately, I am tied to either Azure Data Factory or pyspark code.
Ideally, would like to have it pivoted like below:
LOG_ID | BUSINESS_UNIT |
---|---|
300000001228038 | 300000207138371 |
Using PySpark you can achieve your results.
First, create a DataFrame using the JSON response.
df = spark.read.json(sc.parallelize([json_data]))
Create the temporary view and get results via Spark SQL.
df.createOrReplaceTempView("myTable")
result = spark.sql("""
SELECT
MAX(CASE WHEN attributeName = 'LOG_ID' THEN attributeValue END) AS LOG_ID,
MAX(CASE WHEN attributeName = 'BUSINESS_UNIT' THEN attributeValue END) AS BUSINESS_UNIT
FROM myTable
""")
result.show()
Output:
LOG_ID | BUSINESS_UNIT |
---|---|
300000001228038 | 300000207138371 |