scala, apache-spark

How to parse and read JSON data in Scala and form a string by iterating through the values?


I am trying to parse this JSON data and then read the values under the Orders element. My final goal is to read all the keys available under the Orders element and create a string by concatenating their values. E.g. under Orders I have orderid, customerId and orderprice, and I want to form the string "order_id,customers.customerId,order.price.charge".

{"Orders":{"orderid":"order_id","customerId":"customers.customerId","orderprice":"order.price.charge"},"Products":{"productid":"product_id","productName":"products.productName"}}



import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._
import scala.util.parsing.json._  // note: deprecated since Scala 2.11

val jsonString = """{"Orders":{"orderid":"order_id","customerId":"customers.customerId","orderprice":"order.price.charge"},"Products":{"productid":"product_id","productName":"products.productName"}}"""
val json_data = JSON.parseFull(jsonString)  // returns an untyped Option[Any]

After this I am not able to iteratively read the keys and create the concatenated string. Can someone help?


Solution

  • You can add the ujson & upickle JSON libraries to spark-shell like below:

    spark-shell --packages "com.lihaoyi:ujson_2.12:3.1.4,com.lihaoyi:upickle_2.12:3.1.4"
    
    val jsonString="""
    {
        "Orders": {
            "orderid": "order_id",
            "customerId": "customers.customerId",
            "orderprice": "order.price.charge"
        },
        "Products": {
            "productid": "product_id",
            "productName": "products.productName"
        }
    }
    """
    

    Parse the JSON string into a ujson.Value using the ujson.read method.

     val data = ujson.read(jsonString)
    
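    For reference, `ujson.read` returns a `ujson.Value` that can be indexed by key; a minimal sketch using the same JSON as above:

```scala
// Parse the JSON shown above into a ujson.Value
val data = ujson.read("""{"Orders":{"orderid":"order_id","customerId":"customers.customerId","orderprice":"order.price.charge"},"Products":{"productid":"product_id","productName":"products.productName"}}""")

// Index nested objects by key; .str extracts the underlying String
val orderId = data("Orders")("orderid").str
// orderId: "order_id"

// .obj exposes the object as a LinkedHashMap, so key order is preserved
val keys = data("Orders").obj.keys.toList
// keys: List("orderid", "customerId", "orderprice")
```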

    Below is the code to extract the keys and concatenate each section's values with ",".

    data
      .obj
      .keys
      .map { key =>
        Map(key -> data.obj(key).obj.values.map(_.str).mkString(","))
      }
    

    Below is the output; use filter if you need a specific key only.

    Set(Map(Orders -> order_id,customers.customerId,order.price.charge), Map(Products -> product_id,products.productName))
    
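    If you only need one section, you can also index it directly instead of mapping over every top-level key; a sketch under the same jsonString:

```scala
val jsonString = """{"Orders":{"orderid":"order_id","customerId":"customers.customerId","orderprice":"order.price.charge"},"Products":{"productid":"product_id","productName":"products.productName"}}"""
val data = ujson.read(jsonString)

// Pull out just the Orders object and join its values with ","
val ordersCsv = data("Orders").obj.values.map(_.str).mkString(",")
// ordersCsv: "order_id,customers.customerId,order.price.charge"
```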

    Or you can use Spark to concatenate the JSON values, like below.

    scala> df.show(false)
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |json                                                                                                                                                                              |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |{"Orders":{"orderid":"order_id","customerId":"customers.customerId","orderprice":"order.price.charge"},"Products":{"productid":"product_id","productName":"products.productName"}}|
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    
    scala> val schema = "Orders map<string,string>"
    scala> val jsonExprs = s"""concat_ws(',',map_values(from_json(json, '${schema}').Orders)) AS output"""
    scala> df.selectExpr(jsonExprs).show(false)
    +------------------------------------------------+
    |output                                          |
    +------------------------------------------------+
    |order_id,customers.customerId,order.price.charge|
    +------------------------------------------------+
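    For completeness, the df used above is just a one-column DataFrame holding the raw JSON string; in spark-shell it can be built like this (a sketch; the spark session and its implicits are already in scope there):

```scala
scala> val jsonString = """{"Orders":{"orderid":"order_id","customerId":"customers.customerId","orderprice":"order.price.charge"},"Products":{"productid":"product_id","productName":"products.productName"}}"""

scala> val df = Seq(jsonString).toDF("json")   // single column named "json"
```

    The from_json call parses only the Orders key out of the string (per the DDL schema "Orders map<string,string>"), map_values returns that map's values, and concat_ws joins them with ",".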