scala apache-spark

How to convert a dictionary in string format to a tabular DataFrame in Scala?


I have a method that returns a string whose value looks like a dictionary. For example, the return type is String and the returned value is:

{"firstName":"bb288e8ff56b","lastName":"ae4863bdae026314"}

I want to convert this to a DataFrame with two columns, firstName and lastName.

So far I am only able to store it as a single column in a DataFrame using .toDF():

val df = Seq(returnString).toDF("record")

Can someone help with this?


Solution

  • You can use the from_json function from org.apache.spark.sql.functions to parse the JSON string into a struct column, then select the struct's fields as top-level columns:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._
    import spark.implicits._
    
    val jsonString = """{"firstName":"bb288e8ff56b","lastName":"ae4863bdae026314"}"""
    
    val df = Seq(jsonString).toDF("record")
    
    val schema = StructType(
      Seq(
        StructField("firstName", StringType),
        StructField("lastName", StringType)
      )
    )
    
    val parsedDf = df
      .select(from_json(col("record"), schema).as("parsed"))
      .select("parsed.firstName", "parsed.lastName")
    
    parsedDf.show()
    
    +------------+----------------+
    |   firstName|        lastName|
    +------------+----------------+
    |bb288e8ff56b|ae4863bdae026314|
    +------------+----------------+
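
If you would rather not write the schema by hand, one possible alternative is to let Spark infer it by passing the string to spark.read.json as a Dataset[String]. This is only a sketch: the local SparkSession settings, the app name, and the name inferredDf are assumptions for illustration, and in spark-shell or a notebook the session already exists, so those lines can be skipped.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; reuse your existing `spark` if you have one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("json-string-to-columns")
  .getOrCreate()
import spark.implicits._

val jsonString = """{"firstName":"bb288e8ff56b","lastName":"ae4863bdae026314"}"""

// DataFrameReader.json accepts a Dataset[String] with one JSON document per element
// and infers the column names and types from the data.
val inferredDf = spark.read.json(Seq(jsonString).toDS())

inferredDf.printSchema()
inferredDf.show(false)
```

Inference needs an extra pass over the data and guesses types from the values it sees, so the explicit schema with from_json is usually the safer choice when the fields are known in advance.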