
How to convert a JSON string to a DataFrame in Spark


I want to convert the string variable below to a DataFrame in Spark.

val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""

I know how to create a DataFrame from a JSON file.

sqlContext.read.json("file.json")

But I don't know how to create a DataFrame from a string variable.

How can I convert a JSON string variable to a DataFrame?


Solution

  • For Spark 2.2+:

    import spark.implicits._
    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val df = spark.read.json(Seq(jsonStr).toDS)
    

    For Spark 2.1.x:

    val events = sc.parallelize("""{"action":"create","timestamp":"2016-01-07T00:01:17Z"}""" :: Nil)    
    val df = sqlContext.read.json(events)
    

    Hint: this uses the sqlContext.read.json(jsonRDD: RDD[String]) overload. There is also sqlContext.read.json(path: String), which reads a JSON file directly.

    For older versions:

    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val rdd = sc.parallelize(Seq(jsonStr))
    val df = sqlContext.read.json(rdd)
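
The snippets above assume a spark-shell session where `spark`, `sc`, and `sqlContext` already exist. As a rough sketch of the Spark 2.2+ variant in a standalone program, something like the following should work. The object name `JsonStringToDataFrame`, the helper `fromJsonStrings`, and the local master setting are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonStringToDataFrame {

  // Hypothetical helper: treats each string in `docs` as one JSON document
  // and lets Spark infer the schema. Relies on the Dataset[String] overload
  // of read.json, available from Spark 2.2.
  def fromJsonStrings(spark: SparkSession, docs: Seq[String]): DataFrame = {
    import spark.implicits._ // brings the String encoder and .toDS into scope
    spark.read.json(docs.toDS)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` is predefined.
    val spark = SparkSession.builder()
      .appName("json-string-to-dataframe")
      .master("local[*]")
      .getOrCreate()

    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val df = fromJsonStrings(spark, Seq(jsonStr))

    // Show the inferred schema: `metadata` becomes a struct column.
    df.printSchema()

    // Nested fields can be selected with dot notation.
    df.select("metadata.key", "metadata.value").show()

    spark.stop()
  }
}
```

Because the schema is inferred from the data, each call scans the input strings once. If the structure is known in advance, passing a schema via `spark.read.schema(...)` before `.json(...)` avoids that pass.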