
How to parse JSON records in Structured Streaming?

I'm working on a Spark Structured Streaming app and I'm trying to parse JSON in the format below.


Below is my Spark code to parse the JSON:

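import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

// Schema of the incoming JSON; "details" is a nested struct.
val schema = new StructType()
  .add("name", DataTypes.StringType)
  .add("age", DataTypes.IntegerType)
  .add("details",
    new StructType()
      .add("city", DataTypes.StringType)
      .add("country", DataTypes.StringType)
  )

val dfLogLines = dfRawData.selectExpr("CAST(value AS STRING)") // Converting binary to text

// Parse the JSON string into a struct column, then flatten the fields of interest.
val personNestedDf = dfLogLines.select(from_json($"value", schema).as("person"))
val personFlattenedDf = personNestedDf.selectExpr("person.name", "person.age")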



|-- name: string (nullable = true)
|-- age: integer (nullable = true)

Batch: 0
|name| age|

The code does not throw any error but it returns null values in output. What am I doing wrong here? Thanks in advance.


  • tl;dr The JSON does not look well-formed in the details field.

    From the documentation of the from_json standard function:

    Returns null, in the case of an unparseable string.
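    A quick way to check this behaviour locally is to run from_json over one well-formed and one deliberately broken string and compare the two rows. This is only a sketch: the two sample records are made up for illustration (they are not the records from the question), and the exact shape of the nulls for the broken record may differ between Spark versions.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._

    // Local session for experimenting; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from-json-check")
      .getOrCreate()
    import spark.implicits._

    val schema = new StructType()
      .add("name", StringType)
      .add("age", IntegerType)

    // Made-up data: the first record is valid JSON, the second is not.
    val raw = Seq(
      """{"name":"a","age":1}""",
      """{"name":"a","age":[}"""
    ).toDF("value")

    // Parse, flatten, and print both rows side by side with the raw input.
    raw
      .select($"value", from_json($"value", schema).as("person"))
      .select($"value", $"person.name", $"person.age")
      .show(truncate = false)
    ```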

    The issue is with the details field.


    It looks like an array or a map, but it matches neither.

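    scala> Seq(Array("one", "two")).toDF("value").toJSON.show(truncate = false)
    +-----------------------+
    |value                  |
    +-----------------------+
    |{"value":["one","two"]}|
    +-----------------------+

    scala> Seq(Map("one" -> "two")).toDF("value").toJSON.show(truncate = false)
    +-----------------------+
    |value                  |
    +-----------------------+
    |{"value":{"one":"two"}}|
    +-----------------------+

    scala> Seq(("mumbai", "India")).toDF("city", "country").select(struct("city", "country") as "details").toJSON.show(truncate = false)
    +-----------------------------------------------+
    |value                                          |
    +-----------------------------------------------+
    |{"details":{"city":"mumbai","country":"India"}}|
    +-----------------------------------------------+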

    My recommendation would be to do the JSON parsing yourself using a user-defined function (UDF).
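
    A minimal sketch of what such a UDF could look like is below. The helper name extractField and the regex are hypothetical (they are not from the question or from Spark), and the approach assumes each wanted key appears once in the record with a simple string or numeric value. It is a tolerant field extractor, not a general JSON parser, so values containing commas or brackets would be cut short.

    ```scala
    import org.apache.spark.sql.functions.udf

    // Hypothetical helper: returns the first value found for `key` in a raw,
    // JSON-like string. It ignores the surrounding bracket structure, so it
    // still works when the record as a whole is not valid JSON.
    def extractField(raw: String, key: String): Option[String] = {
      val quotedKey = java.util.regex.Pattern.quote(key)
      val pattern = ("\"" + quotedKey + "\"\\s*:\\s*\"?([^\",\\]\\}]*)\"?").r
      // Option(raw) guards against null input rows.
      Option(raw).flatMap(r => pattern.findFirstMatchIn(r).map(_.group(1).trim))
    }

    // Wrap the helper as a Spark UDF; a None result becomes a null column value.
    val extractFieldUdf = udf((raw: String, key: String) => extractField(raw, key))
    ```

    Assuming dfLogLines from the question (a single string column named value), it could be applied as dfLogLines.select(extractFieldUdf($"value", lit("name")).as("name"), extractFieldUdf($"value", lit("age")).cast("int").as("age")), with lit imported from org.apache.spark.sql.functions.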