Tags: json, apache-spark, spark-structured-streaming

Structured Streaming: JSON loaded and converted to columns, but output is null


My JSON data looks like {reId: "1",ratingFlowId: "1001",workFlowId:"1"}, and my program is as follows:

import org.apache.spark.sql.{Encoders, SparkSession}

case class CdrData(reId: String, ratingFlowId: String, workFlowId: String)

object StructuredHdfsJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("StructuredHdfsJson")
      .master("local")
      .getOrCreate()

    // Derive the schema from the case class
    val schema = Encoders.product[CdrData].schema

    val lines = spark.readStream
      .format("json")
      .schema(schema)
      .load("hdfs://iotsparkmaster:9000/json")

    val query = lines.writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()
  }
}

But the output is all null, as follows:

------------------------------------------- 
Batch: 0 
------------------------------------------- 

+----+------------+----------+
|reId|ratingFlowId|workFlowId|
+----+------------+----------+
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
|null|        null|      null|
+----+------------+----------+

Solution

  • Probably Spark can't parse your JSON. The issue can be caused by spaces (or any other unexpected characters) inside the JSON. You should clean your data and run the application again.

    Edit after comment (for future readers): the keys must be put in quotation marks.

    Edit 2: according to the JSON specification, keys are represented by strings, and every string must be enclosed in quotation marks. Spark uses the Jackson parser to convert strings to objects.
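    To make the point concrete, the corrected record would be {"reId": "1", "ratingFlowId": "1001", "workFlowId": "1"}. You can verify the difference outside of Spark with any strict JSON parser; a minimal sketch in Python (used here only because its standard library ships one), fed the record from the question:

    ```python
    import json

    # The record from the question: keys are not quoted, so it is not valid JSON
    malformed = '{reId: "1",ratingFlowId: "1001",workFlowId:"1"}'

    # The same record with keys quoted, as the JSON specification requires
    corrected = '{"reId": "1", "ratingFlowId": "1001", "workFlowId": "1"}'

    try:
        json.loads(malformed)
    except json.JSONDecodeError as e:
        # A strict parser rejects unquoted keys
        print(f"malformed record rejected: {e.msg}")

    record = json.loads(corrected)
    print(record["reId"], record["ratingFlowId"], record["workFlowId"])
    # -> 1 1001 1
    ```

    Spark's JSON source does the same strict parsing via Jackson; by default it does not fail the job on a bad record but yields a row of nulls instead, which matches the output above.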