apache-spark pyspark visualization spark-structured-streaming

Spark Structured Streaming Visualization


I'm trying to visualize streaming queries in Structured Streaming. How can I do that? Should I use dashboards, or is there another tool?

I cannot find anything similar on the Web.

DF = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", bootstrapServers) \
    .option("subscribe", topics) \
    .load() \
    .selectExpr("CAST(value AS STRING)")

...
query1 = prediction.writeStream.outputMode("update").format("console").start()
query1.awaitTermination()

Solution

  • Try writing the stream to an in-memory table and querying it with SQL. queryName is the key, because it becomes the table name:

    Scala

    // Have all the aggregates in an in-memory table
    aggDF
      .writeStream
      .queryName("aggregates")    // this query name will be the table name
      .outputMode("complete")
      .format("memory")
      .start()

    spark.sql("select * from aggregates").show()   // interactively query the in-memory table
    

    PySpark

    # Have all the aggregates in an in-memory table. The query name will be the table name
    aggDF \
      .writeStream \
      .queryName("aggregates") \
      .outputMode("complete") \
      .format("memory") \
      .start()
    
    spark.sql("select * from aggregates").show()   # interactively query in-memory table
    

    Databricks notebooks also provide a display function, which can render a streaming DataFrame directly.
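
Outside Databricks, one way to get a refreshing view is to re-query the in-memory table on a timer and plot each snapshot. Below is a minimal sketch of that idea, not a definitive implementation. The helper names `poll_snapshots` and `fetch_aggregates` are hypothetical (they are not part of Spark). It assumes `spark` is an active SparkSession, that the stream was started with `queryName("aggregates")` as above, and that pandas is installed so `toPandas()` works.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def poll_snapshots(fetch: Callable[[], T], polls: int, interval_s: float = 1.0) -> List[T]:
    """Call `fetch` `polls` times, sleeping `interval_s` seconds between calls.

    Hypothetical helper: it only collects results, so it can be used with any
    zero-argument callable, not just a Spark query.
    """
    snapshots: List[T] = []
    for i in range(polls):
        snapshots.append(fetch())
        # No need to wait after the final snapshot.
        if i < polls - 1:
            time.sleep(interval_s)
    return snapshots


def fetch_aggregates(spark, table: str = "aggregates"):
    """Return the current contents of the memory-sink table as a pandas DataFrame.

    Assumptions: `spark` is an active SparkSession and `table` matches the
    queryName used when the streaming query was started.
    """
    return spark.sql(f"select * from {table}").toPandas()
```

For example, `frames = poll_snapshots(lambda: fetch_aggregates(spark), polls=5, interval_s=2.0)` collects five snapshots two seconds apart, and each one can then be handed to whatever plotting library you prefer. Keep in mind that the memory sink holds its whole output on the driver, so this suits small aggregates and debugging rather than large result sets.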