Tags: apache-spark, pyspark, spark-structured-streaming

Spark: Use Persistent Table as Streaming Source for Spark Structured Streaming


I stored data in a table:

spark.table("default.student").show()

(1) Spark Jobs
+---+----+---+
| id|name|age|
+---+----+---+
|  1| bob| 34|
+---+----+---+

I would like to create a read stream that uses this table as its source. I tried:

newDF = spark.read.table("default.student")
newDF.isStreaming

which returns False.

Is there a way to use a table as Streaming Source?


Solution

  • You need to use a Delta table as the source. For example, in a Databricks notebook:

    data = spark.range(0, 5)
    data.write.format("delta").mode("overwrite").saveAsTable("T1")
    stream = spark.readStream.format("delta").table("T1").writeStream.format("console").start()
    
    # In another cell, append more rows so the stream picks them up:
    data = spark.range(6, 10)
    data.write.format("delta").mode("append").saveAsTable("T1")
    

    The console sink output (visible in the driver logs) will then show both sets of data.
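
    A slightly fuller sketch of the same pattern, adapted to the question's `student` schema. This assumes a Spark session with Delta Lake available (e.g. a Databricks notebook); the table name `student_delta` and the checkpoint path are illustrative assumptions, not anything from the original post:

    ```python
    # Sketch: stream from a Delta table. Assumes an existing SparkSession
    # (`spark`) with Delta Lake support, as on Databricks.

    # 1. Save the original data as a Delta table (plain Hive/parquet tables
    #    cannot be used directly as a streaming source).
    spark.createDataFrame([(1, "bob", 34)], ["id", "name", "age"]) \
        .write.format("delta").mode("overwrite").saveAsTable("student_delta")

    # 2. Read the table as a stream; isStreaming is now True.
    streamDF = spark.readStream.format("delta").table("student_delta")

    # 3. Write micro-batches to the console. The checkpoint location is an
    #    assumed path; any writable directory works.
    query = (streamDF.writeStream
             .format("console")
             .option("checkpointLocation", "/tmp/student_delta_ckpt")
             .start())

    # 4. Rows appended later arrive as new micro-batches in the stream.
    spark.createDataFrame([(2, "alice", 29)], ["id", "name", "age"]) \
        .write.format("delta").mode("append").saveAsTable("student_delta")
    ```

    The key point is step 4: new data must actually be written to the Delta table (`mode("append")`) for the stream to observe it; merely creating a new DataFrame in another cell does nothing.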