Tags: python, apache-spark, pyspark, spark-structured-streaming

How to see a particular metric in Spark Structured Streaming with Python


I'm very new to Spark and Python. I'm trying to read a metric from a Spark Structured Streaming query (for example, processedRowsPerSecond), but I don't know how to do it.

I've read in the "Structured Streaming Programming Guide" that print(query.lastProgress) gives you the current status and metrics of an active query directly, but when I call it, it prints None, and only once. The last part of my code is the following:

query = windowedCountsDF\
    .writeStream\
    .outputMode('update')\
    .option("truncate", "false") \
    .format('console') \
    .queryName("numbers") \
    .start()

print(query.lastProgress)

query.awaitTermination()

Any idea on how to do it will be highly appreciated.


Solution

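  • query.lastProgress returns None until the first micro-batch has completed, and your code reads it only once, right after start(). The call to query.awaitTermination() then blocks the main thread, so nothing placed after it runs while the stream is active. Poll the query in a loop instead of calling awaitTermination():

    import time

    while query.isActive:
        print("\n")
        print(query.status)        # what the query is doing right now
        print(query.lastProgress)  # most recent micro-batch progress, or None
        time.sleep(30)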
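
    To see one particular metric rather than the whole progress report, the value can be pulled out of lastProgress by key. The following is a minimal sketch, not an official API: it assumes lastProgress behaves like a dictionary (as it does in PySpark 3.x) and that the metric is a top-level key such as processedRowsPerSecond. The helper names extract_metric and poll_metric, and the max_polls parameter, are hypothetical and exist only for this illustration.

```python
import math
import time


def extract_metric(progress, name):
    """Return one top-level metric from a lastProgress-style dict, or None."""
    if progress is None:
        # No micro-batch has completed yet.
        return None
    # Assumption: the key can be missing for some batches, so use .get().
    value = progress.get(name)
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def poll_metric(query, name, interval_seconds=30.0, max_polls=None):
    """Print `name` while the query is active; return the values that were seen.

    `query` is any object exposing `isActive` and `lastProgress`,
    for example the object returned by writeStream.start().
    """
    seen = []
    polls = 0
    while query.isActive and (max_polls is None or polls < max_polls):
        value = extract_metric(query.lastProgress, name)
        if value is not None:
            print(f"{name} = {value}")
            seen.append(value)
        polls += 1
        time.sleep(interval_seconds)
    return seen
```

    With the query from the question, poll_metric(query, "processedRowsPerSecond") would take the place of query.awaitTermination(). If no new micro-batch finishes between two polls, the same value is printed again, because lastProgress only changes when a batch completes.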