apache-spark spark-structured-streaming

Spark structured streaming: what are the possible usages of queryName() setting?


As per the Structured Streaming Programming Guide,

queryName("myTableName") is used to define the in-memory table name when the output sink is format("memory"):

aggDF
  .writeStream
  .queryName("aggregates") // this query name will be the table name
  .outputMode("complete")
  .format("memory")
  .start()

spark.sql("select * from aggregates").show() // interactively query in-memory table

The Spark source code for DataStreamWriter.scala documents queryName() as:

Specifies the name of the [[StreamingQuery]] that can be started with start(). This name must be unique among all the currently active queries in the associated SQLContext.

QUESTION: Are there any other possible usages of the queryName() setting, for example in the Spark job logs or in the progress-monitoring details of the query?


Solution

  • I came across the following three usages of the queryName:

    1. As mentioned by the OP and documented in the Structured Streaming Guide, it defines the in-memory table name when the output sink is of format "memory".

    2. The queryName defines the value of event.progress.name where the event is a QueryProgressEvent within the StreamingQueryListener.

    3. It is also used in the description column of the Spark Web UI (see the screenshot, where I set queryName("StackoverflowTest")):

    [Screenshot: Spark Web UI with "StackoverflowTest" shown in the description column]
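
To make the second usage concrete, here is a minimal sketch of a StreamingQueryListener that prints the query name from the start and progress events. It reuses the "aggregates" name from the question. The local master, the built-in "rate" source, the ten-second wait, and the object name QueryNameListenerSketch are assumptions chosen only to keep the example self-contained. They are not part of the original setup.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{
  QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent
}

// Hypothetical object name, used only for this illustration.
object QueryNameListenerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: local run for illustration
      .appName("QueryNameListenerSketch")
      .getOrCreate()

    // The listener receives the query name on start and on every progress event.
    // For a query started without queryName(), the name is null.
    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit =
        println(s"started: name=${event.name}, id=${event.id}")

      override def onQueryProgress(event: QueryProgressEvent): Unit =
        println(s"progress: name=${event.progress.name}, batchId=${event.progress.batchId}")

      override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
        println(s"terminated: id=${event.id}")
    })

    // Assumption: the built-in "rate" source stands in for the real input.
    val aggDF = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .groupBy() // global aggregate, so "complete" output mode is allowed
      .count()

    val query = aggDF.writeStream
      .queryName("aggregates") // table name for the memory sink and name in listener events
      .outputMode("complete")
      .format("memory")
      .start()

    // The same name is also available on the query handle itself.
    println(s"query.name = ${query.name}")

    query.awaitTermination(10000L) // let a few micro-batches run
    spark.sql("select * from aggregates").show()

    query.stop()
    spark.stop()
  }
}
```

Filtering on event.progress.name inside onQueryProgress is one way to tell several concurrent queries apart when forwarding metrics, since the id and runId values are generated UUIDs rather than names you choose.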