
Retrieve graphical information using Spark Structured Streaming


Spark Streaming provided a "Streaming" tab in the Web UI of each application (by default, http://localhost:4040 for running applications and http://localhost:18080 for completed ones). That tab showed graphs of application performance, and it is no longer available with Spark Structured Streaming. I am developing a Spark Structured Streaming application that reads from a Kafka broker, and I would like to obtain a graph of records processed per second, like the one Spark Streaming offered, along with other graphical information.

What is the best alternative to achieve this? I am using Spark 3.0.1 (via the pyspark library) and deploying my application on a YARN cluster.

I've checked "Monitoring Structured Streaming Applications Using Web UI" by Jacek Laskowski, but it is still not clear to me how to obtain this kind of information in graphical form.

Thank you in advance!


Solution

  • I managed to get what I wanted. For a reason I still don't know, the Spark History Server UI for completed apps (http://localhost:18080 by default) did not show the new "Structured Streaming" tab that is available for Spark Structured Streaming applications running on Spark 3.0.1. However, the web UI at http://localhost:4040 does show the information I wanted. You just need to click the 'runId' link of the streaming query whose statistics you want to see.

    [Screenshot: Spark Structured Streaming app Web UI on port 4040]

    If you can't see this tab, based on my personal experience, I recommend the following:

    • Upgrade to the latest Spark version (3.0.1 at the time of writing).
    • Check this information in the UI on port 4040 while the application is running, rather than on port 18080 after the application has finished.
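
If the tab is still missing on a recent version, it may be worth checking the UI-related settings. The sketch below is not from my original setup: it assumes the static settings `spark.sql.streaming.ui.enabled`, `spark.sql.streaming.ui.retainedQueries` and `spark.sql.streaming.ui.retainedProgressUpdates` exist in your Spark version (please confirm against its configuration docs), and the values shown are only illustrative. `to_submit_args` is a hypothetical helper that turns them into `spark-submit` arguments.

```python
# Settings assumed to control the "Structured Streaming" tab (verify for your
# Spark version). They are static, so they must be set before the session starts.
STREAMING_UI_CONF = {
    "spark.sql.streaming.ui.enabled": "true",
    "spark.sql.streaming.ui.retainedQueries": "100",          # illustrative value
    "spark.sql.streaming.ui.retainedProgressUpdates": "100",  # illustrative value
}


def to_submit_args(conf):
    """Hypothetical helper: build `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the arguments so they can be pasted into a spark-submit command.
    print(" ".join(to_submit_args(STREAMING_UI_CONF)))
```

The printed arguments can be appended to the `spark-submit` command used to launch the application on YARN.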

    I found the official Web UI documentation for the latest Apache Spark release very useful for this.
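
If you want the numbers behind the graph rather than the graph itself, the same statistics can be read from the query object. This is a minimal sketch, not what I actually used: it assumes `query` is the `StreamingQuery` returned by `writeStream.start()`, and that each entry of `query.recentProgress` is a dict with a `timestamp` key and, when a rate could be computed, a `processedRowsPerSecond` key. `throughput_series` and the sample data are hypothetical, made up only for illustration.

```python
def throughput_series(progress_events):
    """Return (timestamp, processedRowsPerSecond) pairs from progress dicts.

    Entries without a rate (for example, batches where none was reported)
    are skipped.
    """
    series = []
    for event in progress_events:
        rate = event.get("processedRowsPerSecond")
        if rate is None:
            continue
        series.append((event["timestamp"], float(rate)))
    return series


# Hypothetical sample shaped like entries of `query.recentProgress`.
sample = [
    {"timestamp": "2020-11-01T10:00:00.000Z", "numInputRows": 500,
     "processedRowsPerSecond": 250.0},
    {"timestamp": "2020-11-01T10:00:10.000Z", "numInputRows": 0},
    {"timestamp": "2020-11-01T10:00:20.000Z", "numInputRows": 900,
     "processedRowsPerSecond": 300.0},
]

if __name__ == "__main__":
    for ts, rate in throughput_series(sample):
        print(ts, rate)
```

With a running application you would call `throughput_series(query.recentProgress)` instead of passing the sample, and feed the resulting pairs to whatever plotting library you prefer.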