Tags: apache-spark, memory, memory-management, unified-memory

Spark execution memory monitoring


I want to monitor Spark execution memory, as opposed to the storage memory that is shown in the Spark UI. To be clear: execution memory, NOT executor memory.

By execution memory I mean:

This region is used for buffering intermediate data when performing shuffles, joins, sorts and aggregations. The size of this region is configured through spark.shuffle.memoryFraction (default 0.2). (Source: Unified Memory Management in Spark 1.6)
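The quoted passage describes the legacy static model. For Spark > 2.0, which this question targets, the unified memory manager sizes a single pool shared by execution and storage instead. A rough sketch of that arithmetic follows, assuming the Spark 2.x+ defaults spark.memory.fraction = 0.6 and spark.memory.storageFraction = 0.5, the fixed 300 MB of reserved memory, and on-heap memory only. Spark actually uses the max heap reported by the JVM, which is usually somewhat below spark.executor.memory, so treat the results as approximations. The function name is hypothetical.

```python
# Hypothetical helper: approximates Spark's unified (on-heap) memory layout.
RESERVED_BYTES = 300 * 1024 * 1024  # fixed reserved system memory in the unified model
MiB = 1024 * 1024


def unified_memory_regions(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Approximate the on-heap memory regions of one executor.

    heap_bytes: JVM max heap of the executor (roughly spark.executor.memory).
    memory_fraction: spark.memory.fraction (assumed Spark 2.x+ default 0.6).
    storage_fraction: spark.memory.storageFraction (assumed default 0.5).
    """
    usable = heap_bytes - RESERVED_BYTES
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MB reserved memory")
    unified = usable * memory_fraction  # shared by execution and storage
    # Cached blocks inside this sub-region are never evicted by execution.
    storage_protected = unified * storage_fraction
    return {
        "unified": unified,
        # Execution can evict cached blocks only down to storage_protected,
        # so it can always reclaim at least this much of the unified pool.
        "execution_guaranteed": unified - storage_protected,
        "storage_protected": storage_protected,
        "user": usable - unified,  # user data structures and internal metadata
    }


# Example: a 4 GiB executor heap with default fractions.
for name, size in unified_memory_regions(4 * 1024 * MiB).items():
    print(f"{name:>22}: {size / MiB:8.1f} MiB")
```

Because execution can borrow unused storage memory and evict cached blocks above the protected sub-region, the actual execution pool at any moment lies somewhere between "execution_guaranteed" and "unified".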

After an extensive search I found nothing but unanswered Stack Overflow questions, answers that relate only to storage memory, or vague answers along the lines of "use Ganglia" or "use the Cloudera console".

There seems to be demand for this information on Stack Overflow, and yet not a single satisfactory answer is available. Here are some of the top Stack Overflow posts when searching for "monitoring spark memory":

Monitor Spark execution and storage memory utilisation

Monitoring the Memory Usage of Spark Jobs

SPARK: How to monitor the memory consumption on Spark cluster?

Spark - monitor actual used executor memory

How can I monitor memory and CPU usage by spark application?

How to get memory and cpu usage by a Spark application?

Questions

Spark version > 2.0

  1. Is it possible to monitor the execution memory of a Spark job? By monitoring I mean, at minimum, seeing used/available memory per executor, just as for storage memory in the Executors tab of the Spark UI. Yes or no?

  2. Could I do it with SparkListeners (@JacekLaskowski)? What about the History Server? Or is the only way through external tools such as Grafana, Ganglia, or others? If external tools, could you please point to a tutorial or provide more detailed guidelines?

  3. I saw SPARK-9103 (Tracking spark's memory usage), and it seems that monitoring execution memory is not yet possible. SPARK-23206 (Additional Memory Tuning Metrics) also seems relevant.

  4. Is Peak Execution Memory a reliable estimate of how much execution memory a task uses? If, for example, the Stage UI says that a task uses 1 GB at peak and I have 5 CPUs per executor, does that mean I need at least 5 GB of execution memory available on each executor to finish the stage?

  5. Are there some other proxies we could use to get a glimpse of execution memory?

  6. Is there a way to know when execution memory starts to eat into storage memory? When my cached table disappears from the Storage tab of the Spark UI, or only part of it remains, does that mean it was evicted by execution memory?
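The arithmetic behind question 4 can be made concrete. Spark's ExecutionMemoryPool lets each of N active tasks acquire at most 1/N of the execution pool and tries to ensure each can get at least 1/(2N) before it has to spill. A minimal sketch, assuming spark.task.cpus = 1 (so 5 cores run up to 5 concurrent tasks) and the worst case where all concurrent tasks hit their peak at the same moment. The pool size itself is not fixed, since execution can borrow from or be reclaimed by storage. Both function names are hypothetical helpers, not Spark APIs.

```python
GiB = 1024 ** 3


def per_task_share(execution_pool_bytes, active_tasks):
    """Return (guaranteed, cap) bytes per task: 1/(2N) and 1/N of the pool,
    following the rule documented for Spark's ExecutionMemoryPool."""
    if active_tasks < 1:
        raise ValueError("need at least one active task")
    return execution_pool_bytes / (2 * active_tasks), execution_pool_bytes / active_tasks


def pool_needed_for_peak(peak_per_task_bytes, concurrent_tasks):
    """Execution pool needed so that every task can reach its observed peak
    without hitting the 1/N cap, assuming all tasks peak simultaneously."""
    return peak_per_task_bytes * concurrent_tasks


# Question 4's numbers: 1 GiB peak per task, 5 concurrent tasks per executor.
needed = pool_needed_for_peak(1 * GiB, 5)
guaranteed, cap = per_task_share(needed, 5)
print(needed / GiB, guaranteed / GiB, cap / GiB)
```

Under these assumptions a task that cannot get more memory typically spills to disk (for operators that support spilling) rather than failing outright, so the product is an upper bound for spill-free execution, not a hard minimum.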


Solution

  • Answering my own question for future reference:

    We are using Mesos as the cluster manager. In the Mesos UI I found a page that lists all executors on a given worker, including each executor's memory usage. This appears to be the executor process's total memory usage (storage + execution, plus whatever else the JVM uses). I can clearly see that when the memory fills up, the executor dies.

    To access:

    • Go to the Agents tab, which lists all cluster workers
    • Choose a worker
    • Choose the framework with the name of your script
    • Inside, you will find a list of the executors of your job running on this particular worker
    • For memory usage, see Mem (Used / Allocated)

    The same can be done for the driver: as the framework, choose the one named Spark Cluster.

    If you want to know how to extract this number programmatically, see my answer to this question: How to get Mesos Agents Framework Executor Memory
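A minimal sketch of pulling the same numbers programmatically, not necessarily the approach in the linked answer. It assumes the Mesos agent's /monitor/statistics HTTP endpoint on the default agent port 5051, returning a JSON list with one entry per executor that carries framework_id, executor_id and a statistics object whose mem_rss_bytes and mem_limit_bytes presumably correspond to Mem (Used / Allocated) in the UI. Which fields are present depends on the Mesos version and isolators, so verify them against your own agent; the function names and host name are hypothetical.

```python
import json
import urllib.request


def fetch_agent_statistics(agent_host, port=5051, timeout=10):
    """Fetch per-executor resource statistics from a Mesos agent
    (assumed endpoint: /monitor/statistics)."""
    url = f"http://{agent_host}:{port}/monitor/statistics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def executor_memory(stats, framework_id=None):
    """Map (framework_id, executor_id) -> (used_bytes, limit_bytes).

    Keys include the framework ID because executor IDs are only unique
    within a framework. Missing fields come back as None.
    """
    result = {}
    for entry in stats:
        fw = entry.get("framework_id")
        if framework_id is not None and fw != framework_id:
            continue  # keep only the requested framework (Spark application)
        s = entry.get("statistics", {})
        result[(fw, entry.get("executor_id"))] = (
            s.get("mem_rss_bytes"),
            s.get("mem_limit_bytes"),
        )
    return result
```

For example, executor_memory(fetch_agent_statistics("agent-1.example.com"), framework_id=...) would return the used/limit pair for every executor of one application on that agent; polling it periodically gives a crude memory timeline per executor.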