apache-spark, pyspark, apache-spark-sql

Spark executor memory overhead


From this blog, I understand there is reserved memory within each executor, a constant 300 MB. According to the article, as of Spark 1.6 this reserved memory can only be changed by recompiling Spark. The Spark config docs list spark.executor.memoryOverhead, which was introduced in Spark 2.3. Does this config determine the size of the reserved memory that was difficult to change in Spark 1.6+? If not, what is this configuration used for?
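
For context, my rough mental model of the executor heap layout from the blog, as a back-of-the-envelope Python sketch (the 300 MB constant and the spark.memory.fraction default of 0.6 come from the Spark 1.6+ unified memory model):

    RESERVED_MB = 300          # hard-coded in Spark; changing it requires a recompile
    MEMORY_FRACTION = 0.6      # default value of spark.memory.fraction

    def executor_heap_breakdown(executor_memory_mb):
        """Approximate split of spark.executor.memory for one executor."""
        usable = executor_memory_mb - RESERVED_MB
        return {
            "reserved": RESERVED_MB,                 # Spark internal objects
            "unified": usable * MEMORY_FRACTION,     # execution + storage memory
            "user": usable * (1 - MEMORY_FRACTION),  # user data structures, UDF objects
        }

    print(executor_heap_breakdown(4096))
    # {'reserved': 300, 'unified': 2277.6, 'user': 1518.4}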


Solution

    1. No. spark.executor.memoryOverhead does not determine the size of the reserved memory.
    2. spark.executor.memoryOverhead controls the off-heap memory overhead of a Spark executor. Until Spark 3.x, this config determined the executor's total off-heap memory, including the off-heap space Spark could use to cache DataFrames. From Spark 3.x, this is no longer the case: the config only covers off-heap space used for string interning, JVM overheads, and resource management, while off-heap space for caching DataFrames is accounted for separately (see the sketch after this list).
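
    To make item 2 concrete, here is a hedged sketch of how these settings typically fit together on Spark 3.x with YARN; the sizes are placeholders, not recommendations:

        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("memory-overhead-demo")
            # On-heap JVM heap for each executor.
            .config("spark.executor.memory", "4g")
            # Off-heap overhead: JVM internals, interned strings, native overheads.
            # If unset, defaults to max(10% of executor memory, 384 MiB).
            .config("spark.executor.memoryOverhead", "1g")
            # From Spark 3.x, off-heap space for caching/execution is requested
            # separately via these two settings, not via memoryOverhead:
            .config("spark.memory.offHeap.enabled", "true")
            .config("spark.memory.offHeap.size", "2g")
            .getOrCreate()
        )

        # Approximate YARN container request per executor on Spark 3.x:
        #   executor.memory + memoryOverhead + offHeap.size = 4g + 1g + 2g = 7g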

    References:
    https://medium.com/walmartglobaltech/decoding-memory-in-spark-parameters-that-are-often-confused-c11be7488a24
    Difference between "spark.yarn.executor.memoryOverhead" and "spark.memory.offHeap.size"