Tags: out-of-memory, google-cloud-platform, hadoop-yarn, google-cloud-dataproc

YARN not using all memory available in a Google Dataproc instance


I'm running a Dataproc job using n1-highmem-16 machines, which have 104 GB of memory each.

I double-checked the size of the instances in the Google Cloud Console, and all workers and the master are indeed n1-highmem-16.

Nevertheless, I get this error:

Container killed by YARN for exceeding memory limits. 56.8 GB of 54 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

Why is YARN not using all 104 GB of memory?


Solution

  • Dataproc configures memory settings to fit 2 Spark executors per machine, so each container should be expected to be roughly half the capacity of each NodeManager. The 54 GB limit in the error message is that per-executor container size (spark.executor.memory plus spark.yarn.executor.memoryOverhead), not the machine's full 104 GB, which is why YARN appears not to be using all of the memory.

    You can optionally override spark.executor.memory and possibly spark.yarn.executor.memoryOverhead, along with spark.executor.cores, to change how executors are packed onto each machine. spark.executor.cores defaults to half the machine's cores, since half the machine's memory is given to each executor. In your case, this means each Spark executor tries to run 8 tasks in parallel in the same process.

    You can effectively increase memory per task by reducing executor cores while keeping everything else the same: for example, spark.executor.cores=6 raises per-task memory by about 33%, since each executor's memory is now shared by 6 concurrent tasks instead of 8 (8/6 ≈ 1.33). These properties can be specified at job-submission time, as shown below:

    gcloud dataproc jobs submit spark --properties spark.executor.cores=6
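
    As a concrete sketch, a fuller submission command might look like the one below. The cluster name, region, main class, and jar path are hypothetical placeholders, and the memoryOverhead value is included only to show how several properties can be combined in a single --properties flag; substitute values appropriate for your own job.

    # Hypothetical values: replace the cluster, region, class, and jar with your own.
    gcloud dataproc jobs submit spark \
        --cluster=my-cluster \
        --region=us-central1 \
        --class=com.example.MySparkJob \
        --jars=gs://my-bucket/my-spark-job.jar \
        --properties=spark.executor.cores=6,spark.yarn.executor.memoryOverhead=4096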