apache-spark, hadoop-yarn, google-cloud-dataproc

Example of Yarn Queues in Dataproc (Spark v2)


Has anyone been able to add queues beyond the default one to YARN on Spark 2.x in Dataproc?

Attempts that fail at cluster creation time:

    capacity-scheduler:yarn.scheduler.capacity.root.queues=alpha,beta,default
    yarn:yarn.scheduler.capacity.root.queues=alpha,beta,default

Additionally, setting yarn.scheduler.fair.allow-undeclared-pools=true on either of the above configuration prefixes to activate dynamic queues also fails.

In all cases the daemon seems to fail, leaving the ResourceManager dead on launch.


Solution

  • Each queue needs to have a capacity specified. The properties needed for your example are as follows:

    capacity-scheduler:yarn.scheduler.capacity.root.queues=alpha,beta,default
    capacity-scheduler:yarn.scheduler.capacity.root.alpha.capacity=20
    capacity-scheduler:yarn.scheduler.capacity.root.beta.capacity=20
    capacity-scheduler:yarn.scheduler.capacity.root.default.capacity=60
    

    The specified capacities must sum to 100% of the root queue's resources. The full set of configuration options for the capacity scheduler can be found in the Hadoop documentation.
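
As a rough illustration of the rule above, here is a minimal Python sketch that checks a set of queue capacities sums to 100 and then builds the matching property strings. The helper names `build_queue_properties` and `to_properties_flag` are hypothetical, not part of any Dataproc or Hadoop API. The sketch also assumes the properties are passed through the `--properties` flag of `gcloud dataproc clusters create`; because the queue list itself contains commas, it uses gcloud's alternate-delimiter escaping (`^#^`) so that `#` rather than `,` separates the properties.

```python
def build_queue_properties(capacities):
    """Build capacity-scheduler properties for child queues of root.

    `capacities` maps queue name -> percentage of the root queue.
    (Hypothetical helper for illustration only.)
    """
    total = sum(capacities.values())
    # The capacities of the queues under root must add up to 100.
    if abs(total - 100) > 1e-6:
        raise ValueError(f"queue capacities must sum to 100, got {total}")

    prefix = "capacity-scheduler:yarn.scheduler.capacity.root"
    props = [f"{prefix}.queues={','.join(capacities)}"]
    props += [f"{prefix}.{name}.capacity={pct}" for name, pct in capacities.items()]
    return props


def to_properties_flag(props, delimiter="#"):
    """Join properties into a single --properties flag value.

    Assumes gcloud's ^DELIM^ escaping, so commas inside a value
    (the queue list) are not treated as property separators.
    """
    if any(delimiter in p for p in props):
        raise ValueError(f"delimiter {delimiter!r} appears inside a property")
    return f"--properties=^{delimiter}^" + delimiter.join(props)


props = build_queue_properties({"alpha": 20, "beta": 20, "default": 60})
flag = to_properties_flag(props)
print(flag)
```

The printed flag can be appended to a cluster creation command. A capacity set such as `{"alpha": 20, "beta": 20}` raises a `ValueError` instead, which surfaces the mistake before the cluster is created rather than as a dead ResourceManager afterwards.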