apache-spark hadoop-yarn google-cloud-dataproc

Example of Yarn Queues in Dataproc (Spark v2)

Has anyone been able to add queues beyond the default one to YARN on Spark 2.x in Dataproc?

Attempts that fail at cluster creation time:

capacity-scheduler:yarn.scheduler.capacity.root.queues=alpha,beta,default
yarn:yarn.scheduler.capacity.root.queues=alpha,beta,default

Additionally, setting yarn.scheduler.fair.allow-undeclared-pools=true on either of the above configuration prefixes to activate dynamic queues also fails.

In every case the daemon appears to fail, leaving the ResourceManager dead on launch.

Solution

Each queue needs to have a capacity specified. The properties needed for your example are as follows:

capacity-scheduler:yarn.scheduler.capacity.root.queues=alpha,beta,default
capacity-scheduler:yarn.scheduler.capacity.root.alpha.capacity=20
capacity-scheduler:yarn.scheduler.capacity.root.beta.capacity=20
capacity-scheduler:yarn.scheduler.capacity.root.default.capacity=60

The capacities of the queues directly under root must sum to 100, that is, 100% of the root queue's resources. The full set of configuration options for the capacity scheduler can be found in the Hadoop documentation.
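As a rough illustration of the rule above, here is a minimal Python sketch that builds the property list from a queue-to-capacity mapping and refuses to produce it when the capacities do not sum to 100. The function names are hypothetical and not part of any Dataproc or Hadoop API. The second helper assumes the properties are passed through `gcloud dataproc clusters create --properties`, and that gcloud's alternate-delimiter syntax (`^#^`, described in `gcloud topic escaping`) is used because the queue list itself contains commas; check that against your gcloud version before relying on it.

```python
def capacity_scheduler_properties(capacities, prefix="capacity-scheduler"):
    """Build Dataproc-style property strings for queues directly under root.

    `capacities` maps queue name -> capacity in percent (hypothetical helper).
    """
    if not capacities:
        raise ValueError("at least one queue is required")
    total = sum(capacities.values())
    # Sibling queue capacities must add up to 100% of the parent queue.
    if abs(total - 100) > 1e-6:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    base = f"{prefix}:yarn.scheduler.capacity.root"
    props = [f"{base}.queues={','.join(capacities)}"]
    for queue, capacity in capacities.items():
        props.append(f"{base}.{queue}.capacity={capacity}")
    return props


def as_gcloud_properties_flag(props, delimiter="#"):
    """Join properties into one --properties flag value.

    Assumption: gcloud's ^DELIM^ escaping is used so that the commas inside
    the queue list are not treated as property separators.
    """
    return f"--properties=^{delimiter}^" + delimiter.join(props)


if __name__ == "__main__":
    props = capacity_scheduler_properties({"alpha": 20, "beta": 20, "default": 60})
    print("\n".join(props))
    print(as_gcloud_properties_flag(props))
```

With the mapping shown, the first helper yields the same four properties listed in the answer; a mapping such as alpha=20, beta=20, default=50 raises an error instead of producing a configuration with capacities that do not add up.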