Tags: apache-spark, hadoop, hadoop-yarn, ambari

YARN Capacity Scheduler: Share resources between users and queues


I am having some trouble setting the following scheduler queue parameters.

I have two queues, Dev and Prod:

  • Root 100%

    • Dev 30%

    • Prod 70%

(If only one queue is in use, it should be able to use 100% of the cluster.)
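For reference, the layout above can be written out as Capacity Scheduler properties. The sketch below is only an illustration: the property names follow the standard `yarn.scheduler.capacity.*` scheme, the queue names and percentages come from the question, and setting `maximum-capacity` to 100 is my assumption for how "act as 100% of cluster" would be expressed. The helper name `capacity_scheduler_properties` is hypothetical.

```python
# Queue layout from the question: capacities are percentages of the parent (root).
QUEUES = {"Dev": 30, "Prod": 70}


def capacity_scheduler_properties(queues, max_capacity=100):
    """Build capacity-scheduler key/value pairs for leaf queues under root.

    max_capacity=100 is an assumption: it lets a queue grow into idle
    capacity of its sibling, up to the whole cluster.
    """
    if sum(queues.values()) != 100:
        raise ValueError("child queue capacities must add up to 100")
    prefix = "yarn.scheduler.capacity.root"
    props = {f"{prefix}.queues": ",".join(queues)}
    for name, capacity in queues.items():
        props[f"{prefix}.{name}.capacity"] = str(capacity)
        props[f"{prefix}.{name}.maximum-capacity"] = str(max_capacity)
    return props


PROPS = capacity_scheduler_properties(QUEUES)

if __name__ == "__main__":
    # Print one key=value pair per line, ready to compare with the
    # scheduler configuration shown in Ambari.
    for key, value in PROPS.items():
        print(f"{key}={value}")
```

This covers only the sharing between queues. How resources are split between users inside one queue is governed by separate per-queue settings, which are not shown here.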

Each queue is used by multiple users, and resources should be shared equally between them. When only one user is active in a queue, that user should be able to use the entire capacity of the queue, and if that user is alone in the cluster, 100% of the cluster. When a second user joins, the scheduler should share the available resources between them.

What I have now, as an example flow:

  1. The cluster is free of jobs.

  2. User A submits a job to queue Dev. (It now uses 100% of the cluster.)

  3. User B submits a job to queue Dev. (It hangs in the ACCEPTED state and waits for the first job to finish.)

What I want:

In this case, because the second job is in the same queue, each job should receive 50% of the queue, and the queue spans 100% of the cluster.

Then, if jobs enter the Prod queue: two jobs on Prod should share 70% (35% each), and one job on Dev should get 30%.

In another case, if the second job enters the other queue (one job in each queue), the split should be 30% for Dev and 70% for Prod.
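To make the target numbers concrete, here is a small sketch of the arithmetic behind the three cases above. It models only the outcome I am asking for, not what the Capacity Scheduler actually does. The function name `desired_shares` is hypothetical, and the rule that idle queue capacity is handed to the active queues in proportion to their configured capacity is an assumption (with two queues it simply means the only active queue gets everything).

```python
CAPACITIES = {"Dev": 30, "Prod": 70}  # configured queue capacities in percent


def desired_shares(capacities, jobs):
    """Return the wanted cluster share (percent) of each job, per queue.

    capacities: queue name -> configured capacity in percent
    jobs:       queue name -> number of running jobs in that queue
    """
    active = {queue: count for queue, count in jobs.items() if count > 0}
    if not active:
        return {}
    # Capacity of idle queues is redistributed among the active ones.
    total_active = sum(capacities[queue] for queue in active)
    shares = {}
    for queue, count in active.items():
        queue_share = 100.0 * capacities[queue] / total_active
        # Jobs inside a queue split that queue's share equally.
        shares[queue] = queue_share / count
    return shares


if __name__ == "__main__":
    print(desired_shares(CAPACITIES, {"Dev": 2, "Prod": 0}))  # two jobs on Dev only
    print(desired_shares(CAPACITIES, {"Dev": 1, "Prod": 2}))  # one on Dev, two on Prod
    print(desired_shares(CAPACITIES, {"Dev": 1, "Prod": 1}))  # one job in each queue
```

The three calls correspond to the three cases described: 50% per job on Dev, then 30% for the Dev job and 35% for each Prod job, then 30% and 70%.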

This is based on Apache Ambari, version 2.6.1.5.


Solution

  • Job B will have to wait for job A to complete. As far as I know, there is no way to redistribute the load within the same YARN queue.

    The production jobs will be prioritized if you have enabled preemption (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_yarn-resource-management/content/preemption.html) and should indeed take 70% of the resources. As for the Dev queue, it is first come, first served.
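    As a rough illustration of what "enabling preemption" means at the configuration level, the sketch below renders the two `yarn-site.xml` properties that, to my knowledge, switch on the scheduler monitor and select the proportional preemption policy. Treat it as an assumption to check against the linked documentation for your HDP version; in Ambari the same setting is normally changed through the YARN configuration screen rather than by editing XML. The helper name `to_yarn_site_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Properties assumed to enable Capacity Scheduler preemption.
PREEMPTION_PROPS = {
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    "yarn.resourcemanager.scheduler.monitor.policies": (
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
        "ProportionalCapacityPreemptionPolicy"
    ),
}


def to_yarn_site_xml(props):
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


XML = to_yarn_site_xml(PREEMPTION_PROPS)

if __name__ == "__main__":
    print(XML)
```

    With preemption on, a Prod job that arrives while Dev is using the whole cluster should be able to reclaim Prod's 70% instead of waiting for containers to free up on their own.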