hadoop, hadoop-yarn

YARN: reallocate remaining compute power dynamically


I wonder how to configure dynamic queues for YARN. Assume there are 2 queues:

  1. A (high performance, 70% of the cluster)
  2. B (normal, the remaining 30% of the cluster)

I noticed that jobs in B stick to their allocated resources, even when the other 70% of the cluster is idle. How can I reallocate those resources to B (when there are no A jobs) so that B jobs complete faster?


Solution

  • The CapacityScheduler documentation (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) discusses elasticity and resource preemption among queues:

    Elasticity - Free resources can be allocated to any queue beyond its capacity. When there is demand for these resources from queues running below capacity at a future point in time, as tasks scheduled on these resources complete, they will be assigned to applications on queues running below the capacity (preemption is also supported). This ensures that resources are available in a predictable and elastic manner to queues, thus preventing artificial silos of resources in the cluster which helps utilization.


    It also documents the configuration parameters that control queue elasticity and the preemption of resources/containers, for example:

    yarn.scheduler.capacity.[queue-path].capacity - Queue capacity in percentage (%) as a float (e.g. 12.5). The sum of capacities for all queues, at each level, must be equal to 100. Applications in the queue may consume more resources than the queue’s capacity if there are free resources, providing elasticity.

    About preemption

    The CapacityScheduler supports preemption of container from the queues whose resource usage is more than their guaranteed capacity.

    That page lists several more parameters (such as maximum-capacity and user-limit-factor) that you should review to arrive at a good configuration.
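To make the elasticity rule quoted above concrete, here is a toy model in Python of how much of the cluster a single user's jobs in queue B could hold. It is a simplification, not the CapacityScheduler's actual algorithm: it only combines the queue's maximum-capacity (its elastic ceiling), its user-limit-factor (a multiple of the queue's capacity that one user may take; the documentation notes that the default of 1 keeps a single user within the queue's configured capacity however idle the cluster is), and whatever the other queues currently hold. The function name elastic_share is hypothetical.

```python
def elastic_share(capacity, max_capacity, user_limit_factor, demand, used_by_others):
    """Toy estimate (%) of the cluster one user's apps in a queue could hold.

    All arguments are percentages of the whole cluster, except
    user_limit_factor, which is a multiple of `capacity`.
    Simplification (assumption): ignores minimum-user-limit-percent,
    container sizes, node locality and preemption timing.
    """
    idle_ceiling = 100.0 - used_by_others        # what is free right now
    user_ceiling = capacity * user_limit_factor  # single-user limit
    return max(0.0, min(demand, max_capacity, user_ceiling, idle_ceiling))


if __name__ == "__main__":
    # Queue B from the question: capacity 30%, B's jobs want the whole cluster.
    print(elastic_share(30, 30, 1, 100, 0))    # maximum-capacity 30 pins B at its guarantee
    print(elastic_share(30, 100, 1, 100, 0))   # user-limit-factor 1 still pins a single user
    print(elastic_share(30, 100, 4, 100, 0))   # A idle: B can expand into the free 70%
    print(elastic_share(30, 100, 4, 100, 70))  # A holds 70% again: B is back to its 30%
```

In this model, a maximum-capacity of 30 or a user-limit-factor of 1 leaves B at 30% even on an otherwise idle cluster, which is one plausible explanation for the behaviour described in the question. Raising both lets B grow, and preemption is what hands the extra share back when A's jobs arrive.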
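A minimal sketch of settings along these lines, written as a Python script that prints Hadoop-style <configuration> XML using the standard library's xml.etree.ElementTree. The property names are the documented CapacityScheduler and ResourceManager ones; the values (B's maximum-capacity of 100 and user-limit-factor of 4) are illustrative assumptions to tune for your cluster, and build_configuration and check_capacities are hypothetical helpers. Queue properties belong in capacity-scheduler.xml, while the preemption monitor is enabled in yarn-site.xml.

```python
import xml.etree.ElementTree as ET

# capacity-scheduler.xml: two queues, A guaranteed 70% and B guaranteed 30%.
CAPACITY_SCHEDULER = {
    "yarn.scheduler.capacity.root.queues": "A,B",
    "yarn.scheduler.capacity.root.A.capacity": 70,
    "yarn.scheduler.capacity.root.B.capacity": 30,
    # Elastic ceiling for B (example value): grow into idle resources up to 100%.
    "yarn.scheduler.capacity.root.B.maximum-capacity": 100,
    # Default 1 keeps a single user within B's 30%; 4 x 30% covers the cluster (example value).
    "yarn.scheduler.capacity.root.B.user-limit-factor": 4,
}

# yarn-site.xml: enable the preemption monitor so A can reclaim its guarantee.
YARN_SITE = {
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    # Preemption policy for the CapacityScheduler (this is also the default).
    "yarn.resourcemanager.scheduler.monitor.policies":
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
        "ProportionalCapacityPreemptionPolicy",
}


def build_configuration(props):
    """Render a dict of Hadoop properties as a <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return root


def check_capacities(props, parent="root"):
    """Raise if sibling queue capacities under `parent` do not sum to 100."""
    queues = props[f"yarn.scheduler.capacity.{parent}.queues"].split(",")
    total = sum(float(props[f"yarn.scheduler.capacity.{parent}.{q}.capacity"])
                for q in queues)
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"capacities under {parent} sum to {total}, not 100")


if __name__ == "__main__":
    check_capacities(CAPACITY_SCHEDULER)
    for filename, props in [("capacity-scheduler.xml", CAPACITY_SCHEDULER),
                            ("yarn-site.xml", YARN_SITE)]:
        config = build_configuration(props)
        ET.indent(config)  # pretty-printing; requires Python 3.9+
        print(f"<!-- {filename} -->")
        print(ET.tostring(config, encoding="unicode"))
```

Merge the printed properties into your existing files rather than replacing them. Queue changes in capacity-scheduler.xml can typically be reloaded with yarn rmadmin -refreshQueues, whereas changes to yarn-site.xml generally need a ResourceManager restart.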