Tags: hadoop, mapreduce, hadoop-yarn, hortonworks-data-platform

What does Num Off Switch Containers mean in the YARN Resource Manager UI?


I have an ETL job taking up a lot of CPU and memory and running for a long time. The first thing I observed while debugging is the following (from the job's page in the Resource Manager UI):

  • Num Node Local Containers (satisfied by) = 6
  • Num Rack Local Containers (satisfied by) = 0
  • Num Off Switch Containers (satisfied by) = 11367

We only have two racks. I need help answering the following three questions:

  1. What is the meaning of Num Off Switch Containers?
  2. How can I identify these "off switch" containers and which node(s) they ran on?
  3. Do off-switch containers contribute to slow job processing times?

Solution

  • 1. What is the meaning of Num Off Switch Containers?

    It is the count of containers that the delay scheduler placed at off-switch locality:

                  +-----------+
                  |  router   |
                  +-----------+
                 /             \
        +-----------+        +-----------+
        |rack switch|        |rack switch|
        +-----------+        +-----------+
        | data node |        | data node |
        +-----------+        +-----------+
        | data node |        | data node |
        +-----------+        +-----------+
    

    Off-switch is the worst of the three data-locality levels the delay scheduler tries (1. node-local, 2. rack-local, 3. off-switch): memory and vcores get allocated on a node in a different rack, so data has to cross the core switch and consumes much more network bandwidth.

    Rather than letting a task wait indefinitely for a local node, the delay scheduler eventually assigns it to an off-switch node on a different rack to avoid task starvation.

    Each YARN scheduler exposes configuration properties for the node and rack locality thresholds:

    CAPACITY SCHEDULER: relaxing locality for off-switch container assignments is controlled by yarn.scheduler.capacity.rack-locality-additional-delay (a companion sketch of the related node-locality-delay property follows the excerpt below).

    capacity-scheduler.xml

    <property>
      <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
      <value>-1</value>
      <description>Number of additional missed scheduling opportunities, over
      the node-locality-delay ones, after which the CapacityScheduler attempts
      to schedule off-switch containers instead of rack-local ones.
      Typically this should be set to the number of racks in the cluster;
      this feature is disabled by default (set to -1).
      </description>
    </property>
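
    For completeness, a minimal sketch of the related node-locality-delay property (the value 40, roughly the number of nodes in one rack, is the usual shipped default; treat it as illustrative, not a recommendation):

    <property>
      <name>yarn.scheduler.capacity.node-locality-delay</name>
      <value>40</value>
      <description>Number of missed scheduling opportunities after which the
      CapacityScheduler attempts to schedule rack-local containers.
      </description>
    </property>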
    

    See https://issues.apache.org/jira/browse/YARN-4189 for the JIRA that added this improvement.

    If the cluster uses the Fair Scheduler instead:

    FAIR SCHEDULER: the analogous property is yarn.scheduler.fair.locality.threshold.rack, documented at https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/FairScheduler.html. A minimal sketch follows.
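
    A hedged sketch for yarn-site.xml on a Fair Scheduler cluster. Each value is a fraction of the cluster size: the number of scheduling opportunities to pass up before accepting a worse locality level. The 0.5 below is purely illustrative, and yarn.scheduler.fair.locality.threshold.node is included for symmetry:

    <property>
      <name>yarn.scheduler.fair.locality.threshold.node</name>
      <value>0.5</value>
    </property>
    <property>
      <name>yarn.scheduler.fair.locality.threshold.rack</name>
      <value>0.5</value>
    </property>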

    See https://cs.stanford.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf for the original delay-scheduling paper.

    2. How can I identify these "off switch" containers and which node(s) they ran on?

    To view the containers for an application, open the particular application attempt (via its attempt ID) in the RM UI; there you can find each container and the node it ran on. I did not find any direct link to off-switch containers in the RM UI. The yarn CLI can list the same information, as sketched below.
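
    A minimal sketch using the standard yarn CLI; the application and attempt IDs below are placeholders for your own:

    # list the attempts for an application
    yarn applicationattempt -list application_1234567890123_0001

    # list the containers of one attempt; the output includes the
    # host each container ran on
    yarn container -list appattempt_1234567890123_0001_000001

    Cross-referencing each container's host against your rack topology shows which assignments were off-switch.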

    3. Do off-switch containers contribute to slow job processing times?

    Yes. From the above we can conclude that the cross-rack network overhead will slow job processing.