Tags: hadoop, mapreduce, cluster-computing, resource-utilization

Find out the resource utilization of every node and distribute load equally in a cluster


I want to find out the resource utilization (CPU, RAM) and the data processing taking place at every node in the Hadoop cluster.

Is there any way, using MapReduce or HDFS commands, to find out the load distributed across each node?

Also, if one node is busy (overloaded) and another node bears little load, is there any way in Hadoop to shift the excess load to the node which is idle?


Solution

  • In YARN, a container is a logical execution unit sized from the resources (CPU, memory) that each node of the cluster makes available. The total number of containers calculated across the cluster defines the cluster's maximum parallel execution capacity.
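    A back-of-the-envelope sketch of that calculation; the node sizes and the 1 GB / 1 vcore container size below are made-up figures for illustration, not values from a real cluster:

    ```python
    # Sketch: estimate a cluster's maximum parallel container count from
    # per-node resources. All figures here are hypothetical.

    def max_containers(node_mem_mb, node_vcores, mem_per_container_mb, vcores_per_container):
        """Containers one node can host, limited by whichever resource runs out first."""
        return min(node_mem_mb // mem_per_container_mb, node_vcores // vcores_per_container)

    # Hypothetical 3-node cluster: each node offers 8 GB and 4 vcores to YARN,
    # and containers are sized at 1 GB / 1 vcore.
    nodes = [(8192, 4), (8192, 4), (8192, 4)]
    total = sum(max_containers(mem, vcores, 1024, 1) for mem, vcores in nodes)
    print(total)  # CPU is the bottleneck here: 4 containers per node, 12 in total
    ```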

    You can check the overall container utilization on the http://<rm>:8088/cluster/nodes page; the first box on that page shows the containers running and the memory used across the cluster.

    To see the number of containers allocated and the memory-related metrics on every node, see the second box on the same page; no command-line tools are needed for this.
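    The per-node metrics shown on that page are also exposed as JSON by the ResourceManager REST API at http://<rm>:8088/ws/v1/cluster/nodes, so they can be read programmatically. A minimal sketch of parsing them; the payload below is a hand-made sample in the shape of that API's response (in a real cluster you would fetch the URL, e.g. with urllib.request):

    ```python
    import json

    # Hand-made sample in the shape of the ResourceManager's
    # /ws/v1/cluster/nodes response; hostnames and numbers are invented.
    sample = json.dumps({
        "nodes": {"node": [
            {"nodeHostName": "worker1", "state": "RUNNING",
             "numContainers": 3, "usedMemoryMB": 3072, "availMemoryMB": 5120,
             "usedVirtualCores": 3, "availableVirtualCores": 1},
            {"nodeHostName": "worker2", "state": "RUNNING",
             "numContainers": 1, "usedMemoryMB": 1024, "availMemoryMB": 7168,
             "usedVirtualCores": 1, "availableVirtualCores": 3},
        ]}
    })

    # Summarize each node as (hostname, running containers, % memory used).
    report = []
    for node in json.loads(sample)["nodes"]["node"]:
        total_mem = node["usedMemoryMB"] + node["availMemoryMB"]
        report.append((node["nodeHostName"], node["numContainers"],
                       round(100 * node["usedMemoryMB"] / total_mem)))

    for host, containers, mem_pct in report:
        print(f"{host}: {containers} containers, {mem_pct}% memory used")
    ```

    A report like this makes it easy to spot the kind of imbalance the question asks about (one node heavily used while another sits nearly idle).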

    The YARN ResourceManager is already intelligent enough to balance the load across the cluster, taking the resource utilization of every node into account.

    So if one node is overloaded, the scheduler will pick another node that is as close as possible to the node holding the input split, following the rack-awareness policy.

    You may go through the "Anatomy of a MapReduce Job Run" chapter in the book Hadoop: The Definitive Guide.