hadoop, hadoop-yarn

Determining the number of reduce slots in Hadoop cluster


Using the Java API, how do I determine the current cluster's total number of reduce slots? (If I can get the number of slots currently in use, that would be a bonus.)

My use case: I have a Hadoop job that launches another Hadoop job. For the second job, I have to set the number of reducers, and this should be based on the number of slots available. Also, the cluster's size is subject to change.

I'm using Hadoop 2.7.3, normally running on Amazon EMR, but I'd prefer a solution that uses only the Hadoop API.
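
To make the setup concrete, here is a minimal sketch of the driver pattern I mean; the availableReduceCapacity() helper is hypothetical, and how to implement it is exactly what I'm asking:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SecondJobLauncher {
        // Driver of the first job: it builds and submits the second job
        // and must pick the reducer count before submission.
        public static void submitSecondJob(Configuration conf) throws Exception {
            Job secondJob = Job.getInstance(conf, "second-job");
            // ... mapper/reducer/input/output setup elided ...
            int reducers = availableReduceCapacity(); // hypothetical helper -- the open question
            secondJob.setNumReduceTasks(reducers);
            secondJob.submit();
        }

        private static int availableReduceCapacity() {
            return 1; // placeholder until cluster capacity can be queried
        }
    }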


Solution

  • You can use a Java HTTP client to request cluster metrics from YARN via the ResourceManager REST API (a minimal Java sketch is included at the end of this answer).

    The response is a JSON object containing the total, allocated, reserved, and available memory and vcores on the cluster.

    $ curl -G -k https://<resource-manager-host>:8090/ws/v1/cluster/metrics
    {"clusterMetrics":
      {"appsSubmitted":999999,"appsCompleted":999999,"appsPending":0,"appsRunning":99,"appsFailed":99,"appsKilled":999,
       "reservedMB":0,"availableMB":99999999,"allocatedMB":9999999,
       "reservedVirtualCores":0,"availableVirtualCores":9999,"allocatedVirtualCores":9999,
       "containersAllocated":9999,"containersReserved":0,"containersPending":999,
       "totalMB":9999999,"totalVirtualCores":99999,
       "totalNodes":999,"lostNodes":9,"unhealthyNodes":9,"decommissioningNodes":0,"decommissionedNodes":99,"rebootedNodes":0,"activeNodes":999}}
    $
    

    I'm not sure what you mean by "slots": YARN (unlike classic MapReduce) has no fixed map/reduce slots, and you can specify the reducer container size when submitting an MR job.
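
    Below is a minimal sketch of the "Java HTTP client" approach. Assumptions: the ResourceManager's unsecured HTTP port 8088 is reachable (use the HTTPS port 8090, as in the curl example, if only the secure endpoint is exposed), and a one-reducer-per-available-vcore heuristic. The host name and the regex-based field extraction are illustrative only; in a real job you would parse the JSON with a proper library such as Jackson.

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class ClusterMetricsClient {

            public static void main(String[] args) throws Exception {
                // Assumed ResourceManager address; on EMR this is typically the master node.
                String rmUrl = "http://resource-manager-host:8088/ws/v1/cluster/metrics";

                HttpURLConnection conn = (HttpURLConnection) new URL(rmUrl).openConnection();
                conn.setRequestProperty("Accept", "application/json");

                // Read the whole JSON response into a string.
                StringBuilder body = new StringBuilder();
                try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        body.append(line);
                    }
                }

                long totalVcores = extractLong(body.toString(), "totalVirtualCores");
                long availableVcores = extractLong(body.toString(), "availableVirtualCores");
                System.out.println("Total vcores:     " + totalVcores);
                System.out.println("Available vcores: " + availableVcores);

                // Example heuristic only: one reducer per currently available vcore.
                int numReducers = (int) Math.max(1L, availableVcores);
                System.out.println("Suggested number of reducers: " + numReducers);
                // The launching job would then call: job.setNumReduceTasks(numReducers);
            }

            // Crude, dependency-free extraction of a numeric field from the response.
            private static long extractLong(String json, String field) {
                Matcher m = Pattern.compile("\"" + field + "\":(\\d+)").matcher(json);
                return m.find() ? Long.parseLong(m.group(1)) : -1L;
            }
        }

    Note that on YARN the per-reducer container size is controlled by mapreduce.reduce.memory.mb and mapreduce.reduce.cpu.vcores, so "available capacity" is really the available memory/vcores divided by the per-container request, not a fixed slot count.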