I am trying to tinker with the YARN container allocation code. By container allocation, I mean the decision to place the container on a specific machine in the cluster.
I want to write my own container allocation code. To begin with, I am running Hadoop in pseudo-distributed mode with YARN, and I am trying to locate the relevant points in the source code. So far, using print statements, I have traced allocation to the method hadoop-source-code/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationMasterProtocolPBClientImpl.java#allocate. However, I am unable to narrow it down further: past this method, none of my print statements produce any output.
To recap: I would like to locate the exact point in the Hadoop source code where I would need to write my own code to replace the existing container allocation mechanism.
I have not been able to print anything
At first, I thought logging was application-specific, but all information related to the resource manager goes to the log file named hadoop-{username}-resourcemanager-{username}.log under the log folder. Instead of print statements, I used LOG.info for debugging.
Location of the allocation mechanism in the Hadoop source code
I am using the FIFO scheduler, and the allocation mechanism is in the method FifoScheduler#assignContainersOnNode, which is called from FifoScheduler#assignContainers, which in turn is called from the FifoScheduler#nodeUpdate method.
There is also the FifoScheduler#handle method (more information here), which keeps track of the different scheduler events. NODE_UPDATE is one of those events; it is triggered often, and each time it fires, containers are assigned on the given node.
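To make the call chain easier to follow, here is a minimal, self-contained sketch of it. This is not the Hadoop source: ToyFifoScheduler, Node, Request, EventType and the memory-only fit check are all hypothetical names and simplifications I made up for illustration. Only the four method names (handle, nodeUpdate, assignContainers, assignContainersOnNode) and the NODE_UPDATE event mirror the real FifoScheduler. The rule of stopping at the first request that does not fit is also an assumption of this toy model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of the event -> nodeUpdate -> assignContainers chain.
// All types here are hypothetical stand-ins, not Hadoop classes.
public class ToyFifoScheduler {

    enum EventType { NODE_ADDED, NODE_UPDATE }

    // Hypothetical node: just a host name and its free memory.
    static final class Node {
        final String host;
        int freeMemoryMb;

        Node(String host, int freeMemoryMb) {
            this.host = host;
            this.freeMemoryMb = freeMemoryMb;
        }
    }

    // Hypothetical container request from an application.
    static final class Request {
        final String appId;
        final int memoryMb;

        Request(String appId, int memoryMb) {
            this.appId = appId;
            this.memoryMb = memoryMb;
        }
    }

    private final Queue<Request> pending = new ArrayDeque<>();
    final List<String> placements = new ArrayList<>();

    void submit(String appId, int memoryMb) {
        pending.add(new Request(appId, memoryMb));
    }

    int pendingCount() {
        return pending.size();
    }

    // Entry point: dispatch on the event type, as handle does.
    void handle(EventType type, Node node) {
        switch (type) {
            case NODE_UPDATE:
                nodeUpdate(node);
                break;
            default:
                break; // other events are ignored in this sketch
        }
    }

    // A node heartbeat triggers an allocation attempt on that node.
    private void nodeUpdate(Node node) {
        assignContainers(node);
    }

    // Serve requests oldest first; stop when the head does not fit
    // (a simplification chosen for this toy model).
    private void assignContainers(Node node) {
        while (!pending.isEmpty() && assignContainersOnNode(node, pending.peek())) {
            pending.remove();
        }
    }

    // The placement decision itself: this is the step a custom
    // allocation policy would replace.
    private boolean assignContainersOnNode(Node node, Request request) {
        if (request.memoryMb > node.freeMemoryMb) {
            return false;
        }
        node.freeMemoryMb -= request.memoryMb;
        placements.add(request.appId + "@" + node.host);
        return true;
    }

    public static void main(String[] args) {
        ToyFifoScheduler scheduler = new ToyFifoScheduler();
        scheduler.submit("app-1", 1024);
        scheduler.submit("app-2", 2048);
        Node node = new Node("localhost", 2048);
        scheduler.handle(EventType.NODE_UPDATE, node);
        System.out.println(scheduler.placements);
    }
}
```

In this sketch, the second request stays queued until a later NODE_UPDATE arrives for a node with enough free memory, which is why the frequency of that event matters for allocation.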