We have 10 NodeManager machines, co-hosted with DataNodes.
The available vcores on each node are as follows (Table 1):
                    Vcores used   Vcores available
node manager 1          56               6
node manager 2          35               1
node manager 3          22              40
node manager 4          34               2
node manager 5          36               0
node manager 6          34               0
node manager 7          34               2
node manager 8          36               2
node manager 9          35               1
node manager 10         33              18
And the cluster totals for vcores are as follows:

Vcores total   Vcores used
    510            440
Let's say we run a Spark Structured Streaming application with 5 executors, where each executor consumes 5 cores.
According to Table 1, the application should run properly on the node manager 3
machine, since the application consumes 5 × 5 = 25 cores, and node manager 3
would then still have 40 − 25 = 15 available cores.
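For reference, a resource request like the one described (5 executors, 5 cores each) would typically be expressed on submission roughly as follows; the application jar name is a placeholder:

```shell
# Sketch of a spark-submit matching the scenario above:
# 5 executors x 5 cores each = 25 vcores requested from YARN.
# (my-streaming-app.jar is a placeholder name.)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 5 \
  --executor-cores 5 \
  my-streaming-app.jar
```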
But the question is: does YARN know which machines have enough available cores to run the application?
Or does YARN run the executors on random machines, without knowing which machines have the available cores?
YARN has a master component called the ResourceManager. Whenever we submit a Spark application to a YARN cluster, the ResourceManager works with other services, such as the NodeManagers, the ApplicationMaster, and the Scheduler, to manage the resources the application requires.
The ResourceManager is aware of the resource availability on each node; however, allocation of resources also depends on other factors, such as the current resource state of each node, the number of applications running, the application configuration, and the YARN configuration.
The ResourceManager will not randomly select nodes for job execution; it tries to assign containers to nodes that have enough resources. In your case, since node 3 has enough free vcores to cover the application's request of 25 cores, node 3 is the most likely to be chosen.
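To make the contrast with random placement concrete, here is a minimal Python sketch, not YARN's actual code, of a resource-aware placement loop over the nodes from Table 1 (node names and the greedy "most free vcores first" heuristic are assumptions for illustration; YARN's schedulers weigh many more factors, such as locality, queues, and memory):

```python
# Available vcores per node, mirroring Table 1 in the question.
available_vcores = {
    "node1": 6, "node2": 1, "node3": 40, "node4": 2, "node5": 0,
    "node6": 0, "node7": 2, "node8": 2, "node9": 1, "node10": 18,
}

def place_executors(available, num_executors, cores_per_executor):
    """Greedily assign each executor to a node with enough free vcores.

    Returns a list of (executor_index, node) pairs, or None if no node
    can host a given executor. This is a simplified illustration of
    resource-aware placement, not the real YARN scheduler.
    """
    free = dict(available)
    placement = []
    for i in range(num_executors):
        # Simple heuristic: pick the node with the most free vcores.
        node = max(free, key=free.get)
        if free[node] < cores_per_executor:
            return None  # cluster cannot satisfy this request
        free[node] -= cores_per_executor
        placement.append((i, node))
    return placement

# 5 executors x 5 cores each: all land on node3, leaving it 15 free vcores.
placement = place_executors(available_vcores, num_executors=5, cores_per_executor=5)
```

With these numbers, every executor is placed on node3, because it remains the node with the most free vcores at each step; a scheduler that picked nodes at random would instead frequently hit nodes with 0-2 free vcores and fail to place the container there.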