Tags: hadoop, memory, hadoop-yarn

Hadoop not creating enough containers when more nodes are used


So I'm trying to run some Hadoop jobs on AWS r3.4xlarge machines. They have 16 vcores and 122 gigabytes of RAM available.

Each of my mappers requires about 8 gigs of RAM and one thread, so these machines are very nearly perfect for the job.

I have mapreduce.map.memory.mb set to 8192, and mapreduce.map.java.opts set to -Xmx6144m. This should result in approximately 14 mappers (in practice nearer to 12) running on each machine.
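
For reference, here is a minimal sketch of how those two settings would look in mapred-site.xml (assuming a standard Hadoop 2.x/YARN setup; the values are just the ones described above):

    <!-- mapred-site.xml: per-mapper resources as described above -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>8192</value>      <!-- YARN container size requested per map task, in MB -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx6144m</value> <!-- mapper JVM heap, kept below the container size for headroom -->
    </property>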

This is in fact the case for a 2-slave setup, where the scheduler shows 90 percent utilization of the cluster.

When scaling to, say, 4 slaves, however, it seems that Hadoop simply doesn't create more mappers. In fact it creates FEWER.

On my 2-slave setup I had just under 30 mappers running at any one time; on four slaves I had about 20. The machines were sitting at just under 50 percent utilization.

The vcores are there, the physical memory is there. What the heck is missing? Why is Hadoop not creating more containers?


Solution

  • So it turns out that this is one of those Hadoop things that never makes sense, no matter how hard you try to figure it out.

    There is a setting in yarn-default.xml called yarn.nodemanager.heartbeat.interval-ms, which defaults to 1000. Apparently it controls the minimum period, in milliseconds, between container assignments.

    This means only about one new map task gets created per second, so the number of containers running at any one time is limited to roughly (containers assigned per second) × (the average time it takes a container to finish). For example, if each map task takes, say, around 30 seconds, assigning one per second caps the cluster at about 30 concurrent mappers, no matter how many nodes are sitting idle.

    By setting this value to 50, or better yet 1, I was able to get the kind of scaling that is expected from a Hadoop cluster; a sketch of the override is below. Honestly, this should be documented better.
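
    As a sketch, assuming you can edit yarn-site.xml on each NodeManager host and restart the NodeManagers, the override looks something like this:

        <!-- yarn-site.xml: heartbeat (and therefore receive new container
             assignments) every 50 ms instead of the default 1000 ms -->
        <property>
          <name>yarn.nodemanager.heartbeat.interval-ms</name>
          <value>50</value>
        </property>

    A lower interval does mean more frequent heartbeats to the ResourceManager, so there is a trade-off on very large clusters, but on a handful of nodes the extra load is negligible.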