Tags: hadoop, mapreduce, benchmarking, cloudera-cdh, cloudera-manager

MapReduce job stopped executing


I would like to run a TeraSort-based benchmark on a Hadoop cluster. The script starts and the job is initially in the RUNNING state, but after a few minutes it gets stuck in the ACCEPTED state with FinalStatus UNDEFINED. I thought it might be a resource problem, so I modified yarn-site.xml as shown below.

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
  <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>

The same problem occurs again. The graphs below show this process: the job stops when its progress bar reaches roughly 9-15%.

Cloudera Manager dashboard


Solution

  • Please also check the values set for these parameters:

    yarn.scheduler.maximum-allocation-mb, mapreduce.map.memory.mb, mapreduce.map.java.opts, mapreduce.reduce.memory.mb, mapreduce.reduce.java.opts

    Start by setting yarn.scheduler.minimum-allocation-mb to 512 MB. This lets the scheduler allocate memory to tasks in smaller increments.

    Update 1: This article explains the underlying concepts: https://www.mapr.com/blog/best-practices-yarn-resource-management

    Also, set the input split size to suit your environment. For optimal read performance, the input split size should equal the HDFS block size.
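
    The parameters listed above could be set along the following lines. This is a minimal sketch, not a tuned configuration: every value is an illustrative assumption chosen to fit under the 8192 MB yarn.nodemanager.resource.memory-mb from the question, each JVM heap (-Xmx) is assumed to be roughly 80% of its container size, and the split settings assume a 128 MB HDFS block size.

    ```xml
    <!-- yarn-site.xml: scheduler allocation bounds (illustrative values) -->
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>512</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>8192</value>
    </property>

    <!-- mapred-site.xml: container sizes and JVM heaps (assumed values,
         heap set to about 80% of the container) -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx819m</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx1638m</value>
    </property>

    <!-- mapred-site.xml: pin the split size to an assumed 128 MB block
         size (134217728 bytes); adjust to the cluster's actual block size -->
    <property>
      <name>mapreduce.input.fileinputformat.split.minsize</name>
      <value>134217728</value>
    </property>
    <property>
      <name>mapreduce.input.fileinputformat.split.maxsize</name>
      <value>134217728</value>
    </property>
    ```

    The heap is kept below the container size because the JVM also uses non-heap memory, and a container that exceeds its allocation can be killed by the NodeManager. On a cluster managed by Cloudera Manager, these settings are normally changed through the Cloudera Manager configuration pages rather than by editing the files by hand.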