
Uber mode configuration settings aligned but jobs do not execute in uber mode


According to documentation from Hortonworks, the way to get Hadoop jobs to execute in "uber mode" is to configure one's mapred-site.xml settings like so:

<configuration>
  <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
  </property>
  <property>
     <name>mapreduce.job.ubertask.enable</name>
     <value>true</value>
  </property>
  <property>
     <name>mapreduce.job.ubertask.maxmaps</name>
     <value>1</value>
  </property>
  <property>
     <name>mapreduce.job.ubertask.maxreduces</name>
     <value>1</value>
  </property>
  <property>
     <name>mapreduce.job.ubertask.maxbytes</name>
     <value>134217728</value>
  </property>
</configuration>
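
These same properties can also be set per-job in the driver code rather than cluster-wide in mapred-site.xml. Here is a minimal sketch using the default identity mapper and reducer; the class name, job name, and paths are placeholders, not taken from my actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UberModeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same thresholds as in mapred-site.xml, but scoped to this job only
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        conf.setInt("mapreduce.job.ubertask.maxmaps", 1);
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
        conf.setLong("mapreduce.job.ubertask.maxbytes", 134217728L); // 128 MB

        Job job = Job.getInstance(conf, "uber-mode-test");
        job.setJarByClass(UberModeDriver.class);
        // No mapper/reducer set, so the identity Mapper/Reducer defaults run;
        // with the default TextInputFormat the output types are LongWritable/Text
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion(true) prints the client-side progress log, which
        // includes the "running in uber mode : true/false" line
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}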

For mapreduce.job.ubertask.maxbytes I didn't really know what value to use, so, full disclosure, I simply copied the dfs.block.size value from hdfs-site.xml:

<property> 
    <name>dfs.block.size</name> 
    <value>134217728</value> 
    <description>Block size</description> 
</property>

Initially that block size was chosen based on my hunch that one of the reasons my job was failing was that the input data, which needs to be atomic (in the sense that it can't be broken up and fed to the mapper piecemeal), was being split up in HDFS.
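
For what it's worth, the usual way to guarantee that a file is handed whole to a single mapper, regardless of how many HDFS blocks it spans and independent of uber mode, is a non-splittable input format. A minimal sketch, assuming text input (the class name WholeFileTextInputFormat is just something made up for illustration):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Each input file becomes exactly one split, i.e. one mapper sees the whole
// file no matter how it is laid out in HDFS blocks. Enable it in the driver
// with job.setInputFormatClass(WholeFileTextInputFormat.class).
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never break this file into multiple input splits
    }
}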

Nevertheless, despite configuring these settings in the way that the Hortonworks documentation (and others) suggest is sufficient to execute the job in "uber mode", the job does not in fact execute in that mode, as you can see below:

[screenshot of the job's console output showing the job not running in uber mode]

Is there something wrong with the settings as I've configured them that is preventing my job from executing in uber mode?


Solution

  • Those configuration settings in the OP are fine. The catch with uber mode is that the job can only have a single input file, not multiple as it had before: with mapreduce.job.ubertask.maxmaps set to 1, a job that produces more than one input split is disqualified (a rough pre-flight check of these conditions is sketched at the end of this answer). See here:

    17/10/12 20:42:42 INFO input.FileInputFormat: Total input files to process : 1
    17/10/12 20:42:43 INFO mapreduce.JobSubmitter: number of splits:1
    17/10/12 20:42:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1507833515636_0005
    17/10/12 20:42:44 INFO impl.YarnClientImpl: Submitted application application_1507833515636_0005
    17/10/12 20:42:44 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1507833515636_0005/
    17/10/12 20:42:44 INFO mapreduce.Job: Running job: job_1507833515636_0005
    17/10/12 20:42:49 INFO mapreduce.Job: Job job_1507833515636_0005 running in uber mode : true
    

    or, straight from the horse's mouth:

    [screenshot of the job's web UI confirming the job ran in uber mode]
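
As a rough pre-flight check of the map-side conditions, you can compute the splits yourself and compare them against the thresholds. This is only an approximation (the application master also applies memory and CPU limits), it assumes the default TextInputFormat, and the class name is made up for illustration:

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class UberEligibilityCheck {
    public static void main(String[] args) throws Exception {
        // Picks up mapred-site.xml (and its ubertask settings) from the classpath
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "uber-eligibility-check");
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Compute the splits roughly as the submitter would for TextInputFormat
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        long totalBytes = 0;
        for (InputSplit split : splits) {
            totalBytes += split.getLength();
        }

        int maxMaps = conf.getInt("mapreduce.job.ubertask.maxmaps", 9);
        long maxBytes = conf.getLong("mapreduce.job.ubertask.maxbytes", 134217728L);

        System.out.println("splits = " + splits.size() + ", total input bytes = " + totalBytes);
        System.out.println("map-side uber conditions met: "
                + (splits.size() <= maxMaps && totalBytes <= maxBytes));
    }
}

With a single input file under the byte threshold this should report one split, matching the "number of splits:1" line in the log above.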