Launch mapreduce job on hadoop 2.2 (Yarn) from java application


I'm trying to call a mapreduce job from a java application. In former hadoop versions (1.x) I created a Configuration object and a Job object, set mapred.job.tracker and fs.default.name in the Configuration and ran the Job.

Now, in hadoop 2.x, the job tracker no longer exists, and I can't find any documentation on how to run MR jobs programmatically. Any ideas?

What I'm looking for is an explanation like the one given here: call mapreduce from a java program


Solution

  • You'll need to set three properties on the Configuration object:

    Configuration conf = new Configuration();

    // should match the value defined in your yarn-site.xml
    conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");

    // the framework is now "yarn"; should match the value defined in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");

    // normally defined in core-site.xml; in Hadoop 2.x this key is deprecated in favor of fs.defaultFS
    conf.set("fs.default.name", "hdfs://namenode.com:9000");
    

    Here is a more detailed explanation in the Hadoop 2.2.0 documentation.
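
If it helps, here is a minimal sketch of keeping those three settings in one place. `YarnClientSettings` and its `build` method are hypothetical names, not part of Hadoop, and the host names and ports are just the placeholders from above. The class itself only uses the JDK; the commented lines at the end show how the map might be applied to a real `Configuration` and `Job`, assuming hadoop-client is on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): collects the three client-side
// properties from the answer so they can be applied to a Configuration in one go.
final class YarnClientSettings {

    private YarnClientSettings() {}

    static Map<String, String> build(String rmHost, int rmPort, String nnHost, int nnPort) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Must match yarn.resourcemanager.address in the cluster's yarn-site.xml.
        settings.put("yarn.resourcemanager.address", rmHost + ":" + rmPort);
        // Tells the MapReduce client to submit to YARN rather than run locally.
        settings.put("mapreduce.framework.name", "yarn");
        // Same key as in the answer; Hadoop 2.x also accepts the newer name fs.defaultFS.
        settings.put("fs.default.name", "hdfs://" + nnHost + ":" + nnPort);
        return settings;
    }

    public static void main(String[] args) {
        // Host names and ports are the placeholders from the answer.
        Map<String, String> settings = build("yarn-manager.com", 50001, "namenode.com", 9000);
        settings.forEach((key, value) -> System.out.println(key + " = " + value));

        // With hadoop-client on the classpath, the map could be applied like this:
        //   Configuration conf = new Configuration();
        //   settings.forEach(conf::set);
        //   Job job = Job.getInstance(conf, "my-job");   // job name is a placeholder
        //   ... set mapper, reducer, input and output paths as in Hadoop 1.x ...
        //   boolean ok = job.waitForCompletion(true);
    }
}
```

The rest of the driver (mapper, reducer, input and output paths) should look the same as it did in 1.x; only the cluster addresses change.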