java hadoop mapreduce hadoop-yarn resourcemanager

Launch mapreduce job on hadoop 2.2 (Yarn) from java application

I'm trying to call a mapreduce job from a java application. In former hadoop versions (1.x) I created a Configuration object and a Job object, set mapred.job.tracker and fs.default.name in the Configuration and ran the Job.

Now, in hadoop 2.x, the job tracker no longer exists, and I can't find any documentation on how to run MR jobs programmatically. Any ideas?

What I'm looking for is an explanation like the one given here: call mapreduce from a java program

Solution

You'll need to set three things on your Configuration object (conf below):

// should match the value defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");

// the framework is now "yarn"; should match the value defined in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// normally defined in core-site.xml; in 2.x this key is deprecated in favor of fs.defaultFS
conf.set("fs.default.name", "hdfs://namenode.com:9000");
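For context, here is a minimal sketch of how these three settings might fit into a complete launcher. It is not taken from the Hadoop documentation: the class name YarnJobLauncher, the helper buildConf, and the job name are made up for illustration, the host names are the placeholders from above, and the input and output paths are assumed to arrive as command-line arguments. It uses fs.defaultFS, the 2.x name for fs.default.name, and sets no mapper or reducer, so Hadoop falls back to its identity Mapper and Reducer.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical launcher class; the name is illustrative only.
public class YarnJobLauncher {

    // Builds a client-side Configuration that points at a remote YARN cluster.
    // Both arguments should match what the cluster's own config files say.
    public static Configuration buildConf(String rmAddress, String fsUri) {
        Configuration conf = new Configuration();
        conf.set("yarn.resourcemanager.address", rmAddress); // yarn-site.xml
        conf.set("mapreduce.framework.name", "yarn");        // mapred-site.xml
        conf.set("fs.defaultFS", fsUri);                      // core-site.xml
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder hosts and ports; replace with your cluster's values.
        Configuration conf =
            buildConf("yarn-manager.com:50001", "hdfs://namenode.com:9000");

        Job job = Job.getInstance(conf, "identity-job");
        // Ships the jar containing this class; the class must be packaged in a jar.
        job.setJarByClass(YarnJobLauncher.class);

        // With the default TextInputFormat and identity Mapper/Reducer,
        // keys are byte offsets and values are lines of text.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submits the job and blocks until it finishes, printing progress.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If the job is launched from an IDE rather than from a packaged jar, setJarByClass has no jar to find, so the tasks on the cluster will probably fail to load your classes; building a jar first is the usual way around that.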

Here is a more detailed explanation in the Hadoop 2.2.0 documentation.