I have set up a small Hadoop cluster with HBase for Nutch 2.3.1. The Hadoop version is 2.7.7 and the HBase version is 0.98. I have customized a Hadoop job, and now I need to set the memory for the reduce task in the driver class. I have learned that in plain Hadoop MapReduce jobs you can use the JobConf method setMemoryForReduceTask (see the sketch below), but there doesn't seem to be any such option in Nutch. In my case, the reducer memory is currently set to 4 GB via mapred-site.xml (the Hadoop configuration), but for Nutch I need to double it.
Is this possible without changing the Hadoop conf files, either via the driver class or nutch-site.xml?
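For comparison, this is roughly what I would do in a standalone MapReduce driver with the old mapred API (the driver class name here is just a placeholder):

import org.apache.hadoop.mapred.JobConf;

public class PlainMrDriver {
    public static void main(String[] args) {
        // JobConf exposes a dedicated setter for the reduce task's
        // container memory limit; the value is in megabytes.
        JobConf conf = new JobConf(PlainMrDriver.class);
        conf.setMemoryForReduceTask(8192L);
        // ... set mapper/reducer classes and submit the job ...
    }
}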
Finally, I was able to find the solution: NutchJob achieves this. The following is the code snippet:
NutchJob job = NutchJob.getInstance(getConf(), "rankDomain-update");

// Container memory limit for the reduce task, in MB.
int reducerMem = 8192;
// Give the JVM heap ~80% of the container limit, leaving headroom
// for non-heap overhead so YARN doesn't kill the container.
String reducerHeap = "-Xmx" + (int) (reducerMem * 0.8) + "m";

job.getConfiguration().setInt("mapreduce.reduce.memory.mb", reducerMem);
job.getConfiguration().set("mapreduce.reduce.java.opts", reducerHeap);
// rest of code below
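A note on the 0.8 factor: it leaves roughly 20% of the container for non-heap JVM overhead (thread stacks, metaspace, direct buffers), which helps keep YARN from killing the container for exceeding mapreduce.reduce.memory.mb. And since these are ordinary Hadoop properties, the same two keys should also work from nutch-site.xml, which Nutch layers onto the Hadoop configuration, so neither route requires touching the Hadoop conf files.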