
Apache Nutch 2.3.1, increase reducer memory


I have set up a small Hadoop cluster with HBase for Nutch 2.3.1. The Hadoop version is 2.7.7 and HBase is 0.98. I have customized a Hadoop job and now need to set the memory for the reducer task in the driver class. I know that in plain Hadoop MR jobs you can use the JobConf method setMemoryForReduceTask, but there is no such option in Nutch. In my case, the reducer memory is currently set to 4 GB via mapred-site.xml (the Hadoop configuration), but for Nutch I need to double it.

Is it possible to do this without changing the Hadoop configuration files, either via the driver class or nutch-site.xml?


Solution

  • Finally, I was able to find the solution: NutchJob achieves the objective. The following code snippet in the driver class sets the reducer memory:

    // Create the job through NutchJob so Nutch's own configuration is applied
    NutchJob job = NutchJob.getInstance(getConf(), "rankDomain-update");

    // Request 8 GB containers for reduce tasks and cap the JVM heap at
    // 80% of that, leaving headroom for non-heap (off-heap) memory
    int reducerMem = 8192;
    String reducerHeap = "-Xmx" + (int) (reducerMem * 0.8) + "m";
    job.getConfiguration().setInt("mapreduce.reduce.memory.mb", reducerMem);
    job.getConfiguration().set("mapreduce.reduce.java.opts", reducerHeap);
    // rest of code below
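
  • If the override should apply to every Nutch job rather than a single driver class, the same two properties can instead be set in nutch-site.xml, which Nutch layers on top of the Hadoop configuration when it builds the job conf. A sketch, assuming the same 8 GB container with a heap of roughly 80% of it as in the snippet above (the property names are the standard Hadoop ones, not Nutch-specific):

        <!-- nutch-site.xml: 8 GB reduce containers with a ~6.5 GB JVM heap -->
        <property>
          <name>mapreduce.reduce.memory.mb</name>
          <value>8192</value>
        </property>
        <property>
          <name>mapreduce.reduce.java.opts</name>
          <value>-Xmx6553m</value>
        </property>

    Values set in the driver class as above would still take precedence over this file for that particular job.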