web-crawler nutch scoring nutch2

Apache Nutch 2.3.1 OPIC scoring filter not working


I have configured Nutch 2.3.1 with a complete Hadoop/HBase ecosystem on a small cluster. I am curious about the scoring algorithm used in Nutch, so I found and enabled the OPIC scoring filter. To see its impact, I checked the score at different steps in Nutch (the dbupdate and generate phases), as described in the Nutch wiki. However, every document's score always remains zero, no matter how many iterations I run or how many documents I fetch. Is there a problem in the OPIC implementation, or am I missing some configuration?

I have observed that the _csh_ field, which holds the cash value, is removed from the corresponding HBase table during the fetch phase.


Solution

  • I resolved it by changing OPICScoringFilter.java:

    src/plugin/scoring-opic/src/java/org/apache/nutch/scoring/opic/OPICScoringFilter.java

    Instead of storing the cash value in the row's metadata as raw bytes, I store it in the row's markers as a Utf8 string:

    -    row.getMetadata().put(CASH_KEY, ByteBuffer.wrap(Bytes.toBytes(score)));
    +    row.getMarkers().put(CASH_KEY, new Utf8(Double.toString(score)));
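The diff above only shows the write side. Presumably any code that reads the cash value back has to parse it from the marker string as well, rather than decoding bytes from the metadata. The following is a minimal, self-contained sketch of that round trip, not the actual Nutch code: a plain `HashMap` stands in for the row's markers map, the `_csh_` key is taken from the question, and the class name `CashMarkerSketch` and the helpers `writeCash` and `readCash` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CashMarkerSketch {
    // Key name taken from the question; the real filter defines its own constant.
    static final String CASH_KEY = "_csh_";

    // Write side: mirrors the "+" line of the diff, storing the score as text.
    static void writeCash(Map<String, String> markers, double score) {
        markers.put(CASH_KEY, Double.toString(score));
    }

    // Read side (assumption): parse the text back into a double,
    // returning a fallback when the marker is missing or malformed.
    static double readCash(Map<String, String> markers, double fallback) {
        String raw = markers.get(CASH_KEY);
        if (raw == null) {
            return fallback;
        }
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the markers of one row.
        Map<String, String> markers = new HashMap<>();
        writeCash(markers, 0.25);
        System.out.println("cash = " + readCash(markers, 0.0));
    }
}
```

In the real filter the map would hold Avro `Utf8` values rather than `String`, so the read would go through `toString()` before parsing. Check the other methods in OPICScoringFilter.java that use CASH_KEY and adjust them to match.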