I'm trying to get Nutch 2.3 to work with MongoDB, but I get the following exception:
java.lang.IllegalArgumentException: can't serialize class org.apache.avro.util.Utf8
at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:284)
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:185)
I've found the following ticket related to this problem, which says it should be resolved in Nutch 2.3: https://issues.apache.org/jira/browse/NUTCH-1843
There's another ticket for the Gora project, https://issues.apache.org/jira/browse/GORA-388, which says this issue is actually resolved in Gora 0.6. However, Nutch 2.3 uses Gora 0.5, so I don't see how this issue would be resolved in Nutch 2.3.
I really would like to use MongoDB, but I can't seem to overcome the issue. Is there anyone who has insight into this problem? Is it a configuration issue?
The solution is to apply the patch from https://issues.apache.org/jira/browse/NUTCH-1946 to your project. This patch updates Gora to 0.6, which contains the fix for this problem.
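For anyone wondering why the exception shows up at all, here is a minimal sketch of the failure mode, under some assumptions. The encoder in the stack trace only accepts a fixed set of value types, and Avro's Utf8 is a CharSequence rather than a String, so it gets rejected. `FakeUtf8`, `encodeField` and `toEncodable` below are hypothetical stand-ins I made up for illustration; they are not the Avro or MongoDB driver API, and this is not the actual Gora patch, only the general idea of converting the value to a String before it reaches the encoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Utf8EncodingSketch {

    // Hypothetical stand-in for org.apache.avro.util.Utf8:
    // a CharSequence implementation that is not a java.lang.String.
    static final class FakeUtf8 implements CharSequence {
        private final String value;
        FakeUtf8(String value) { this.value = value; }
        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() { return value; }
    }

    // Hypothetical stand-in for an encoder that only knows a fixed set of types
    // and throws on anything else, similar in spirit to the reported exception.
    static void encodeField(Map<String, Object> doc, String name, Object value) {
        if (value == null || value instanceof String
                || value instanceof Number || value instanceof Boolean) {
            doc.put(name, value);
        } else {
            throw new IllegalArgumentException("can't serialize " + value.getClass());
        }
    }

    // Assumed workaround: turn any non-String CharSequence into a plain String
    // before handing it to the encoder; other values pass through unchanged.
    static Object toEncodable(Object value) {
        if (value instanceof CharSequence && !(value instanceof String)) {
            return value.toString();
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        Object url = new FakeUtf8("http://example.org/");

        try {
            encodeField(doc, "baseUrl", url); // rejected: not a known type
        } catch (IllegalArgumentException e) {
            System.out.println("without conversion: " + e.getMessage());
        }

        encodeField(doc, "baseUrl", toEncodable(url)); // accepted as a String
        System.out.println("with conversion: " + doc);
    }
}
```

The point is only that the conversion has to happen in the storage layer, which is why upgrading Gora matters here and no setting in Nutch itself makes the exception go away.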
If you run into a RuntimeException during the GeneratorJob, add the following to your nutch-site.xml:
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
  <description>A list of serialization classes that can be used for
  obtaining serializers and deserializers.</description>
</property>