
Gora MongoDb Exception, can't serialize Utf8


I'm trying to get Nutch 2.3 to work with MongoDB, but I get the following exception:

java.lang.IllegalArgumentException: can't serialize class org.apache.avro.util.Utf8
at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:284)
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:185)

I found the following ticket for this problem, which says it should be resolved in Nutch 2.3: https://issues.apache.org/jira/browse/NUTCH-1843

There's another ticket, for the Gora project, which says the issue was actually resolved in Gora 0.6: https://issues.apache.org/jira/browse/GORA-388. However, Nutch 2.3 uses Gora 0.5, so I don't see how the issue could be resolved in Nutch 2.3.

I really would like to use MongoDB, but I can't seem to overcome the issue. Is there anyone who has insight into this problem? Is it a configuration issue?


Solution

  • The solution is to apply the patch from https://issues.apache.org/jira/browse/NUTCH-1946 to your project. The patch upgrades Gora to 0.6, which contains the fix for this problem.

    If you then run into a RuntimeException during the GeneratorJob, add the following to your nutch-site.xml:

    <property>
        <name>io.serializations</name>
        <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
        <description>A list of serialization classes that can be used for
            obtaining serializers and deserializers.</description>
    </property>
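
For orientation, here is a rough sketch of what the Gora upgrade amounts to on the dependency side. It assumes a stock Nutch 2.x source tree where Gora modules are declared in ivy/ivy.xml and the MongoDB backend is commented out by default. The exact lines in your checkout may differ, and the NUTCH-1946 patch remains the authoritative set of changes. This only illustrates the version bump from 0.5 to 0.6 described above.

```xml
<!-- ivy/ivy.xml (assumed stock layout; check against your own checkout) -->

<!-- Gora core: bumped from rev="0.5" to the release that contains the Utf8 fix -->
<dependency org="org.apache.gora" name="gora-core" rev="0.6" conf="*->default"/>

<!-- MongoDB backend: must be uncommented and kept at the same revision as gora-core -->
<dependency org="org.apache.gora" name="gora-mongodb" rev="0.6" conf="*->default"/>
```

After changing dependencies the Nutch runtime has to be rebuilt so that the new Gora jars end up on the classpath. Mixing a 0.5 core with a 0.6 backend (or the reverse) is likely to fail in other ways, so keep both revisions aligned.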