java hadoop nutch manticore-search

Apache Nutch Indexer Plugin to Manticore Search Exception: java.lang.NoClassDefFoundError: com/manticoresearch/client/ApiException


I have created an Apache Nutch indexer plugin that pushes data to Manticore Search using the Manticore Search Java API.

The build is successful and all the crawling steps before indexing are succeeding (inject, generate, fetch, parse, updatedb).

When I run the indexing command bin/nutch index /root/nutch_source/crawl/crawldb/ -linkdb /root/nutch_source/crawl/linkdb/ -dir /root/nutch_source/crawl/segments/ -filter -normalize -deleteGone, it fails, and logs/hadoop.log includes the following stack trace.

I am running Nutch in a Docker container.

The Nutch version in the image is 1.19.

2021-09-07 10:15:46,040 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-07 10:16:23,666 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-07 10:17:36,020 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-07 10:17:36,378 INFO  segment.SegmentChecker - Segment dir is complete: file:/root/nutch_source/crawl/segments/20210906001900.
2021-09-07 10:17:36,383 INFO  segment.SegmentChecker - Segment dir is complete: file:/root/nutch_source/crawl/segments/20210906001655.
2021-09-07 10:17:36,387 INFO  segment.SegmentChecker - Segment dir is complete: file:/root/nutch_source/crawl/segments/20210906002358.
2021-09-07 10:17:36,391 INFO  indexer.IndexingJob - Indexer: starting at 2021-09-07 10:17:36
2021-09-07 10:17:36,401 INFO  indexer.IndexingJob - Indexer: deleting gone documents: true
2021-09-07 10:17:36,402 INFO  indexer.IndexingJob - Indexer: URL filtering: true
2021-09-07 10:17:36,402 INFO  indexer.IndexingJob - Indexer: URL normalizing: true
2021-09-07 10:17:36,403 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: /root/nutch_source/crawl/crawldb
2021-09-07 10:17:36,407 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: file:/root/nutch_source/crawl/segments/20210906001900
2021-09-07 10:17:36,408 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: file:/root/nutch_source/crawl/segments/20210906001655
2021-09-07 10:17:36,410 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: file:/root/nutch_source/crawl/segments/20210906002358
2021-09-07 10:17:36,411 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: /root/nutch_source/crawl/linkdb
2021-09-07 10:17:36,528 WARN  impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
2021-09-07 10:17:37,708 INFO  mapreduce.Job - The url to track the job: http://localhost:8080/
2021-09-07 10:17:37,711 INFO  mapreduce.Job - Running job: job_local250243852_0001
2021-09-07 10:17:38,724 INFO  mapreduce.Job - Job job_local250243852_0001 running in uber mode : false
2021-09-07 10:17:38,725 INFO  mapreduce.Job -  map 0% reduce 0%
2021-09-07 10:17:39,731 INFO  mapreduce.Job -  map 100% reduce 0%
2021-09-07 10:17:47,677 WARN  impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-09-07 10:17:47,992 INFO  indexer.IndexWriters - Index writer org.apache.nutch.indexwriter.manticore.ManticoreIndexWriter identified.
2021-09-07 10:17:48,013 WARN  mapred.LocalJobRunner - job_local250243852_0001
java.lang.Exception: java.lang.NoClassDefFoundError: com/manticoresearch/client/ApiException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.NoClassDefFoundError: com/manticoresearch/client/ApiException
        at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
        at java.base/java.lang.Class.getConstructor0(Class.java:3342)
        at java.base/java.lang.Class.getConstructor(Class.java:2151)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170)
        at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:97)
        at org.apache.nutch.indexer.IndexWriters.lambda$get$0(IndexWriters.java:60)
        at java.base/java.util.Map.computeIfAbsent(Map.java:1003)
        at org.apache.nutch.indexer.IndexWriters.get(IndexWriters.java:60)
        at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:41)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: com.manticoresearch.client.ApiException
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        at org.apache.nutch.plugin.PluginClassLoader.loadClassFromSystem(PluginClassLoader.java:105)
        at org.apache.nutch.plugin.PluginClassLoader.loadClassFromParent(PluginClassLoader.java:93)
        at org.apache.nutch.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:73)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 19 more
2021-09-07 10:17:48,742 INFO  mapreduce.Job - Job job_local250243852_0001 failed with state FAILED due to: NA
2021-09-07 10:17:48,773 INFO  mapreduce.Job - Counters: 30
        File System Counters
                FILE: Number of bytes read=157397439
                FILE: Number of bytes written=332518016
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=51223
                Map output records=51223
                Map output bytes=24049558
                Map output materialized bytes=24158915
                Input split bytes=2010
                Combine input records=0
                Combine output records=0
                Reduce input groups=0
                Reduce shuffle bytes=24158915
                Reduce input records=0
                Reduce output records=0
                Spilled Records=51223
                Shuffled Maps =14
                Failed Shuffles=0
                Merged Map outputs=14
                GC time elapsed (ms)=125
                Total committed heap usage (bytes)=5221908480
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=11426452
        File Output Format Counters
                Bytes Written=0
2021-09-07 10:17:48,774 ERROR indexer.IndexingJob - Indexing job did not succeed, job status:FAILED, reason: NA
2021-09-07 10:17:48,776 ERROR indexer.IndexingJob - Indexer: java.lang.RuntimeException: Indexing job did not succeed, job status:FAILED, reason: NA
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:152)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:293)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:302)

Solution

  • I resolved this issue by adding all the libraries that the Manticore Search client depends on to the plugin manifest, the plugin.xml file inside the plugin folder.

    I found all the dependency JARs in the folder runtime/local/plugins/<plugin-name>/, took their file names, and listed each one under the <runtime> tag of plugin.xml.

    After rebuilding, the indexer worked!
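
The change described above might look roughly like the following plugin.xml fragment. In the stack trace, PluginClassLoader falls back to the system class loader and still cannot find com.manticoresearch.client.ApiException, which is consistent with the client JAR not being declared in the plugin's <runtime> section. This is only a sketch: the plugin id, provider name, and every JAR file name here are hypothetical placeholders, not the names used in the actual plugin.

```xml
<!-- Sketch only: the plugin id and all JAR names below are hypothetical placeholders. -->
<plugin id="indexer-manticore" name="Manticore Search index writer"
        version="1.0.0" provider-name="example.org">

  <runtime>
    <!-- The plugin's own JAR, exported so Nutch can load the index writer class. -->
    <library name="indexer-manticore.jar">
      <export name="*"/>
    </library>

    <!-- One entry per dependency JAR found in runtime/local/plugins/<plugin-name>/.
         Replace these names with the exact file names from that folder. -->
    <library name="manticore-client-PLACEHOLDER.jar"/>
    <library name="transitive-dependency-PLACEHOLDER.jar"/>
  </runtime>

  <!-- The requires and extension elements stay as they were and are omitted here. -->
</plugin>
```

Each <library> name has to match a file name in the plugin folder exactly, including the version suffix, so the entries need updating whenever a dependency version changes.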