Tags: hadoop, apache-tika, behemoth

Error in configuring object when converting into Tika using Behemoth and MapReduce


I am running the command to process a Behemoth corpus with Tika using MapReduce, as given in this tutorial.

I am getting the following error:

    13/02/25 14:44:00 INFO mapred.FileInputFormat: Total input paths to process : 1
    13/02/25 14:44:01 INFO mapred.JobClient: Running job: job_201302251222_0017
    13/02/25 14:44:02 INFO mapred.JobClient:  map 0% reduce 0%
    13/02/25 14:44:09 INFO mapred.JobClient: Task Id : attempt_201302251222_0017_m_000000_0, Status : FAILED
    java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
    attempt_201302251222_0017_m_000001_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
    attempt_201302251222_0017_m_000001_0: log4j:WARN Please initialize the log4j system properly.
    13/02/25 14:44:14 INFO mapred.JobClient: Task Id : attempt_201302251222_0017_m_000001_1, Status : FAILED
    java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)

I am not able to understand the exact problem. What could the possible reasons be? Do I need to copy any jar from Behemoth/Tika into the Hadoop working directory?
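For context: the client output only shows the outer wrapper. With Hadoop's old `mapred` API, `ReflectionUtils.setJobConf` calls the mapper's `configure(JobConf)` method reflectively and rethrows any failure as `RuntimeException("Error in configuring object")`, so the real reason usually sits further down, in the "Caused by" part of the failed task attempt's own log. The snippet below is a simplified, Hadoop-free sketch of that wrapping, not Hadoop's actual code; `FailingMapper` and its exception message are hypothetical stand-ins for a mapper whose setup fails (for example because a Tika class cannot be loaded). It shows how walking the cause chain reaches the underlying error.

```java
import java.lang.reflect.Method;

public class ConfigureErrorDemo {

    /** Hypothetical mapper whose configure() fails during setup. */
    public static class FailingMapper {
        public void configure(Object conf) {
            throw new IllegalStateException("simulated: Tika parser class not available");
        }
    }

    /** Simplified imitation of how ReflectionUtils.setJobConf wraps configure() failures. */
    static void configureReflectively(Object target, Object conf) {
        try {
            Method m = target.getClass().getMethod("configure", Object.class);
            m.invoke(target, conf); // failures inside configure() arrive as InvocationTargetException
        } catch (Exception e) {
            throw new RuntimeException("Error in configuring object", e);
        }
    }

    /** Follows getCause() links and returns the innermost throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            configureReflectively(new FailingMapper(), new Object());
        } catch (RuntimeException e) {
            // Print every level of the chain, outermost first, like the "Caused by" lines in a task log.
            for (Throwable t = e; t != null; t = t.getCause()) {
                System.out.println(t.getClass().getName() + ": " + t.getMessage());
            }
            System.out.println("Root cause: " + rootCause(e).getMessage());
        }
    }
}
```

The outer `RuntimeException` message is the same whatever went wrong, which is why the truncated client-side trace alone does not identify the problem.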


Solution

  • I had the same problem. Following the procedure described on this page helped me: after I ran `mvn clean install`, the Tika job worked as described in the tutorial.
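A plausible explanation, which is an assumption and not something the answer states, is that `mvn clean install` rebuilds the Behemoth job jar together with its dependencies, so the Tika parsers travel with the job instead of having to be copied into the Hadoop directory by hand. Hadoop adds jars found under `lib/` inside a job jar to the task classpath, so one rough way to check a jar is to look for Tika classes or nested `lib/*tika*.jar` files. The class name `JobJarCheck` and this heuristic are illustrative choices, not part of Behemoth.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JobJarCheck {

    /**
     * Returns entries that look Tika-related: classes packed directly under
     * org/apache/tika/, or nested dependency jars under lib/ whose name contains "tika".
     */
    static List<String> tikaEntries(String jarPath) throws IOException {
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                boolean unpackedClass = name.startsWith("org/apache/tika/");
                boolean nestedJar = name.startsWith("lib/") && name.endsWith(".jar") && name.contains("tika");
                if (unpackedClass || nestedJar) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobJarCheck <path-to-job-jar>");
            System.exit(2);
        }
        List<String> hits = tikaEntries(args[0]);
        if (hits.isEmpty()) {
            System.out.println("No Tika classes or lib/*tika*.jar found; the jar may lack its dependencies.");
        } else {
            System.out.println("Found " + hits.size() + " Tika-related entries, e.g. " + hits.get(0));
        }
    }
}
```

Usage: `java JobJarCheck path/to/your-job.jar`, pointing it at whichever jar you pass to `hadoop jar`. An empty result before rebuilding and a non-empty one afterwards would be consistent with the missing-dependency explanation.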