Accumulo Write: Trying to create and insert data from MapReduce


I'm trying to write data into Accumulo tables using MapReduce. The following is the job setup code for Accumulo in my MapReduce driver:

Job job = Job.getInstance(conf);
AccumuloOutputFormat.setZooKeeperInstance(job, accumuloInstance, zooKeepers);
AccumuloOutputFormat.setDefaultTableName(job, accumuloTableName);
AccumuloOutputFormat.setConnectorInfo(job, accumuloUser, new PasswordToken(accumuloPassword));

On executing it, I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/thrift/TException
    at org.apache.accumulo.core.client.mapreduce.lib.util.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:107)
    at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:94)
    at core.accumulo.mapreduce.AccumuloMapReduceWrite.main(AccumuloMapReduceWrite.java:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

How can I resolve this? I have tried a few things and referred to a few URLs, but they did not help enough.


Solution

  • It looks like your job's classpath is missing the Thrift classes: the NoClassDefFoundError names org/apache/thrift/TException, which lives in the Thrift jar. I assume you are running your job using the standard hadoop job commands.

    In this case, you have two options:

    1. Manually add the necessary jars to your classpath. These are thrift.jar, accumulo-start.jar, accumulo-core.jar, and possibly accumulo-trace.jar, depending on your version. You'll want to specify these with the -libjars option; a fuller write-up is available as a blog post.

    2. Use the built-in Apache Accumulo tools to launch your job. Most versions of Accumulo come with a launcher script called tool.sh that automatically adds the appropriate jars for you. It is usually found at $ACCUMULO_HOME/bin/tool.sh. Some distributions may call it something else, such as accumulo-tool, to disambiguate it from other tools. Examples can be seen in the user manual (third code block).
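
Whichever option you pick, it can help to confirm that the classes are actually visible before submitting the real job. The following is only a minimal sketch, not part of Accumulo or Hadoop: ClasspathCheck and isOnClasspath are hypothetical names chosen for illustration, and the two class names are taken from the stack trace in the question. It uses nothing beyond the JDK.

```java
public class ClasspathCheck {

    // Returns true if the named class can be located by the class loader
    // that loaded this class. (Hypothetical helper, for illustration only.)
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names copied from the stack trace in the question
        String[] required = {
            "org.apache.thrift.TException",
            "org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Package this class into a jar and launch it the same way you launch the job, so that it sees the same classpath. If org.apache.thrift.TException is reported as MISSING, the jars are still not reaching the driver's classpath.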