Talend tHBaseConnection and tHBaseInput for MapR


I have access to an edge node of a MapR Hadoop cluster. I have an HBase table named /app/SubscriptionBillingPlatform/Matthew with some fake data. Scanning it in the hbase shell returns the following:

[Screenshot: hbase shell scan of the table]

I have a very simple Talend Job that should scan the table and log each row:

[Screenshot: the Talend job]

Here is the configuration for the tHBaseConnection. I took the ZooKeeper quorum and client port from the /opt/mapr/hbase/hbase-0.94.13/conf/hbase-site.xml file:

[Screenshot: tHBaseConnection configuration]
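The two values the tHBaseConnection needs can also be read from hbase-site.xml on the command line. This is a minimal sketch, assuming the usual Hadoop `<property><name>…</name><value>…</value></property>` layout. `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort` are the standard HBase property names, while `get_site_property` is a hypothetical helper whose awk matching is line-based and is not a real XML parser.

```sh
#!/bin/sh
# Print the <value> of a named property from a Hadoop-style *-site.xml file.
# Assumption: the <name> and <value> elements are on their own lines or share
# one line; comments and unusual XML layouts are not handled.
get_site_property() {
  file=$1
  key=$2
  awk -v key="$key" '
    # Remember that we have seen the requested <name> element.
    index($0, "<name>" key "</name>") { found = 1 }
    # The first <value> at or after that point belongs to this property.
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, "")
      sub(/[ \t]*<\/value>.*/, "")
      print
      exit
    }
  ' "$file"
}

SITE=/opt/mapr/hbase/hbase-0.94.13/conf/hbase-site.xml
if [ -r "$SITE" ]; then
  echo "quorum: $(get_site_property "$SITE" hbase.zookeeper.quorum)"
  echo "port:   $(get_site_property "$SITE" hbase.zookeeper.property.clientPort)"
fi
```

An unknown key prints nothing, so an empty result means the property is not set in that file.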

And here is the configuration for the tHBaseInput:

[Screenshot: tHBaseInput configuration]

However, when I build and export the job, copy the jar to the edge node with scp, and run it there, I get the following error:

14/08/06 15:51:26 INFO mapr.TableMappingRulesFactory: Could not find MapRTableMappingRules class, assuming HBase only cluster.
14/08/06 15:51:26 INFO mapr.TableMappingRulesFactory: If you are trying to access M7 tables, add mapr-hbase jar to your classpath.
14/08/06 15:51:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/06 15:51:26 INFO security.JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
...
Exception in component tHBaseInput_1
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for /app/SubscriptionBillingPlatform/Matthew,,99999999999999 after 10 tries.
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:991)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:896)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:998)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:900)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:857)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:257)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:187)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:142)
        at poc2.testhbaseoperations_0_1.TestHBaseOperations.tHBaseInput_1Process(TestHBaseOperations.java:752)
        at poc2.testhbaseoperations_0_1.TestHBaseOperations.tHBaseConnection_1Process(TestHBaseOperations.java:375)
        at poc2.testhbaseoperations_0_1.TestHBaseOperations.runJobInTOS(TestHBaseOperations.java:1104)
        at poc2.testhbaseoperations_0_1.TestHBaseOperations.main(TestHBaseOperations.java:993)

When I told the sysadmins about this (they don't know what Talend is), they said that MapR doesn't use HRegionServers the way Cloudera does and figured my Talend configuration was wrong.

Any ideas?


Solution

  • The kicker was these two lines:

    INFO mapr.TableMappingRulesFactory: Could not find MapRTableMappingRules class, assuming HBase only cluster.
    INFO mapr.TableMappingRulesFactory: If you are trying to access M7 tables, add mapr-hbase jar to your classpath.
    

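    Without the mapr-hbase jar on the classpath, the HBase client assumes an HBase-only cluster and tries to locate the table's regions through regular HBase (ZooKeeper and .META.) instead of MapR-DB, where M7 tables such as /app/SubscriptionBillingPlatform/Matthew actually live. No region server ever answers for that table, so the job hangs while it retries and finally fails with NoServerForRegionException.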

    You can either add the mapr-hbase jar from /opt/mapr/lib to the classpath in the job's launcher shell script, or simply add all the jars in that directory to the classpath, as this modified launcher does:

    #!/bin/sh
    cd "$(dirname "$0")"
    ROOT_PATH=$(pwd)
    java -Xms256M -Xmx1024M -cp /opt/mapr/lib/*:$ROOT_PATH/..
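Before editing the launcher, it can help to confirm that the jar is actually there and to build the classpath in one place. This is a minimal sketch, assuming a POSIX sh launcher like the one above and a jar file name matching `mapr-hbase*.jar`. That pattern is an assumption, so check what /opt/mapr/lib really contains. `find_mapr_hbase_jar` and `mapr_classpath` are hypothetical helpers. The `dir/*` entry is a standard Java classpath wildcard (Java 6 and later) that the JVM, not the shell, expands to every .jar in the directory.

```sh
#!/bin/sh
# Directory holding the MapR client jars; /opt/mapr/lib is the path from the answer.
MAPR_LIB=${MAPR_LIB:-/opt/mapr/lib}

# Print the first jar matching mapr-hbase*.jar (assumed name pattern) in $1.
# Returns non-zero when none is found.
find_mapr_hbase_jar() {
  for jar in "$1"/mapr-hbase*.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  return 1
}

# Put every jar in directory $1 (as a Java "dir/*" wildcard) in front of
# the existing classpath $2.
mapr_classpath() {
  printf '%s/*:%s' "$1" "$2"
}

if jar=$(find_mapr_hbase_jar "$MAPR_LIB"); then
  echo "found $jar"
else
  echo "no mapr-hbase jar in $MAPR_LIB; the client will assume an HBase-only cluster" >&2
fi

CP=$(mapr_classpath "$MAPR_LIB" "$(pwd)")
echo "classpath: $CP"
```

Keep `"$CP"` quoted when passing it to `java -cp`. Otherwise, the shell may try to glob the `*` before the JVM sees it.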