
Hanging Mapreduce job while reading hbase tables


I have a 4 node hadoop distributed cluster (including hbase) set up like this.

  • node1- namenode + hbase master + zookeeper
  • node2- resourcemanager
  • node3- datanode1 + hbase regionserver1 + nodemanager
  • node4- datanode2 + hbase regionserver2 + nodemanager

The cluster setup seems to be fine, as all the web UIs (HBase, NameNode, ResourceManager) are coming up. But when I try to submit a MapReduce job that reads/writes HBase tables, it hangs and keeps timing out. However, the same job works fine if I explicitly set the HBase connection properties in my MapReduce code and pass them to the job:

Configuration conf =  HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "10.211.55.101");
conf.set("hbase.zookeeper.property.clientPort","2181");
conf.set("hbase.master", "10.211.55.101:60000");
//10.211.55.101 is the ipaddress of node1

These properties are already set in the HBase configuration on node1, node3 and node4. My questions are: do I need to set up any HBase configs on node2, where only the ResourceManager is running? And why does the same job work fine when the HBase configs are set explicitly in code?


Solution

  • The HBaseConfiguration.create() method loads the configuration from hbase-site.xml. Make sure hbase-site.xml is available on the classpath of node2, the node from which you submit the job. Without it, the client falls back to the defaults in hbase-default.xml, whose ZooKeeper quorum is localhost, so the connection attempts on node2 time out; setting the properties explicitly in code overrides those defaults, which is why the job works in that case.

    This is described in the HBase documentation:

    The configuration used by a Java client is kept in an HBaseConfiguration instance. The factory method on HBaseConfiguration, HBaseConfiguration.create();, on invocation, will read in the content of the first hbase-site.xml found on the client's CLASSPATH, if one is present (Invocation will also factor in any hbase-default.xml found; an hbase-default.xml ships inside the hbase.X.X.X.jar)
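To check whether hbase-site.xml is actually visible on node2, a quick plain-JDK probe (no HBase jars needed) can mimic the lookup that HBaseConfiguration.create() performs: it simply asks the classloader for the first hbase-site.xml resource on the classpath. This is a diagnostic sketch; the class name is my own, not part of any HBase API.

```java
// ClasspathCheck.java - run on node2 with the same classpath the job client uses.
// It reports whether hbase-site.xml would be picked up by HBaseConfiguration.create().
public class ClasspathCheck {
    public static void main(String[] args) {
        java.net.URL url = Thread.currentThread()
                .getContextClassLoader()
                .getResource("hbase-site.xml");
        if (url == null) {
            System.out.println("hbase-site.xml NOT found on classpath - "
                    + "the client will fall back to defaults and hang");
        } else {
            System.out.println("hbase-site.xml found at: " + url);
        }
    }
}
```

If it reports the file as missing, the usual fixes are to add the HBase conf directory to the client classpath on node2 (for example `export HADOOP_CLASSPATH="$(hbase classpath):$HADOOP_CLASSPATH"` before submitting the job) or to copy hbase-site.xml into the Hadoop configuration directory, which is already on the client classpath.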