I have a 4-node distributed Hadoop cluster (including HBase) set up like this.
The cluster setup seems to be fine, as all the web UIs (HBase, NameNode, ResourceManager) come up. But when I try to submit a MapReduce job that reads/writes HBase tables, it hangs and keeps timing out. However, the same job works fine if I explicitly set the HBase properties in my MapReduce code and pass them to the job:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "10.211.55.101");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("hbase.master", "10.211.55.101:60000");
// 10.211.55.101 is the IP address of node1
These properties are already set in the HBase configuration on node1, node3, and node4. Now my questions are: do I need to set up anything HBase-related on node2, where only the ResourceManager is running? And why does the same job work fine when the HBase configs are set explicitly in code?
The HBaseConfiguration.create() method loads the configuration from hbase-site.xml. Make sure you have an hbase-site.xml available on the classpath of node2.
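As a minimal sketch (reusing the quorum address, client port, and master address from your snippet; adjust to your cluster), the hbase-site.xml on node2's classpath would carry the same values you are currently hard-coding:

<configuration>
  <!-- same values the job currently sets programmatically -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.211.55.101</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>10.211.55.101:60000</value>
  </property>
</configuration>

With that file on the classpath, HBaseConfiguration.create() picks these values up automatically, which is why the job currently only works when you set them explicitly in code.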
This behavior is described in the HBase documentation:
The configuration used by a Java client is kept in an HBaseConfiguration instance. The factory method on HBaseConfiguration, HBaseConfiguration.create();, on invocation, will read in the content of the first hbase-site.xml found on the client's CLASSPATH, if one is present (Invocation will also factor in any hbase-default.xml found; an hbase-default.xml ships inside the hbase.X.X.X.jar)
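To illustrate, here is a minimal sketch of a job driver that relies on the classpath configuration instead of hard-coded conf.set(...) calls. The table name "mytable" and the identity mapper are placeholders for your own job:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MyJobDriver {

    // Placeholder identity mapper: emits each scanned row unchanged
    static class MyTableMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(row, value);
        }
    }

    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; no conf.set(...) needed
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-mr-job");
        job.setJarByClass(MyJobDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching suits MapReduce
        scan.setCacheBlocks(false);  // don't fill the block cache from MR scans

        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, MyTableMapper.class,
                ImmutableBytesWritable.class, Result.class, job);
        // Ships the HBase/ZooKeeper jars with the job so task nodes can load them
        TableMapReduceUtil.addDependencyJars(job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

TableMapReduceUtil.addDependencyJars(job) is worth calling out: without it, map tasks on nodes that don't have the HBase jars installed locally (such as node2) can fail with ClassNotFoundException while trying to load HBase classes.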