hadoop, remote-access, high-availability, nameservice

Hadoop HA Namenode remote access


I'm configuring the Hadoop 2.2.0 stable release with an HA NameNode, but I don't know how to configure remote access to the cluster.

I have the HA NameNode configured with manual failover, and I defined dfs.nameservices. I can access HDFS through the nameservice from all the nodes in the cluster, but not from outside it.

I can perform operations on HDFS by contacting the active NameNode directly, but I don't want that. I want to contact the cluster and then be redirected to the active NameNode. I think this is the normal configuration for an HA cluster.

Does anyone know how to do that?

(thanks in advance...)


Solution

  • You need to contact one of the NameNodes (as you're currently doing); there is no separate cluster node to contact.

    The Hadoop client code knows the addresses of the two NameNodes from its configuration (core-site.xml, with the HA properties normally kept in hdfs-site.xml) and can identify which is the active and which is the standby. There might be a way to interrogate a ZooKeeper node in the quorum to identify the active and standby (maybe, I'm not sure), but you might as well check one of the NameNodes: you have a 50/50 chance it's the active one.

    I'd have to check, but you might be able to query either if you're just reading from HDFS.
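
    For a client outside the cluster, the practical consequence is that it needs the same HA properties the cluster nodes already have. Below is a minimal sketch in Python that builds those properties and renders them in the *-site.xml format. The property names are the standard HDFS HA client settings; the nameservice ID `mycluster`, the NameNode IDs `nn1`/`nn2`, the host names, and port 8020 are placeholders and should be replaced with whatever the cluster actually uses. The helper names `ha_client_properties` and `to_site_xml` are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def ha_client_properties(nameservice, namenodes):
    """Build the client-side HDFS HA properties for one nameservice.

    namenodes maps a NameNode ID (e.g. "nn1") to its RPC "host:port".
    """
    props = {
        # Usually placed in core-site.xml: clients address the logical nameservice.
        "fs.defaultFS": f"hdfs://{nameservice}",
        # The remaining properties usually live in hdfs-site.xml.
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    for nn_id, rpc_address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = rpc_address
    return props


def to_site_xml(props):
    """Render a property dict in the Hadoop *-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder nameservice ID and hosts; substitute the cluster's real values.
example = ha_client_properties(
    "mycluster",
    {"nn1": "namenode1.example.com:8020", "nn2": "namenode2.example.com:8020"},
)
print(to_site_xml(example))
```

    With these properties on the remote client's classpath configuration, paths such as hdfs://mycluster/some/path should resolve through the failover proxy provider, which tries the configured NameNodes rather than requiring you to hard-code the active one. The remote machine also has to be able to resolve and reach both NameNode hosts.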