Search code examples
google-compute-engineoryx

Hadoop cluster on Google Compute Engine: Accessing master node via REST


I have deployed a hadoop cluster on google compute engine. I then run a machine learning algorithm (Cloudera's Oryx) on the master node of the hadoop cluster. The output of this algorithm is accessed via an HTTP REST API. Thus I need to access the output either by a web browser, or via REST commands. However, I cannot resolve the address for the output of the master node which takes the form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091.

I have allowed http traffic and allowed access to ports 80 and 8091 on the network. But I cannot resolve the address given. Note this http address is NOT the IP address of the master node instance.

I have followed along with examples for accessing IP addresses of compute instances. However, I cannot find examples of accessing a single node of a hadoop cluster on GCE, that follows this form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091. Any help would be appreciated. Thank you.


Solution

  • The reason you're seeing this is that the "HOSTNAME.c.PROJECT.internal" name is only resolvable from within the GCE network of that same instance itself; these domain names are not globally visible. So, if you were to SSH into your master node first, and then try to curl http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 then you should successfully retrieve the contents, whereas trying to access from your personal browser will just fail to resolve that hostname into any IP address.

    So unfortunately, the quickest way for you to retrieve those contents is indeed to use the external IP address of your GCE instance. If you've already opened port 8091 on the network, simply use gcutil getinstance CLUSTER_NAME-m and look for the entry specifying external IP address; then plug that in as your URL: http://[external ip address]:8091.

    If you turned up the cluster using bdutil, a more involved but nicer way to access your cluster is by running the bdutil socksproxy command. This opens a dynamic-port-forwarding SSH tunnel to your master node as a SOCKS5 proxy, so that you can then configure your browser to use localhost:1080 as your proxy server, make sure to enable remote DNS resolution, and then visit your browser using the normal http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 URL.