google-cloud-dataproc

External IP of Google Cloud Dataproc cluster changes after cluster restart


Google Cloud Dataproc lets you stop (not delete) a cluster's master and worker nodes and start them again later. However, when we do so, the external IP addresses of the master and worker nodes change, which breaks access to Hue and the other web UIs we reach by IP address.

Is there a way to keep the same IP addresses after a restart?


Solution

  • Though Dataproc doesn't currently provide a direct option for using static IP addresses, you can use the underlying Compute Engine interfaces to give your master node a static external IP address, either by promoting its current ephemeral IP address to a static one or by reserving a new static address to replace the ephemeral one.

    That said, if you're accessing your UIs through external IP addresses, you presumably also had to manage your firewall rules to tightly restrict the allowed inbound source IP ranges. And depending on which UIs you use, if they don't serve HTTPS/SSL, exposing them this way is still not ideal, even with firewall rules blocking other external sources.

    The recommended way to access your Dataproc UIs is through an SSH tunnel. If you don't want to retype all the SSH flags each time, you can put the gcloud compute ssh and browser-launching commands in a shell script. This approach also ensures that links work in pages like the YARN ResourceManager UI, since those links use GCE internal hostnames, which an external IP address would not resolve.
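The SSH-tunnel approach can be wrapped in a small script. The sketch below is written in Python rather than shell and rests on several assumptions: a standard single-master cluster whose master VM is named <cluster>-m, Chrome as the browser, local SOCKS port 1080, and the YARN ResourceManager UI on port 8088. The function names are hypothetical. Everything after "--" in the gcloud compute ssh call is passed straight to ssh: -D opens a local SOCKS proxy and -N runs no remote command.

```python
import subprocess
import time
from typing import List

# Assumption: standard (single-master) Dataproc cluster, master VM named "<cluster>-m".


def ssh_tunnel_cmd(cluster: str, zone: str, project: str, socks_port: int = 1080) -> List[str]:
    """Build the gcloud command that opens a SOCKS proxy through the master node."""
    return [
        "gcloud", "compute", "ssh", f"{cluster}-m",
        f"--project={project}",
        f"--zone={zone}",
        "--", "-D", str(socks_port), "-N",  # ssh flags: SOCKS proxy, no remote command
    ]


def browser_cmd(browser: str, cluster: str, ui_port: int, socks_port: int = 1080,
                profile_dir: str = "/tmp/dataproc-socks-profile") -> List[str]:
    """Build a Chrome-style launch command that routes traffic through the SOCKS proxy.

    A separate profile directory keeps the proxy setting out of your normal browser.
    The URL uses the master's internal hostname, which is resolved via the proxy.
    """
    return [
        browser,
        f"--proxy-server=socks5://localhost:{socks_port}",
        f"--user-data-dir={profile_dir}",
        f"http://{cluster}-m:{ui_port}",
    ]


def open_ui(cluster: str, zone: str, project: str,
            browser: str = "google-chrome", ui_port: int = 8088) -> None:
    """Start the tunnel, open the UI, and close the tunnel when the browser returns."""
    tunnel = subprocess.Popen(ssh_tunnel_cmd(cluster, zone, project))
    try:
        time.sleep(5)  # crude wait for the tunnel to come up
        subprocess.run(browser_cmd(browser, cluster, ui_port), check=False)
    finally:
        tunnel.terminate()
```

For example, open_ui("my-cluster", "us-central1-a", "my-project") would open the ResourceManager UI; the cluster, zone, and project names here are placeholders. Whether the browser call blocks until the window is closed depends on the browser and platform, so the tunnel may need to be closed manually in some setups.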
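If you do go with the static-IP workaround instead, one Compute Engine route is to promote the master's current ephemeral external IP to a reserved static address, so it survives stop/start. The sketch below assumes a single-master cluster (master VM named <cluster>-m), a VM with one network interface whose first access config holds the external IP, and a hypothetical address name; it only shells out to the documented gcloud compute instances describe and gcloud compute addresses create commands.

```python
import subprocess
from typing import List


def master_instance_name(cluster: str) -> str:
    """Assumption: standard single-master Dataproc naming, "<cluster>-m"."""
    return f"{cluster}-m"


def zone_to_region(zone: str) -> str:
    """Strip the zone suffix, e.g. "us-central1-a" -> "us-central1"."""
    return zone.rsplit("-", 1)[0]


def describe_external_ip_cmd(instance: str, zone: str) -> List[str]:
    """Build a gcloud command that prints the VM's current external (NAT) IP."""
    return [
        "gcloud", "compute", "instances", "describe", instance,
        f"--zone={zone}",
        "--format=get(networkInterfaces[0].accessConfigs[0].natIP)",
    ]


def promote_ip_cmd(address_name: str, ip: str, region: str) -> List[str]:
    """Build a gcloud command that reserves an in-use ephemeral IP as a static address."""
    return [
        "gcloud", "compute", "addresses", "create", address_name,
        f"--addresses={ip}",
        f"--region={region}",
    ]


def promote_master_ip(cluster: str, zone: str, address_name: str) -> str:
    """Look up the master's current external IP and promote it; returns the IP."""
    instance = master_instance_name(cluster)
    result = subprocess.run(describe_external_ip_cmd(instance, zone),
                            check=True, capture_output=True, text=True)
    ip = result.stdout.strip()
    subprocess.run(promote_ip_cmd(address_name, ip, zone_to_region(zone)), check=True)
    return ip
```

Run this while the cluster is up, since the ephemeral IP must still be attached to the master for it to be promoted. The same pattern could be repeated for worker nodes if they also need stable addresses.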
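If the UIs are exposed on external IPs, the firewall rule limiting inbound ranges mentioned in the answer could be created with gcloud compute firewall-rules create. This is a sketch under stated assumptions: the cluster VMs carry a network tag (for example one set via --tags at cluster creation), the rule name and tag are hypothetical, and the ports (8888 for Hue, 8088 for the YARN ResourceManager) depend on your setup. The helper validates each CIDR and refuses a /0 range that would open the UIs to everyone.

```python
import ipaddress
from typing import List, Sequence


def firewall_rule_cmd(name: str, network: str, ports: Sequence[int],
                      source_ranges: Sequence[str], target_tag: str) -> List[str]:
    """Build an ingress firewall rule limited to the given source CIDR ranges."""
    for cidr in source_ranges:
        # strict=True also rejects CIDRs with host bits set, e.g. "203.0.113.5/24".
        net = ipaddress.ip_network(cidr, strict=True)
        if net.prefixlen == 0:
            raise ValueError(f"refusing to open UI ports to all addresses: {cidr}")
    allow = ",".join(f"tcp:{port}" for port in ports)
    return [
        "gcloud", "compute", "firewall-rules", "create", name,
        f"--network={network}",
        "--direction=INGRESS",
        f"--allow={allow}",
        f"--source-ranges={','.join(source_ranges)}",
        f"--target-tags={target_tag}",
    ]
```

The resulting list can be passed to subprocess.run(..., check=True). Even with such a rule in place, plain-HTTP UIs still send traffic unencrypted, which is one reason the SSH tunnel remains the preferred option.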