According to the Google docs, Dataproc High Availability is defined in terms of HDFS and YARN availability, not in terms of regions or zones. Is it possible to keep one master in one zone and another master in a different zone to get HA with respect to location? Also, does configuring a Dataproc cluster through the global endpoint achieve HA with respect to location?
I have already gone through the Google docs, but they don't clear up these doubts.
No. Dataproc HA mode does not provide zonal or regional availability, because all nodes of a Dataproc cluster, including every master, must be in the same GCP zone.
To achieve regional availability, you need to create separate Dataproc clusters in multiple zones and use Dataproc Workflow Templates with label-based cluster selectors to distribute job submission across those zonal clusters.
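As a rough illustration of that setup, here is a minimal Python sketch that only builds (and prints) the `gcloud` commands involved; it does not run them. The region, zones, cluster names, label `pool=ha-jobs`, and template name `multi-zone-jobs` are all placeholder assumptions, not values from the docs, so adjust them to your project.

```python
import shlex

# All of these values are hypothetical placeholders for illustration.
REGION = "us-central1"
ZONES = ["us-central1-a", "us-central1-b"]
LABEL = "pool=ha-jobs"          # label shared by every zonal cluster
TEMPLATE = "multi-zone-jobs"    # workflow template name


def cluster_create_cmd(zone):
    """Command for one HA cluster (3 masters) pinned to a single zone."""
    return [
        "gcloud", "dataproc", "clusters", "create", f"jobs-{zone}",
        f"--region={REGION}",
        f"--zone={zone}",
        "--num-masters=3",
        f"--labels={LABEL}",
    ]


def template_cmds():
    """Commands that create a template and point it at clusters by label."""
    return [
        ["gcloud", "dataproc", "workflow-templates", "create", TEMPLATE,
         f"--region={REGION}"],
        ["gcloud", "dataproc", "workflow-templates", "set-cluster-selector",
         TEMPLATE, f"--region={REGION}", f"--cluster-labels={LABEL}"],
    ]


def all_commands():
    """One cluster per zone, followed by the template setup."""
    return [cluster_create_cmd(z) for z in ZONES] + template_cmds()


if __name__ == "__main__":
    for cmd in all_commands():
        print(shlex.join(cmd))
```

Jobs would then be added to the template and the template instantiated; because the selector matches on the label rather than on a cluster name, any of the zonal clusters carrying that label can be picked to run the workflow.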