If I'm building an app and deploying it to a GKE cluster, but serving users in multiple regions, how do I minimize latency between users and my cluster?
Do I have to:
Or is there any setting while deploying to make sure my cluster has minimal latency from a multi-region perspective?
Additionally, I'm running separate frontend and backend applications. Is the best practice to keep the frontend and backend in two different clusters, or in the same cluster in different pods?
You should deploy both the frontend and backend applications to multiple Kubernetes clusters, each in a data center in a different region. You can use Ingress to set up a Google Cloud Load Balancer, which can handle cross-region traffic for a multi-cluster Kubernetes environment.
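As a rough sketch of what that can look like on GKE, assuming the Multi Cluster Ingress feature is enabled and the clusters are registered to the same fleet: the manifests below are applied to the config cluster. The names `frontend-mcs` and `frontend-ingress`, the `app: frontend` label, the `default` namespace and port 8080 are placeholders I made up, so check the current GKE docs before relying on the exact fields.

```yaml
# Assumption: GKE Multi Cluster Ingress is enabled for the fleet.
# MultiClusterService creates a matching Service in each member cluster.
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: frontend-mcs          # hypothetical name
  namespace: default
spec:
  template:
    spec:
      selector:
        app: frontend         # hypothetical pod label
      ports:
      - name: web
        protocol: TCP
        port: 8080
        targetPort: 8080
---
# MultiClusterIngress provisions one global HTTP(S) load balancer
# whose backends are the pods behind the MultiClusterService above.
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: frontend-ingress      # hypothetical name
  namespace: default
spec:
  template:
    spec:
      backend:
        serviceName: frontend-mcs
        servicePort: 8080
```

The load balancer exposes a single global IP and sends each request to a healthy backend in the region closest to the user, which is what keeps latency down.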
You should use a Deployment to run multiple replicas of your pods. Additionally, you can use podAffinity to colocate the frontend pod and the backend pod on the same worker node.
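A minimal sketch of such a Deployment, assuming the backend pods carry the label `app: backend`. The image name, labels, port and replica count are placeholders. It uses the "preferred" form of podAffinity so that frontend pods can still be scheduled when no node with a backend pod has room.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3                       # placeholder replica count
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        podAffinity:
          # Prefer nodes that already run a pod labeled app=backend.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: backend      # assumed label on the backend pods
              topologyKey: kubernetes.io/hostname
      containers:
      - name: frontend
        image: example.com/frontend:1.0   # placeholder image
        ports:
        - containerPort: 8080
```

Switching to `requiredDuringSchedulingIgnoredDuringExecution` makes colocation a hard rule, at the cost of frontend pods staying Pending when no suitable node exists.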
https://cloud.google.com/solutions/prep-kubernetes-engine-for-prod