kubernetes google-cloud-platform architecture google-kubernetes-engine system-design

How would you make sure your GKE clusters are close to your end clients?

If I'm building an app and deploying it to a GKE cluster, but serve users from multiple regions, how do I minimize latency from users to my cluster?

Do I have to:

Deploy the same application to different clusters in different regions?
Use a load balancer with GKE as a backend service?

Or is there any setting while deploying to make sure my cluster has minimal latency from a multi-region perspective?

Additionally, if I'm running separate frontend and backend applications, is the best practice to put the frontend and backend in two different clusters, or in the same cluster in different pods?

Solution

You should deploy both the frontend and the backend application to multiple Kubernetes clusters, each in a data center in a different region. You can then use Ingress to set up a Google Cloud Load Balancer, which can handle cross-region traffic for a multi-cluster Kubernetes environment.
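As a rough sketch of the "same application in every regional cluster" idea, the snippet below builds one `kubectl apply` command per cluster. The kubeconfig context names and the manifest file name `app.yaml` are hypothetical placeholders, not names from the question; only the `kubectl --context ... apply -f ...` form is standard.

```python
import shlex

# Hypothetical kubeconfig context names, one per regional GKE cluster.
CLUSTER_CONTEXTS = ["gke-us-central1", "gke-europe-west1", "gke-asia-east1"]


def apply_commands(manifest_path, contexts):
    """Return one `kubectl apply` command (as an argument list) per context."""
    return [
        ["kubectl", "--context", ctx, "apply", "-f", manifest_path]
        for ctx in contexts
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for cmd in apply_commands("app.yaml", CLUSTER_CONTEXTS):
        print(shlex.join(cmd))
```

Each printed command could be run by hand or passed to `subprocess.run`; the load balancer in front of the clusters is configured separately.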

Within each cluster, use a Deployment to run multiple replicas of your pods. Additionally, you can use podAffinity to colocate the frontend pod and the backend pod on the same worker node.
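A minimal sketch of what such a Deployment with podAffinity might look like, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The names `frontend` and `backend`, the `app` labels, and the image are assumptions made for illustration. The sketch uses the "preferred" form of podAffinity so frontend pods can still be scheduled when no backend pod is on a node; the "required" form would be stricter.

```python
import json


def frontend_deployment(replicas=3):
    """Build a Deployment manifest whose pods prefer nodes running backend pods."""
    # Assumption: backend pods carry the label app=backend.
    affinity = {
        "podAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {"app": "backend"}},
                        # Colocate at node level: same hostname.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "frontend"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "frontend"}},
            "template": {
                "metadata": {"labels": {"app": "frontend"}},
                "spec": {
                    "affinity": affinity,
                    "containers": [
                        # Hypothetical image name.
                        {"name": "frontend", "image": "example/frontend:1.0"}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(frontend_deployment(), indent=2))
```

Saving the output to a file and applying it in each regional cluster would give every region its own set of frontend replicas.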

https://cloud.google.com/blog/products/gcp/how-to-deploy-geographically-distributed-services-on-kubernetes-engine-with-kubemci

https://cloud.google.com/solutions/prep-kubernetes-engine-for-prod