I'm wrapping my head around topologySpreadConstraints, but I'm stuck at this point: all 2-Pod pairs of my 7 deployments got scheduled in the same zone, europe-west1-b. Not a single deployment was distributed across 2 zones. It's important to mention that scheduling in different zones has generally worked before: when spinning up 2 or more replicas without topologySpreadConstraints, Pods are scheduled randomly across zones (or sometimes in the same zone). Also important: I already have 3 "placeholder" deployments that forcefully schedule one Pod each in zones b, c and d, i.e. my cluster is guaranteed to have nodes in all 3 zones.
My manifests now all look like this (the service name differs; the containers section has been stripped out):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-microservice
  namespace: some-namespace
  labels:
    app: some-microservice
spec:
  replicas: 2
  selector:
    matchLabels:
      app: some-microservice
  template:
    metadata:
      labels:
        app: some-microservice
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: some-microservice
      nodeSelector:
        cloud.google.com/gke-spot: "true"
I also tried "zone" instead of "topology.kubernetes.io/zone", since that is the way it's documented on kubernetes.io; however, then no 2nd Pod gets scheduled at all.
Solved!
I mentioned that I have 3 "placeholder" deployments scheduled in zones b, c and d. That was indeed the case, but I forgot to add the cloud.google.com/gke-spot: "true" nodeSelector to them, and hence non-spot nodes got created. This led to the Kubernetes scheduler indeed not being able to find any suitable "spot" node in zones c and d.
This once again underlines that such "node placeholder" deployments are needed in GKE Autopilot to guarantee the presence of nodes in all zones.
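For reference, a corrected placeholder deployment might look like the sketch below. The deployment name, container image and zone value are illustrative assumptions; the essential point is combining the zone nodeSelector (which pins the placeholder to one zone) with the cloud.google.com/gke-spot nodeSelector, so that the node Autopilot provisions is actually a spot node that the other deployments can use:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder-europe-west1-c   # hypothetical name; one such deployment per zone
spec:
  replicas: 1
  selector:
    matchLabels:
      app: placeholder-europe-west1-c
  template:
    metadata:
      labels:
        app: placeholder-europe-west1-c
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: europe-west1-c  # pin this placeholder to one zone
        cloud.google.com/gke-spot: "true"            # the missing selector: request a *spot* node
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9           # minimal container, only there to hold the node
```

With one such deployment per zone (b, c and d), spot nodes exist in all 3 zones, and the topologySpreadConstraints above can be satisfied.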