I have an Azure Kubernetes Service (AKS) cluster that is designed for high availability. My cluster consists of a node pool that has nodes spread across three different availability zones. This setup is intentional because I aim to maximize the availability of the services running on the cluster.
I've enabled automated upgrades to keep the cluster up to date. However, I've noticed a recurring issue: during the upgrade process, some nodes get stuck. After investigating, I found that this happens when pods are rescheduled onto nodes that are not in the same availability zone as the persistent volumes backing their Persistent Volume Claims (PVCs). From what I understand, in AKS these persistent volumes are backed by zonal Azure disks, so they are bound to a specific availability zone. This causes a volume node-affinity conflict during the automated upgrade, which prevents certain nodes from being drained and upgraded successfully.
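This is roughly how I confirmed the zone binding (a sketch; the exact topology key on the PV depends on the storage driver version, e.g. `topology.kubernetes.io/zone` vs `topology.disk.csi.azure.com/zone`, and the placeholder PV name needs to be replaced with a real one from the cluster):

```bash
# Show the node affinity that pins a PV to a zone.
kubectl get pv <pv-name> -o yaml | grep -B2 -A8 "nodeAffinity"

# Show which zone each node is in, to compare against the PV's affinity.
kubectl get nodes -L topology.kubernetes.io/zone
```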
How can I resolve the issue of nodes getting stuck during automated upgrades due to this zone binding? Is there a way to make the PVCs zone-agnostic, so that their volumes can be used by pods in any availability zone? Or should I instead create node pools whose nodes are all in the same availability zone, i.e., one node pool per zone (three in total)?
The answer turned out to be my last suggestion: don't create node pools that span multiple availability zones. For my setup, I created three node pools, called AZ1, AZ2 and AZ3, each pinned to a single zone. With auto upgrades, Azure drains and replaces nodes one node pool at a time, so every replacement node ends up in the same zone as the node it replaces. Pods with zonal PVCs can therefore always be rescheduled within their own pool's zone, and the PVCs no longer block the upgrade.
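For reference, the node pools were created roughly like this (a sketch: the resource group, cluster name, VM size and node counts are placeholders for my actual values, and node pool names have to be lowercase):

```bash
# One node pool per availability zone, each pinned to a single zone.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name az1 \
  --zones 1 \
  --node-count 2 \
  --node-vm-size Standard_D4s_v5

az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name az2 \
  --zones 2 \
  --node-count 2 \
  --node-vm-size Standard_D4s_v5

az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name az3 \
  --zones 3 \
  --node-count 2 \
  --node-vm-size Standard_D4s_v5
```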