kubernetes autoscaling azure-aks horizontal-pod-autoscaling

Is a PDB required for a Kubernetes cluster for which HPA is defined?

I have a Kubernetes cluster. Through policies, I have made sure that all the services have Requests, Limits and HPA defined so that I can have smooth autoscaling. I have also defined ResourceQuota. In such a scenario, is there a need to define PDB as well? Please advise.

Solution

Pod Disruption Budgets (PDBs) are NOT required but are useful when working with Horizontal Pod Autoscaler. The HPA scales the number of pods in your deployment, while a PDB ensures that node operations won’t bring your service down by removing too many pod instances at the same time.

As the name implies, a Pod Disruption Budget defines how much disruption is acceptable. It specifies either a minAvailable or a maxUnavailable value (an absolute number or a percentage) for the pods in the deployment. It is evaluated against the current number of replicas (which the HPA controls when one is in use) and uses a pod label selector (typically the same labels the service selects on) to identify which pods the rules apply to.

Whether to set minAvailable or maxUnavailable depends on the application: a distributed system that needs a quorum would need minAvailable to match the quorum size, or the service will fail. Most applications work well with maxUnavailable set to 1 or more. A maxUnavailable of 1 ensures pods are moved one at a time from a draining node to an available node. To move them faster, a larger value is useful, provided the replica set is large enough to permit such disruption.
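To make the two options concrete, here is a minimal sketch that builds a PodDisruptionBudget manifest as a plain Python dict and prints it as JSON, which kubectl apply -f also accepts. The names web-pdb and quorum-pdb, the app labels, and the build_pdb helper are hypothetical and only for illustration; the field names follow the policy/v1 API.

```python
import json


def build_pdb(name, match_labels, *, min_available=None, max_unavailable=None):
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    # A PDB spec may set minAvailable or maxUnavailable, but not both.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")

    spec = {"selector": {"matchLabels": dict(match_labels)}}
    if min_available is not None:
        spec["minAvailable"] = min_available
    else:
        spec["maxUnavailable"] = max_unavailable

    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # Hypothetical stateless service: let pods be evicted one at a time.
    web = build_pdb("web-pdb", {"app": "web"}, max_unavailable=1)

    # Hypothetical 5-member quorum system that needs 3 members up.
    quorum = build_pdb("quorum-pdb", {"app": "quorum"}, min_available=3)

    print(json.dumps(web, indent=2))
    print(json.dumps(quorum, indent=2))
```

The selector labels must match the labels on the pods of the deployment that the HPA scales, otherwise the budget protects nothing.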

During a node operation (for example, a node upgrade or a node pool scale-down) where one or more nodes may become unavailable, the drain evicts pods only as long as every PDB is respected. If evicting a pod would violate a PDB, that eviction is refused and the drain waits and retries until the budget allows it. Note that this may prevent a node operation from completing. For example, if a node is being upgraded but the cluster does not have enough spare capacity, the replacement pods have no alternative node to be scheduled on, the budget never recovers, and the drain stalls (and may eventually fail, depending on the timeout of the tool performing it). Without a PDB, the node would drain and evict all pods running on it, potentially causing the number of pods in the deployment/replica set to fall under the critical threshold the service needs to work.
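The arithmetic behind this can be sketched as follows. This is a simplified model of how many evictions a budget permits at a given moment, not the actual controller code: it assumes integer values only (percentages are left out), and the disruptions_allowed function name is made up for this example.

```python
def disruptions_allowed(expected, healthy, *, min_available=None, max_unavailable=None):
    """Simplified model: how many pods may be evicted right now.

    expected -- replica count the controller wants (set by the HPA, if any)
    healthy  -- pods that are currently ready
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")

    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = max(expected - max_unavailable, 0)

    # Never negative: an already violated budget simply allows no evictions.
    return max(healthy - desired_healthy, 0)


if __name__ == "__main__":
    # 4 replicas, maxUnavailable=1: one eviction is allowed...
    print(disruptions_allowed(4, 4, max_unavailable=1))  # 1
    # ...and the next one must wait until a replacement pod is ready.
    print(disruptions_allowed(4, 3, max_unavailable=1))  # 0
    # A single replica with minAvailable=1 can never be evicted,
    # so a drain of its node would wait indefinitely.
    print(disruptions_allowed(1, 1, min_available=1))  # 0
```

The second call is the stalled case described above: if the replacement pod cannot be scheduled for lack of capacity, the healthy count stays at 3 and the result stays at 0.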

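Also note that DaemonSets behave differently during node operations. Since a DaemonSet runs a pod on every node that its selector targets, an evicted DaemonSet pod would simply be recreated on the same node, so kubectl drain refuses to proceed while DaemonSet-managed pods are present unless --ignore-daemonsets is passed, in which case it leaves those pods in place. A PDB can only restrict voluntary evictions; it cannot grant permission for a disruption that would otherwise be blocked, so a PDB is not what unblocks a node operation in this case.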

For more details, have a look at this article:

https://blog.gruntwork.io/avoiding-outages-in-your-kubernetes-cluster-using-poddisruptionbudgets-ef6a4baa5085

And for full details on the resource manifest:

https://kubernetes.io/docs/tasks/run-application/configure-pdb/