I configured system and user node pools on an Azure AKS cluster. I followed this guide:
Before this change, we only had system-type pools, which ran both the applications and the system pods.
I did the following steps:
Created a system-type pool with the taint "CriticalAddonsOnly=true:NoSchedule" (to keep application microservices off the system pool).
Converted the old pools from system to user mode.
Restarted the following deployments:
gatekeeper-system:
kube-system:
This was to let the system pods be scheduled on the new system pool as well, since they are not automatically rescheduled after the pool is created.
Now I see that the system pods have been scheduled on the system pool, but I keep seeing the same pods on all the other nodes too. Even if I force-delete them from the user pools, they are immediately redeployed there. Is this behavior correct? Logically, if I have a system pool, shouldn't all system pods run only on that pool and none on the user pools?
Thanks
As per the official Microsoft documentation, these are some of the features of system node pools and user node pools.
System Node Pool:
Must be running Linux.
Can have a minimum of 1 node, but 2 nodes are recommended, or 3 if it is your only Linux node pool.
Only supported on AKS clusters running on Virtual Machine Scale Sets.
The nodes need at least 2 vCPUs and 4 GB of memory.
The nodes need to support at least 30 pods.
Cannot be made up of Spot VMs.
A cluster can have multiple system node pools.
If there is only one system node pool, it cannot be deleted.
Can be changed to a user node pool if you have another system node pool.
User Node Pool:
Can be either Linux or Windows.
Can scale down to 0 nodes.
Can be deleted with no issues.
Spot VMs can be used.
Can be changed to a system node pool.
You can have as many user node pools as Azure allows.
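As a rough illustration of the mode change mentioned in both lists, the following sketch wraps `az aks nodepool update` in a small helper. The function name `convert_pool_to_user` and the pool name `oldpool` are hypothetical, and `myResourceGroup` / `myAKSCluster` are placeholder names, so adjust them to your environment.

```shell
# Sketch: switch an existing node pool to User mode.
# Assumes another System pool already exists, since a cluster needs at least one.
# myResourceGroup and myAKSCluster are placeholder names.
convert_pool_to_user() {
  pool_name="$1"
  az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name "$pool_name" \
    --mode User
}

# Usage (hypothetical pool name):
#   convert_pool_to_user oldpool
```

Passing `--mode System` instead would perform the opposite conversion.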
Based on their pod definitions, system pods are meant to be scheduled on the system node pool unless they are controlled by a DaemonSet. A pod controlled by a DaemonSet is scheduled on every node in the cluster, regardless of pool type, and if you delete it, the DaemonSet controller recreates it on the same node. That is why the pods you delete from the user pools come back immediately. My cluster has 4 nodes: 2 system and 2 user. So each of these DaemonSets in the kube-system namespace has 4 pods, one per node.
kubectl get ds -n kube-system
NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ama-logs                   4         4         4       4            4           <none>          14d
azure-cni-networkmonitor   4         4         4       4            4           <none>          540d
azure-ip-masq-agent        4         4         4       4            4           <none>          540d
kube-proxy                 4         4         4       4            4           <none>          540d
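To confirm that the pods which keep reappearing on user nodes really belong to a DaemonSet, something like the following sketch may help. The helper names are hypothetical; the `kubectl` flags are standard. Pods created by a Deployment normally show `ReplicaSet` as their owner, while DaemonSet pods show `DaemonSet`.

```shell
# Show each kube-system pod, the node it runs on, and the kind of its owner.
show_system_pod_owners() {
  kubectl get pods -n kube-system \
    -o custom-columns='POD:.metadata.name,NODE:.spec.nodeName,OWNER:.metadata.ownerReferences[0].kind'
}

# Print the tolerations of one DaemonSet, to see which node taints its pods accept.
show_daemonset_tolerations() {
  kubectl get ds "$1" -n kube-system \
    -o jsonpath='{.spec.template.spec.tolerations}'
}

# Usage:
#   show_system_pod_owners
#   show_daemonset_tolerations kube-proxy
```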
To further ensure that application pods are not scheduled on the system pool, you can add a taint to the system node pool when you create it, as follows. Application pods without a matching toleration will then be scheduled only on the user node pools.
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name systempool \
--node-count 3 \
--node-taints CriticalAddonsOnly=true:NoSchedule \
--mode System
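Once the pool exists, a quick check along these lines can show whether the taint and the pool mode landed where expected. This is a sketch with hypothetical helper names, and it assumes AKS has set the `kubernetes.azure.com/mode` label on the nodes.

```shell
# List nodes with an extra column for the AKS pool mode label (system or user).
show_node_modes() {
  kubectl get nodes -L kubernetes.azure.com/mode
}

# List nodes with the keys of any taints applied to them.
show_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Usage:
#   show_node_modes
#   show_node_taints
```

Nodes in the system pool should list `CriticalAddonsOnly` under TAINTS, while user pool nodes should show no such key.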