Tags: amazon-web-services, kubernetes, cron, kubernetes-cronjob

Kubernetes, scaling cronjob pod to a different node


We have a single Kubernetes cronjob whose task is to detect newly uploaded files and perform some operations on them. The job runs every minute and may take 10 minutes to complete.

At the moment this works, and new pods are created for jobs as new files are detected. However, we would like the pods created by the cronjob to be spread across different nodes. Right now all of the pods are scheduled on the same node, which could crash my EC2 instance in the worst case, where many new files arrive at once and the node runs out of memory.

I am using an EFS filesystem to share files among my nodes, so all nodes can read the uploaded files.

How can I get the pods spawned by a Kubernetes cronjob scheduled onto different nodes?


Solution

  • You can use inter-pod anti-affinity in the pod template section of the CronJob. Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on, based on labels on pods that are already running on the node rather than based on labels on the nodes themselves. The rules are of the form "this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y".

    apiVersion: batch/v1beta1  # use batch/v1 on Kubernetes 1.21+
    kind: CronJob
    metadata:
      name: test
    spec:
      schedule: "*/5 * * * *"
      jobTemplate:
        spec:
          template:
            metadata:
              labels:
                # the job's pods must carry the label that the anti-affinity
                # selector below matches, otherwise they will not repel each other
                app: web-store
            spec:
              affinity:
                podAntiAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - web-store
                    topologyKey: "kubernetes.io/hostname"
              containers:
                - name: hello
                  image: bash
                  command: ["echo",  "Hello world"]
              restartPolicy: OnFailure
    
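    After applying the manifest, you can confirm that each job pod lands on a different node by watching the NODE column (a quick sketch; the manifest file name is just an example):

        kubectl apply -f cronjob.yaml
        kubectl get pods -o wide --watch

    Note that with requiredDuringSchedulingIgnoredDuringExecution, once every eligible node already runs a pod matching the selector, additional job pods stay Pending until a node frees up; the preferred variant described below relaxes this.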

    Relevant API docs

    kubectl explain cronjob.spec.jobTemplate.spec.template.spec.affinity.podAntiAffinity
    KIND:     CronJob
    VERSION:  batch/v1beta1
    
    RESOURCE: podAntiAffinity <Object>
    
    DESCRIPTION:
         Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
         in the same node, zone, etc. as some other pod(s)).
    
         Pod anti affinity is a group of inter pod anti affinity scheduling rules.
    
    FIELDS:
       preferredDuringSchedulingIgnoredDuringExecution  <[]Object>
         The scheduler will prefer to schedule pods to nodes that satisfy the
         anti-affinity expressions specified by this field, but it may choose a node
         that violates one or more of the expressions. The node that is most
         preferred is the one with the greatest sum of weights, i.e. for each node
         that meets all of the scheduling requirements (resource request,
         requiredDuringScheduling anti-affinity expressions, etc.), compute a sum by
         iterating through the elements of this field and adding "weight" to the sum
         if the node has pods which matches the corresponding podAffinityTerm; the
         node(s) with the highest sum are the most preferred.
    
       requiredDuringSchedulingIgnoredDuringExecution   <[]Object>
         If the anti-affinity requirements specified by this field are not met at
         scheduling time, the pod will not be scheduled onto the node. If the
         anti-affinity requirements specified by this field cease to be met at some
         point during pod execution (e.g. due to a pod label update), the system may
         or may not try to eventually evict the pod from its node. When there are
         multiple elements, the lists of nodes corresponding to each podAffinityTerm
         are intersected, i.e. all terms must be satisfied.
    

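    If strict spreading could leave pods unschedulable (for example, more concurrent jobs than nodes), the softer preferredDuringSchedulingIgnoredDuringExecution form can be used instead. A minimal sketch of the same affinity block in that form (the app=web-store label is carried over from the example above):

        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100                 # higher weight = stronger preference
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - web-store
                topologyKey: "kubernetes.io/hostname"
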
    Note: Pod anti-affinity requires nodes to be consistently labelled; in other words, every node in the cluster must have an appropriate label matching topologyKey. If some or all nodes are missing the specified topologyKey label, it can lead to unintended behavior.
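
    The kubernetes.io/hostname label is set on every node by the kubelet, so it is normally safe to use as topologyKey; one quick check that all nodes carry it:

        kubectl get nodes -L kubernetes.io/hostname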