Tags: amazon-web-services, kubernetes, cron, kubernetes-cronjob

Kubernetes, scaling cronjob pod to a different node


We have a single Kubernetes cronjob whose task is to detect newly uploaded files and perform some operations on them. The job runs every minute and may take 10 minutes to complete.

At the moment this works, and new pods are created for jobs as new files are detected. However, we would like the pods created by the cronjob to be spread across different nodes. Right now all of the pods are scheduled on the same node, which could crash my EC2 instance in the worst case, where many new files arrive at once and the node runs out of memory.

I am using an EFS filesystem to share files among my nodes, so all nodes can read the uploaded files.

How can I get the pods spawned by a Kubernetes cronjob scheduled onto different nodes?


Solution

  • You can use inter-pod anti-affinity in the pod template section of the CronJob. Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on, based on labels on pods that are already running on the node rather than based on labels on the nodes themselves. The rules are of the form "this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y".

    apiVersion: batch/v1beta1  # use batch/v1 on Kubernetes 1.21+
    kind: CronJob
    metadata:
      name: test
    spec:
      schedule: "*/5 * * * *"
      jobTemplate:
        spec:
          template:
            metadata:
              labels:
                # the job's pods must carry the label that the anti-affinity
                # selector below matches, otherwise they will not repel each other
                app: web-store
            spec:
              affinity:
                podAntiAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - web-store
                    topologyKey: "kubernetes.io/hostname"
              containers:
                - name: hello
                  image: bash
                  command: ["echo",  "Hello world"]
              restartPolicy: OnFailure
    
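    After applying the manifest, you can confirm that each job pod lands on a different node by watching the NODE column (a quick sketch; the manifest file name is just an example):

        kubectl apply -f cronjob.yaml
        kubectl get pods -o wide --watch

    Note that with requiredDuringSchedulingIgnoredDuringExecution, once every eligible node already runs a pod matching the selector, additional job pods stay Pending until a node frees up; the preferred variant described below relaxes this.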

    Relevant API docs

    kubectl explain cronjob.spec.jobTemplate.spec.template.spec.affinity.podAntiAffinity
    KIND:     CronJob
    VERSION:  batch/v1beta1
    
    RESOURCE: podAntiAffinity <Object>
    
    DESCRIPTION:
         Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
         in the same node, zone, etc. as some other pod(s)).
    
         Pod anti affinity is a group of inter pod anti affinity scheduling rules.
    
    FIELDS:
       preferredDuringSchedulingIgnoredDuringExecution  <[]Object>
         The scheduler will prefer to schedule pods to nodes that satisfy the
         anti-affinity expressions specified by this field, but it may choose a node
         that violates one or more of the expressions. The node that is most
         preferred is the one with the greatest sum of weights, i.e. for each node
         that meets all of the scheduling requirements (resource request,
         requiredDuringScheduling anti-affinity expressions, etc.), compute a sum by
         iterating through the elements of this field and adding "weight" to the sum
         if the node has pods which matches the corresponding podAffinityTerm; the
         node(s) with the highest sum are the most preferred.
    
       requiredDuringSchedulingIgnoredDuringExecution   <[]Object>
         If the anti-affinity requirements specified by this field are not met at
         scheduling time, the pod will not be scheduled onto the node. If the
         anti-affinity requirements specified by this field cease to be met at some
         point during pod execution (e.g. due to a pod label update), the system may
         or may not try to eventually evict the pod from its node. When there are
         multiple elements, the lists of nodes corresponding to each podAffinityTerm
         are intersected, i.e. all terms must be satisfied.
    

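    If strict spreading could leave pods unschedulable (for example, more concurrent jobs than nodes), the softer preferredDuringSchedulingIgnoredDuringExecution form can be used instead. A minimal sketch of the same affinity block in that form (the app=web-store label is carried over from the example above):

        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100                 # higher weight = stronger preference
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - web-store
                topologyKey: "kubernetes.io/hostname"
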
    Note: Pod anti-affinity requires nodes to be consistently labelled; in other words, every node in the cluster must have an appropriate label matching topologyKey. If some or all nodes are missing the specified topologyKey label, it can lead to unintended behavior.
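
    The kubernetes.io/hostname label is set on every node by the kubelet, so it is normally safe to use as topologyKey; one quick check that all nodes carry it:

        kubectl get nodes -L kubernetes.io/hostname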