kubernetes, kubernetes-jobs

Control job deletions


We have cronjob monitoring in our cluster. If a pod does not appear within 24 hours, it means the cronjob hasn't run and we need to alert. But sometimes, due to garbage collection, the pod is deleted even though the job completed successfully. How can we keep all pods and avoid garbage collection? I know about finalizers, but they don't seem to work in this case.


Solution

  • Posting this as an answer since it explains one reason why this can happen.

    Answer

    Cloud Kubernetes clusters have node autoscaling policies, and node pools can also be scaled up or down manually.

    A CronJob creates a Job for each run, and each Job in turn creates a corresponding pod. Pods are scheduled onto specific nodes, so if the node a pod was running on is removed by autoscaling or manual scaling, the pod disappears with it. The Jobs, however, are preserved, since they are API objects stored in etcd.
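
    You can see this ownership chain directly on the objects. The commands below are just an illustration; the pod and job names are placeholders taken from the example further down:

    $ kubectl get pod test-cronjob-27253914-mxnzg \
        -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
    # expected to print something like: Job/test-cronjob-27253914

    $ kubectl get job test-cronjob-27253914 \
        -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
    # expected to print something like: CronJob/test-cronjob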

    There are two fields on the CronJob spec which control how many finished Jobs are kept in the history:

    • .spec.successfulJobsHistoryLimit - defaults to 3
    • .spec.failedJobsHistoryLimit - defaults to 1

    If either is set to 0, the corresponding Jobs are removed as soon as they finish.

    Jobs History Limits
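
    Both fields sit directly under the CronJob spec; a minimal fragment for reference (the values here are arbitrary examples, not recommendations):

    spec:
      successfulJobsHistoryLimit: 5   # keep the last 5 successful Jobs
      failedJobsHistoryLimit: 2       # keep the last 2 failed Jobs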

    How it happens in practice

    I have a GCP GKE cluster with two nodes:

    $ kubectl get nodes
    NAME                   STATUS   ROLES    AGE     VERSION
    gke-cluster-xxxx       Ready    <none>   15h     v1.21.3-gke.2001
    gke-cluster-yyyy       Ready    <none>   3d20h   v1.21.3-gke.2001
    

    cronjob.yaml for testing:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: test-cronjob
    spec:
      schedule: "*/2 * * * *"
      successfulJobsHistoryLimit: 5
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: test
                image: busybox
                imagePullPolicy: IfNotPresent
                command:
                - /bin/sh
                - -c
                - date; echo Hello from the Kubernetes cluster
              restartPolicy: OnFailure
    

    Pods created:

    $ kubectl get pods -o wide
    NAME                          READY   STATUS      RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
    test-cronjob-27253914-mxnzg   0/1     Completed   0          8m59s   10.24.0.22   gke-cluster-4-xxxx   <none>           <none>
    test-cronjob-27253916-88cjn   0/1     Completed   0          6m59s   10.24.0.25   gke-cluster-4-xxxx   <none>           <none>
    test-cronjob-27253918-hdcg9   0/1     Completed   0          4m59s   10.24.0.29   gke-cluster-4-xxxx   <none>           <none>
    test-cronjob-27253920-shnnp   0/1     Completed   0          2m59s   10.24.1.15   gke-cluster-4-yyyy   <none>           <none>
    test-cronjob-27253922-cw5gp   0/1     Completed   0          59s     10.24.1.18   gke-cluster-4-yyyy   <none>           <none>
    

    Scaling down one node:

    $ kubectl get nodes
    NAME                 STATUS                        ROLES    AGE   VERSION
    gke-cluster-4-xxxx   NotReady,SchedulingDisabled   <none>   16h   v1.21.3-gke.2001
    gke-cluster-4-yyyy   Ready                         <none>   3d21h   v1.21.3-gke.2001
    
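    (How the node is removed is incidental here; on GKE one way to do it is to resize the node pool. The cluster, pool, and zone names below are hypothetical placeholders.)

    $ gcloud container clusters resize my-cluster \
        --node-pool default-pool \
        --num-nodes 1 \
        --zone us-central1-a
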

    And getting pods now:

    $ kubectl get pods -o wide
    NAME                          READY   STATUS      RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
    test-cronjob-27253920-shnnp   0/1     Completed   0          7m47s   10.24.1.15   gke-cluster-4-yyyy   <none>           <none>
    test-cronjob-27253922-cw5gp   0/1     Completed   0          5m47s   10.24.1.18   gke-cluster-4-yyyy   <none>           <none>
    

    Previously completed pods on the first node are gone now.

    Jobs are still in place:

    $ kubectl get jobs
    NAME                    COMPLETIONS   DURATION   AGE
    test-cronjob-27253914   1/1           1s         13m
    test-cronjob-27253916   1/1           2s         11m
    test-cronjob-27253918   1/1           1s         9m55s
    test-cronjob-27253920   1/1           34s        7m55s
    test-cronjob-27253922   1/1           2s         5m55s
    

    How it can be solved

    Changing the monitoring alert to look at Job completions instead of pods is a much more precise method, and it is independent of any node scaling in the cluster.

    For example, I can still retrieve the result of job test-cronjob-27253916 even though its corresponding pod has been deleted:

    $ kubectl get job test-cronjob-27253916 -o jsonpath='{.status.succeeded}'
    1
    
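    Building on that, here is a minimal sketch of a 24-hour check. It assumes GNU date and jq are available, and that the Jobs carry a label such as app=test-cronjob (you would need to add it under jobTemplate.metadata.labels; the manifest above does not set it):

    # Newest completion time among the CronJob's Jobs (empty if none have completed).
    # The app=test-cronjob label is an assumption - set it in jobTemplate.metadata.labels.
    last_completion=$(kubectl get jobs -l app=test-cronjob -o json \
      | jq -r '[.items[].status.completionTime // empty] | max // ""')

    # Alert if nothing has completed, or the newest completion is older than 24 hours.
    if [ -z "$last_completion" ] || \
       [ "$(date -d "$last_completion" +%s)" -lt "$(date -d '24 hours ago' +%s)" ]; then
      echo "ALERT: no successful run of test-cronjob in the last 24 hours"
    fi
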
