We have cronjob monitoring in our cluster. If a pod has not appeared within 24 hours, it means the cronjob hasn't run and we need to alert. But sometimes, due to garbage collection, the pod is deleted (even though the job completed successfully). How can we keep all pods and avoid garbage collection? I know about finalizers, but it looks like they don't work in this case.
Posting this as an answer since it describes one reason why this can happen.
Cloud Kubernetes clusters have node autoscaling policies, and node pools can also be scaled down/up manually. A CronJob creates a Job for each run, which in turn creates a corresponding pod. Pods are assigned to specific nodes, so if a node with pods assigned to it is removed due to autoscaling or manual scaling, those pods will be gone. The Jobs, however, will be preserved, since they are stored in etcd.
There are two fields which control the number of jobs kept in the history:

- .spec.successfulJobsHistoryLimit - set to 3 by default
- .spec.failedJobsHistoryLimit - set to 1 by default

If these are set to 0, everything is removed right after the jobs finish.
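For instance, both fields can be set explicitly in a CronJob spec (a minimal fragment showing only these two fields; the values here are arbitrary):

```yaml
spec:
  successfulJobsHistoryLimit: 5   # keep the last 5 successful jobs (default: 3)
  failedJobsHistoryLimit: 3       # keep the last 3 failed jobs (default: 1)
```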
I have a GCP GKE cluster with two nodes:
$ kubectl get nodes
NAME                 STATUS   ROLES    AGE     VERSION
gke-cluster-4-xxxx   Ready    <none>   15h     v1.21.3-gke.2001
gke-cluster-4-yyyy   Ready    <none>   3d20h   v1.21.3-gke.2001
cronjob.yaml for testing:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob
spec:
  schedule: "*/2 * * * *"
  successfulJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: test
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
Pods created:
$ kubectl get pods -o wide
NAME                          READY   STATUS      RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
test-cronjob-27253914-mxnzg   0/1     Completed   0          8m59s   10.24.0.22   gke-cluster-4-xxxx   <none>           <none>
test-cronjob-27253916-88cjn   0/1     Completed   0          6m59s   10.24.0.25   gke-cluster-4-xxxx   <none>           <none>
test-cronjob-27253918-hdcg9   0/1     Completed   0          4m59s   10.24.0.29   gke-cluster-4-xxxx   <none>           <none>
test-cronjob-27253920-shnnp   0/1     Completed   0          2m59s   10.24.1.15   gke-cluster-4-yyyy   <none>           <none>
test-cronjob-27253922-cw5gp   0/1     Completed   0          59s     10.24.1.18   gke-cluster-4-yyyy   <none>           <none>
Scaling down one node:
$ kubectl get nodes
NAME                 STATUS                        ROLES    AGE     VERSION
gke-cluster-4-xxxx   NotReady,SchedulingDisabled   <none>   16h     v1.21.3-gke.2001
gke-cluster-4-yyyy   Ready                         <none>   3d21h   v1.21.3-gke.2001
And getting pods now:
$ kubectl get pods -o wide
NAME                          READY   STATUS      RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
test-cronjob-27253920-shnnp   0/1     Completed   0          7m47s   10.24.1.15   gke-cluster-4-yyyy   <none>           <none>
test-cronjob-27253922-cw5gp   0/1     Completed   0          5m47s   10.24.1.18   gke-cluster-4-yyyy   <none>           <none>
The previously completed pods on the first node are gone now. The jobs are still in place:
$ kubectl get jobs
NAME                    COMPLETIONS   DURATION   AGE
test-cronjob-27253914   1/1           1s         13m
test-cronjob-27253916   1/1           2s         11m
test-cronjob-27253918   1/1           1s         9m55s
test-cronjob-27253920   1/1           34s        7m55s
test-cronjob-27253922   1/1           2s         5m55s
Changing the monitoring alert to look for job completion is a much more precise method, and it is independent of any cluster node scaling actions.
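As a sketch, such an alert could be expressed as a Prometheus rule, assuming Prometheus with kube-state-metrics is installed (the metric name kube_job_status_completion_time and the job_name label should be verified against the deployed kube-state-metrics version; the CronJob name pattern is taken from this example):

```yaml
groups:
- name: cronjob-monitoring
  rules:
  - alert: CronJobNotCompleted
    # Fire if no job of this CronJob has completed within the last 24 hours
    expr: time() - max(kube_job_status_completion_time{job_name=~"test-cronjob-.*"}) > 86400
    for: 10m
    annotations:
      summary: "No successful test-cronjob run in the last 24h"
```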
For example, I can still retrieve a result from job test-cronjob-27253916, whose corresponding pod has been deleted:

$ kubectl get job test-cronjob-27253916 -o jsonpath='{.status.succeeded}'
1
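The same check can also be scripted. Below is a minimal sketch in which the kubectl call is stubbed out so the logic is self-contained; in a real cluster the stub would be replaced with the actual kubectl command (the function name get_succeeded is made up for this example):

```shell
#!/bin/sh
# Return the .status.succeeded count of a job.
# In a real cluster this would run:
#   kubectl get job "$1" -o jsonpath='{.status.succeeded}'
get_succeeded() {
  echo 1   # stubbed value for this sketch
}

succeeded=$(get_succeeded test-cronjob-27253916)
if [ "${succeeded:-0}" -ge 1 ]; then
  echo "job succeeded"
else
  echo "job did not succeed - alert"   # hook the alerting logic in here
fi
```

With the stubbed value the sketch prints "job succeeded"; an empty or zero count takes the alert branch instead.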