
Restarting Kubernetes job


I'm working with Kubernetes 1.26 on the server side (EKS) and kubectl client 1.27.1.

I have a Job defined in this way:

apiVersion: batch/v1
kind: Job
metadata:
  name: build
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        .....
      containers:
        - name: build-tool
          ....

My pod dies because it is OOMKilled or for some other reason, and then Kubernetes launches another pod. Why?

It is not supposed to be restarted.

Solution

  • I think you missed this section of the documentation:

    An entire Pod can also fail, for a number of reasons, such as when the pod is kicked off the node (node is upgraded, rebooted, deleted, etc.), or if a container of the Pod fails and the .spec.template.spec.restartPolicy = "Never". When a Pod fails, then the Job controller starts a new Pod. This means that your application needs to handle the case when it is restarted in a new pod. In particular, it needs to handle temporary files, locks, incomplete output and the like caused by previous runs.

    The value of spec.template.spec.restartPolicy affects the response to failed containers in your Pod (docs), but it is not relevant to failures of the Pod itself.

    You can control how the Job controller responds to a failed Pod by setting a podFailurePolicy.
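
    As a minimal sketch (not a drop-in configuration), assuming the goal is that a failed build is never retried: backoffLimit: 0 stops the Job controller from creating replacement Pods after the first counted failure, and a podFailurePolicy can additionally fail the Job as soon as the build container exits non-zero while ignoring Pod disruptions such as node drains. The container name build-tool is taken from the question; podFailurePolicy requires restartPolicy: Never and is available as a beta feature (enabled by default) on Kubernetes 1.26.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: build
    spec:
      backoffLimit: 0            # do not create a replacement Pod after a counted failure
      podFailurePolicy:
        rules:
          # Fail the whole Job as soon as the build container exits non-zero
          # (an OOMKilled container typically exits with code 137).
          - action: FailJob
            onExitCodes:
              containerName: build-tool
              operator: NotIn
              values: [0]
          # Do not count Pods evicted by node upgrades, drains, etc.; the Job
          # controller still replaces those Pods, so infrastructure churn does
          # not fail the build.
          - action: Ignore
            onPodConditions:
              - type: DisruptionTarget
      template:
        spec:
          restartPolicy: Never   # required when podFailurePolicy is set
          containers:
            - name: build-tool
              # image, resources, volumeMounts, etc. as in the original Job

    If you only want to stop retries and do not need per-exit-code handling, backoffLimit: 0 on its own is enough; the podFailurePolicy rules just make the failure handling explicit.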