Tags: kubernetes, kubectl, kubernetes-jobs

Can a single kubernetes job contain multiple pods with different parallelism definitions?


I have a batch job that breaks down into 3 tasks, each of which can start only after the previous one has finished:

  1. Run a single pod
  2. Run N pods in parallel (.spec.completions = .spec.parallelism = N)
  3. Run M pods in parallel (.spec.completions = .spec.parallelism = M)

Each task has different resource requirements (CPU/MEM/STORAGE). Currently, I start job #1; when it finishes, it runs a kubectl command to start job #2, and job #2 does the same for job #3. So I have 3 separate jobs.

Can I define a single job for these 3 tasks?

Maybe something like this:

  1. Run single pod for task #1
  2. Define init container on task #2 to wait for task #1 to finish
  3. Run N pods for task #2 using .spec.completions
  4. Define init container on task #3 to wait for task #2 to finish
  5. Run M pods for task #3 using a different .spec.completions appropriate for task #3

It's not clear to me whether I can define separate .spec.parallelism and .spec.completions for different pods under the same job, or whether I can define separate init containers to delay the start of the later tasks.

This may all require a more complete workflow engine like Argo (which we don't yet have available).


Solution

  • The Kubernetes Job controller creates pods from the single pod template in the Job spec, and .spec.parallelism and .spec.completions apply to the Job as a whole. So no, a single Job can't run pods from several different templates, each with its own parallelism.

    But Kubernetes is an extensible system: you can define your own Custom Resource and write a controller, similar to the Job controller, that supports multiple pod templates with different parallelism.
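
Short of writing a custom controller, the three-Job approach from the question can at least be driven from one place. The following is only a minimal sketch, not a definitive implementation: it builds one batch/v1 Job manifest per task (each with its own single pod template, resources, and completions = parallelism) and lists the kubectl commands that would apply them in order. The job names, image names, resource values, and the helper functions `make_job`, `build_pipeline`, and `kubectl_steps` are all hypothetical and chosen for illustration.

```python
import json


def make_job(name, image, pods, cpu, memory):
    """Build a batch/v1 Job manifest; a Job has exactly one pod template."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # As in the question: completions = parallelism = number of pods.
            "completions": pods,
            "parallelism": pods,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                }
            },
        },
    }


def build_pipeline(n, m):
    """One Job per task; images and resource values are placeholders."""
    return [
        make_job("task-1", "example.com/task1:latest", 1, "500m", "512Mi"),
        make_job("task-2", "example.com/task2:latest", n, "2", "4Gi"),
        make_job("task-3", "example.com/task3:latest", m, "1", "1Gi"),
    ]


def kubectl_steps(jobs, timeout="1h"):
    """Commands that apply each Job, then wait for it before the next one."""
    steps = []
    for job in jobs:
        name = job["metadata"]["name"]
        steps.append(["kubectl", "apply", "-f", f"{name}.json"])
        steps.append(
            ["kubectl", "wait", "--for=condition=complete",
             f"job/{name}", f"--timeout={timeout}"]
        )
    return steps


if __name__ == "__main__":
    jobs = build_pipeline(n=4, m=6)
    # Print the first manifest and the command sequence; nothing is executed.
    print(json.dumps(jobs[0], indent=2))
    for step in kubectl_steps(jobs):
        print(" ".join(step))
```

Each manifest would be written to the matching `<name>.json` file before the commands are run. Note that waiting on `condition=complete` only returns when a Job succeeds, so a failed Job would likely surface as a timeout in this sketch; a real script would need to handle that case.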