Kubernetes cronjob: run multiple processes at the same time without creating multiple jobs


I have a Python process that I want to start every n minutes from a Kubernetes cronjob. Each run reads a number of messages (say 5) from a queue, then processes/converts some files and runs analysis on the results based on those messages. If the process is still running after n minutes, I don't want to start a new one. In total, I would like a number of these processes (say 3) to be able to run at the same time, but there must never be more than 3 running at once. To implement this, I tried the following (simplified):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: some-job
  namespace: some-namespace
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      parallelism: 3
      template:
        spec:
          containers:
          - name: job
            image: myimage:tag
            imagePullPolicy: Always
            command: ['python', 'src/run_job.py']

Now what this amounts to is a maximum of three processes running at the same time, due to parallelism being 3 and concurrencyPolicy being "Forbid", even if the processes go over the 5-minute mark.

The specific problem I have is that one pod (e.g. pod 1) can take longer than the other two to finish: pods 2 and 3 might finish after a minute, while pod 1 only finishes after 10 minutes because it is processing larger files from the queue.

I thought that parallelism: 3 would cause pods 2 and 3 to be deleted and replaced after finishing (when the next cron interval hits). They are not; they have to wait for pod 1 to finish before three new pods are started at the following cron interval.

When I think about it, this behavior makes sense given the specification and meaning of a cronjob. However, I would like to know whether it is possible to make these pods/processes independent of one another for restart, without having to define duplicate cronjobs that each run one process.

Otherwise, I would like to know if it's possible to easily launch duplicate cronjobs without copying them into multiple manifests.


Solution

  • Duplicate cronjobs seem to be the way to achieve what you are looking for: create 3 duplicates, each running a single job at a time. You could template the job manifest and generate the copies from it, as in the following example. The example is not in your problem context, but you can get the idea. http://kubernetes.io/docs/tasks/job/parallel-processing-expansion
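
As a rough illustration of the templating idea applied to the manifest from the question, the sketch below renders three numbered copies of the cronjob from a single template using only the Python standard library. The name suffix (some-job-1, some-job-2, ...), the function name render_cronjobs, and the added restartPolicy: Never are assumptions made for this example, not part of the original manifest. parallelism is set to 1 so that each copy runs one pod and restarts independently of the others.

```python
from string import Template

# Hypothetical template: the question's CronJob with parallelism lowered to 1
# and a numbered name, so each copy is scheduled independently of the others.
# restartPolicy is an addition here; Job pod templates need Never or OnFailure.
CRONJOB_TEMPLATE = Template("""\
apiVersion: batch/v1
kind: CronJob
metadata:
  name: some-job-$index
  namespace: some-namespace
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      parallelism: 1
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: job
            image: myimage:tag
            imagePullPolicy: Always
            command: ['python', 'src/run_job.py']
""")


def render_cronjobs(count: int) -> str:
    """Return `count` CronJob manifests as one multi-document YAML string."""
    docs = [CRONJOB_TEMPLATE.substitute(index=i) for i in range(1, count + 1)]
    # "---" is the YAML document separator, so kubectl sees separate objects.
    return "---\n".join(docs)


if __name__ == "__main__":
    print(render_cronjobs(3))
```

If this were saved as, say, render_cronjobs.py (a hypothetical file name), the output could be piped straight into the cluster with `python render_cronjobs.py | kubectl apply -f -`. With concurrencyPolicy "Forbid" on each copy, at most three pods run at any time, and a copy that finishes early can start again at the next interval without waiting for the slow one.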