Tags: django, kubernetes, celery

Handling K8s forceful kill of Celery pods before running tasks complete


I am deploying a Django app with Celery workers on AWS EKS. Everything runs as expected, except that K8s stops Celery worker replicas before they finish their ongoing tasks. I see the same behavior when making a new deployment or pushing new code to the master branch.

What I have tried:

  • Setting a very large grace period. This didn't work because we have tasks that run for hours.
  • Setting a preStop hook. This didn't work either, since K8s doesn't wait for the hook to finish once it exceeds the grace period (see the sketch after this list).
  • A fixed replica count, which is obviously not a real solution.
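
For context, the first two attempts together looked roughly like the sketch below. The Deployment name celery-worker, the image, and the Django project module myapp are all assumptions, and the exact celery inspect flags may vary between Celery versions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      # Attempt 1: give the pod more time to shut down (default is 30s).
      terminationGracePeriodSeconds: 3600
      containers:
        - name: worker
          image: myregistry/myapp:latest        # hypothetical image
          command: ["celery", "-A", "myapp", "worker", "--loglevel=info"]
          lifecycle:
            # Attempt 2: block in preStop until this worker has no active
            # tasks. K8s still force-kills the pod once the grace period
            # above elapses, which is why this alone did not help.
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    while celery -A myapp inspect active -d "celery@$(hostname)" --json | grep -q '"id"'; do
                      sleep 30
                    done
```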

More information: Celery is set up with Redis as the message broker and result backend. After some research I started considering KEDA, but from the docs it seems it will only let me scale Celery pods based on queue length; it doesn't provide the graceful-shutdown mechanism I am looking for.
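
For reference, the KEDA side of this would be roughly the ScaledObject sketched below; the Deployment name celery-worker, the Redis address, and the default celery queue name are assumptions. As noted above, this only adjusts the replica count from the Redis list length and does nothing to delay pod termination.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: celery-worker-scaler
spec:
  scaleTargetRef:
    name: celery-worker          # Deployment to scale (hypothetical name)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis:6379      # broker host:port (assumption)
        listName: celery         # Celery pushes tasks onto this Redis list
        listLength: "5"          # target number of queued tasks per replica
```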

Is there any workaround to solve this issue?


Solution

  • I ended up setting a very large grace period of 5 hours.
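
In the pod spec that boils down to something like this minimal sketch (5 hours = 18000 seconds):

```yaml
spec:
  # Long grace period so Celery's warm shutdown has time to let
  # in-flight tasks finish before the pod is force-killed.
  terminationGracePeriodSeconds: 18000   # 5 hours
```

Celery workers perform a warm shutdown when they receive the SIGTERM that K8s sends at eviction, finishing the tasks they have already started, so the long grace period simply keeps the follow-up SIGKILL from arriving before they are done.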