Search code examples
pythonrabbitmqtaskcelerylong-running-processes

Persistent Long Running Tasks in Celery


I'm working on a Python based system, to enqueue long running tasks to workers.

The tasks originate from an outside service that generate a "token", but once they're created based on that token, they should run continuously, and stopped only when explicitly removed by code.
The task starts a WebSocket and loops on it. If the socket is closed, it reopens it. Basically, the task shouldn't reach conclusion.

My goals in architecting this solutions are:

  1. When gracefully restarting a worker (for example to load new code), the task should be re-added to the queue, and picked up by some worker.
  2. Same thing should happen when ungraceful shutdown happens.
  3. 2 workers shouldn't work on the same token.
  4. Other processes may create more tasks that should be directed to the same worker that's handling a specific token. This will be resolved by sending those tasks to a queue named after the token, which the worker should start listening to after starting the token's task. I am listing this requirement as an explanation to why a task engine is even required here.
  5. Independent servers, fast code reload, etc. - Minimal downtime per task.

All our server side is Python, and looks like Celery is the best platform for it. Are we using the right technology here? Any other architectural choices we should consider?

Thanks for your help!


Solution

  • According to the docs

    When shutdown is initiated the worker will finish all currently executing tasks before it actually terminates, so if these tasks are important you should wait for it to finish before doing anything drastic (like sending the KILL signal).

    If the worker won’t shutdown after considerate time, for example because of tasks stuck in an infinite-loop, you can use the KILL signal to force terminate the worker, but be aware that currently executing tasks will be lost (unless the tasks have the acks_late option set).

    You may get something like what you want by using retry or acks_late

    Overall I reckon you'll need to implement some extra application-side job control, plus, maybe, a lock service.

    But, yes, overall you can do this with celery. Whether there are better technologies... that's out of the scope of this site.