I have a parent task that launches subtasks via a Celery group:
    from celery import group, task

    @task()
    def parent():
        ...
        for x in big_long_loop:
            subtasks = []
            ...
            subtasks.append(subtask.s(foo, bar, baz))
            ...
            g = group(*subtasks)
            g.delay()
The subtasks have a dedicated queue and worker, separate from the parent's. The subtask queue's worker also runs on a different server from the one executing the parent task.
Once in a while, in the midst of launching groups of subtasks, the parent task fails or times out. After that point, the (remote) worker for the subtask queue goes haywire: it fluctuates between online and offline in Flower, and eventually just stops executing tasks. Tasks are received but never started by the remote worker.
Is this expected behavior? If a parent does not ultimately succeed, how does that affect any subtasks created during that task, or when the task is executed again later? Does this have anything to do with the haywire worker being on a remote server? Note that this server hosts other workers for other queues, and those do not flicker offline.
After much pain, we discovered the issue: we were passing Django model objects as subtask arguments. This created a severe slowdown. My guess is that the serialized objects preserved state tied to the parent machine's DB connection, so when the remote worker touched them, queries were effectively routed "via the parent machine," slowing down both the database and RabbitMQ.
Things became very smooth once we changed from subtask.s(foo, bar, baz) to subtask.s(foo.id, bar.id, baz.id).
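The fix above, passing primary keys and re-fetching the rows inside the subtask, can be sketched without Celery or Django. Here `DB` and `fetch` are hypothetical stand-ins for the real database and for `Model.objects.get(pk=...)` running on the worker's own connection:

```python
# Hypothetical stand-in for the database; in the real setup this would be
# the shared DB, queried over the *worker's* connection.
DB = {1: {"name": "foo"}, 2: {"name": "bar"}, 3: {"name": "baz"}}

def fetch(pk):
    # Stand-in for Foo.objects.get(pk=pk) executed by the worker itself,
    # rather than unpickling an object bound to the parent's connection.
    return DB[pk]

def subtask(foo_id, bar_id, baz_id):
    # The task body receives only small, serializer-friendly primary keys
    # (ints), so the message stays tiny and nothing machine-local is sent.
    foo, bar, baz = fetch(foo_id), fetch(bar_id), fetch(baz_id)
    return [foo["name"], bar["name"], baz["name"]]
```

Besides avoiding connection-related surprises, this keeps the broker messages small and works with the JSON serializer, which cannot encode model instances at all.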