
How to know when a set of RabbitMQ tasks are complete?


I am using RabbitMQ to have worker processes encode video files. I would like to know when all of the files are complete - that is, when all of the worker processes have finished.

The only way I can think to do this is by using a database. When a video finishes encoding:

UPDATE videos SET status = 'complete' WHERE filename = 'foo.wmv'
-- etc etc etc as each worker finishes --

And then to check whether or not all of the videos have been encoded:

SELECT count(*) FROM videos WHERE status != 'complete'

But if I'm going to do this, then I feel like I am losing the benefit of RabbitMQ as a mechanism for multiple distributed worker processes, since I still have to manually maintain a database queue.

Is there a standard mechanism for RabbitMQ dependencies? That is, a way to say "wait for these 5 tasks to finish, and once they are done, kick off a new task"?

I don't want to have a parent process add these tasks to a queue and then "wait" for each of them to return a "completed" status. I would then have to maintain a separate process for each group of videos, at which point I've lost the advantage of decoupled worker processes over a single ThreadPool.

Am I asking for something impossible? Or are there standard, widely adopted solutions for managing the overall state of tasks in a queue that I have missed?

Edit: after searching, I found this similar question: Getting result of a long running task with RabbitMQ

Are there any particular thoughts that people have about this?


Solution

  • Use a "response" queue. I don't know any specifics about RabbitMQ, so this is general:

    • Have your parent process send out requests and keep track of how many it sent
    • Make the parent process also wait on a specific response queue (that the children know about)
    • Whenever a child finishes something (or can't finish for some reason), send a message to the response queue
    • Whenever numSent == numResponded, you're done
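
The counting steps above can be sketched roughly as follows. This is only an in-process illustration, not RabbitMQ code: `queue.Queue` stands in for the response queue and threads stand in for the worker processes, and the names `worker` and `run_batch` and the ".bad" failure rule are made up for the example.

```python
import queue
import threading


def worker(filename, responses):
    """Hypothetical child: pretend to encode, then always report back."""
    try:
        # Stand-in for real encoding work; ".bad" files simulate a failure.
        if filename.endswith(".bad"):
            raise ValueError("unsupported format")
        status = "complete"
    except ValueError as exc:
        status = f"failed: {exc}"
    # Report success or failure, so the parent can count every request.
    responses.put((filename, status))


def run_batch(filenames):
    """Parent: send requests, then wait until numSent == numResponded."""
    responses = queue.Queue()  # stand-in for the shared response queue
    num_sent = 0
    for name in filenames:
        threading.Thread(target=worker, args=(name, responses)).start()
        num_sent += 1

    results = {}
    num_responded = 0
    while num_responded < num_sent:
        name, status = responses.get()  # blocks until a child responds
        results[name] = status
        num_responded += 1
    return results


if __name__ == "__main__":
    print(run_batch(["foo.wmv", "bar.wmv", "baz.bad"]))
```

Note that the loop blocks forever if a child never responds, which is why the timeout discussed next matters.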

    Something to keep in mind is a timeout: what happens if a child process dies? Handling that takes slightly more work, but basically:

    • With every sent message, include some sort of ID, and add that ID and the current time to a hash table.
    • For every response, remove that ID from the hash table
    • Periodically walk the hash table and remove anything that has timed out
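
A minimal sketch of that bookkeeping, assuming string request IDs and a fixed timeout. The class and method names (`PendingRequests`, `sent`, `responded`, `expire`) are hypothetical, and a plain dict plays the role of the hash table. With RabbitMQ the ID would typically travel in the message's correlation ID property, but no broker calls are shown here.

```python
import time


class PendingRequests:
    """Tracks request IDs that have been sent but not yet answered."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the timeout can be tested
        self.sent_at = {}   # request ID -> time the request was sent

    def sent(self, request_id):
        """Record the ID and the current time when a request goes out."""
        self.sent_at[request_id] = self.clock()

    def responded(self, request_id):
        """Remove the ID on response; returns False for unknown or expired IDs."""
        return self.sent_at.pop(request_id, None) is not None

    def expire(self):
        """Walk the table, drop timed-out IDs, and return them."""
        now = self.clock()
        timed_out = [
            request_id
            for request_id, sent_time in self.sent_at.items()
            if now - sent_time > self.timeout_seconds
        ]
        for request_id in timed_out:
            del self.sent_at[request_id]
        return timed_out

    def all_done(self):
        """True once every request has been answered or has timed out."""
        return not self.sent_at
```

The parent would call `expire()` on a timer and decide what to do with the returned IDs, for example resend them or mark those videos as failed.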

    This is commonly called the Request-Reply pattern.