architecture, cron, scalability, scheduled-tasks, amqp

Scalable dynamic job queue processing


I'm currently working on a project where I need to process a large number of recurring jobs. Basically, when a job is finished I want to start it again 15 minutes later.

The set of jobs changes over time, so I need to watch for added and removed jobs. Each job can take a while to process, so I need to be able to scale out. A website will serve as the front end for managing the jobs.

I'm considering using MongoDB (with sharding) to store the jobs. Then I could create a "job broker" to query the database frequently to see if any jobs are ready and use e.g. RabbitMQ to start work on a set of workers.
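To make the idea concrete, here is a minimal sketch of one polling pass of such a broker. It is only an illustration: a plain dict stands in for the MongoDB collection, a `queue.Queue` stands in for the RabbitMQ queue, and the names `dispatch_due_jobs` and `mark_finished` are hypothetical. Marking a job as in flight by setting its next run time to `None` is also an assumption of this sketch.

```python
import queue

DELAY_SECONDS = 15 * 60  # restart a job 15 minutes after it finishes


def dispatch_due_jobs(jobs, work_queue, now):
    """Enqueue every job whose next run time has passed; return their ids.

    `jobs` maps job id -> next run time in seconds, or None while in flight.
    """
    due = sorted(
        job_id
        for job_id, next_run in jobs.items()
        if next_run is not None and next_run <= now
    )
    for job_id in due:
        work_queue.put(job_id)  # stand-in for publishing to the work queue
        jobs[job_id] = None     # in flight: do not dispatch it again
    return due


def mark_finished(jobs, job_id, finished_at, delay=DELAY_SECONDS):
    """Schedule the next run of a finished job `delay` seconds later."""
    jobs[job_id] = finished_at + delay
```

The broker would call `dispatch_due_jobs` in a loop, and workers would call `mark_finished` when they complete a job. Each call scans every job, and one process running this loop is the only thing that starts work.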

There are a few very apparent issues with that setup though:

  • The "job broker" is a bottleneck and single-point-of-failure
  • Querying MongoDB very frequently against a potentially huge collection seems like a bad solution.

I'm not constrained by the technology, but I simply do not know how I should lay out the architecture for this. Any ideas?


Solution

  • Use AMQP. For each type of worker, have a queue that feeds jobs to that worker via a message. But add another worker type, the delayer.

    Each worker will receive a message, do the work, ack its message, and send a message to the delayer.

    The delayer is a bit different: it receives a message, waits 15 minutes, sends the message back to the source worker, and then acks its own message. Because delaying is inherently blocking, you should run many delayer processes, so that messages never sit waiting on the delayer queue and are delayed only while a delayer holds them.
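The worker and delayer loop described above can be sketched as follows. This is an in-process simulation under stated assumptions, not a RabbitMQ client: `queue.Queue` objects stand in for AMQP queues, `task_done()` stands in for the ack, threads stand in for separate processes, and the names `worker`, `delayer`, and `run_demo` are hypothetical. The delay is a parameter so the demo can use a fraction of a second instead of 15 minutes.

```python
import queue
import threading
import time


def worker(work_q, delay_q, handle, stop):
    """Receive a job, do the work, ack, then hand the job to the delayers."""
    while not stop.is_set():
        try:
            job = work_q.get(timeout=0.05)
        except queue.Empty:
            continue
        handle(job)
        work_q.task_done()  # stand-in for the AMQP ack
        delay_q.put(job)    # message to the delayer


def delayer(delay_q, work_q, delay_seconds, stop):
    """Hold one job for the delay, send it back to the worker, then ack."""
    while not stop.is_set():
        try:
            job = delay_q.get(timeout=0.05)
        except queue.Empty:
            continue
        if stop.wait(delay_seconds):
            break  # shutting down; unacked, so a real broker would redeliver
        work_q.put(job)
        delay_q.task_done()  # ack only after the job is back on the work queue


def run_demo(job_ids, delay_seconds, run_for, n_delayers=4):
    """Run one worker and several delayers for `run_for` seconds."""
    work_q, delay_q = queue.Queue(), queue.Queue()
    stop = threading.Event()
    runs = []  # job ids in the order they were processed
    threads = [threading.Thread(target=worker,
                                args=(work_q, delay_q, runs.append, stop))]
    threads += [threading.Thread(target=delayer,
                                 args=(delay_q, work_q, delay_seconds, stop))
                for _ in range(n_delayers)]
    for t in threads:
        t.start()
    for job_id in job_ids:
        work_q.put(job_id)
    time.sleep(run_for)
    stop.set()
    for t in threads:
        t.join()
    return runs
```

For example, `run_demo(["a", "b"], delay_seconds=0.05, run_for=0.5)` processes each job repeatedly, with roughly 0.05 seconds between one run of a job finishing and the next starting. Since each delayer blocks on a single job, there should be at least as many delayers as jobs that can be waiting at once; otherwise jobs queue up behind busy delayers and wait longer than intended.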