Tags: multithreading, message-queue, task-queue, beanstalkd

Beanstalkd setup with multiple queue workers: jobs that spawn other jobs


Is it safe to spawn multiple jobs from within a job, so that vacant workers can start working on them?

Currently my setup is like this: I have 20 workers waiting for jobs to be pushed. One of the jobs is to send iOS push notifications; the problem with iOS is that you can't send messages in bulk.

Current: What I made was a job that gets the list of specific users in batches, fetches each device token from my DB, and starts sending notifications.

Scenario: If one topic has 1000 users, I have to get all 1000 users and their devices and then send to each device. This pushes a single new job onto my queue, and one worker picks it up while the other workers sit vacant, waiting for incoming jobs. If no other jobs arrive for a while, worker 1 has to do all the sending alone.

What I'm working on right now: is it safe if that one big job instead creates other jobs, so that the vacant workers can pick them up and do the work?

P.S. All jobs run in one tube.


Solution

  • That sounds quite reasonable to me, spreading the load out among a number of workers.

    There are some things I would be careful about, such as setting an appropriate priority. If the task that creates dozens or hundreds more tasks has a higher priority than the jobs that actually do the sending, you could quickly end up with potentially hundreds of thousands of jobs queued while the workers never get around to running them, so the queue would keep filling up.

    Leaving large gaps between the priorities also means you can slot in jobs that are really important. A more important customer might get a priority closer to zero, and hence be processed and sent ahead of a smaller customer.
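    As an illustration, the priority scheme described above might look like this (the numbers and job-type names are just assumptions; in beanstalkd, a lower number means a more urgent job):

    ```python
    def job_priority(job_type, vip=False):
        """Pick a beanstalkd priority (lower number = more urgent).
        Gaps of 100 between levels leave room to slot new ones in later."""
        if job_type == "send":
            return 100 if vip else 200   # VIP customers jump ahead of normal sends
        if job_type == "split":
            return 300                   # splitter jobs always run behind sends
        raise ValueError("unknown job type: %r" % job_type)
    ```

    A splitter job would then be put with `priority=job_priority("split")`, so the send jobs it creates are always reserved before any further splitting happens.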

    Other matters to think about include the account being rate-limited: if you were limited to, say, 10 notifications per second, running 20 workers would be a non-starter.

    I would also put the new group of jobs into a new tube (running dozens of tubes is not expensive). You can watch a number of tubes at once (getting the most important job from any of them), but you can't count different types of job within a single tube, so splitting the types into different tubes lets you easily see how many jobs of each type are waiting. Then, if the send jobs are building up, you could pause creating the splitting jobs for a while, or mark them as an even lower priority for a while.
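    That "slow the splitting for a while" check could be driven by the per-tube counters beanstalkd exposes (the `stats-tube` command reports fields such as `current-jobs-ready`). A minimal sketch, assuming the stats arrive as a dict and with a made-up backlog threshold:

    ```python
    def should_pause_splitting(send_tube_stats, backlog_limit=5000):
        """Return True when the send tube's backlog is large enough that
        creating more splitter jobs would only make it worse."""
        ready = send_tube_stats.get("current-jobs-ready", 0)
        return ready >= backlog_limit

    # e.g. with stats from `stats-tube sends`:
    # {"current-jobs-ready": 6000} -> pause splitting
    # {"current-jobs-ready": 1200} -> keep splitting
    ```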

    Finally, to keep some of the advantage of batching and avoid per-job overhead, I'd split the 1000+ notifications into packets of maybe 25-50 notifications per job, rather than one job per notification.
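    Splitting into packets can be as simple as slicing the user list into job payloads; everything here (payload shape, topic name, batch size) is illustrative:

    ```python
    import json

    def make_send_jobs(topic_id, user_ids, batch_size=50):
        """Turn one topic's full user list into many small job payloads,
        each covering at most batch_size users."""
        return [
            json.dumps({"topic": topic_id,
                        "users": user_ids[i:i + batch_size]})
            for i in range(0, len(user_ids), batch_size)
        ]

    # 1000 users -> 20 payloads of 50 users each, one per reserve.
    jobs = make_send_jobs("breaking-news", list(range(1000)))
    ```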