Tags: ruby-on-rails, amazon-ec2, sidekiq

Is it possible to force concurrent jobs to run in separate Sidekiq processes?


One of the benefits of Sidekiq over Resque is that it can run multiple jobs in the same process. The drawback, however, is that I can't figure out how to force a set of concurrent jobs to run in different processes.

Here's my use case: say I have to generate 64M rows of data, and I have 8 vCPUs on an Amazon EC2 instance. I'd like to carve the task up into 8 concurrent jobs generating 8M rows each. The problem is that if I'm running 8 Sidekiq processes, Sidekiq will sometimes run 2 or more of the jobs in the same process, so it doesn't use all 8 vCPUs and takes much longer to finish. Is there any way to tell Sidekiq which worker to use, or to force it to spread a group of jobs evenly among processes?


Solution

  • The short answer is that you can't do this easily, by design. Specialization is what leads to single points of failure (SPOFs).

    1. You can create a custom queue for each process and then enqueue one job on each queue.
    2. You can use JRuby, which doesn't suffer the same flaw: its threads run in parallel, whereas MRI's global interpreter lock keeps the threads of one process from running Ruby code on more than one core at a time.
    3. You can run the processing as a rake task that spawns one process per job, ensuring an even load.
    4. You can carve the work into 64 jobs instead of 8 and get a more even load that way.

    I would probably do the last of these (64 smaller jobs) unless the resulting I/O crushes the machine.
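
To make options 1 and 4 concrete, here is a minimal sketch, not a definitive implementation. The names `row_chunks`, `queue_name`, `GenerateRowsJob`, and the `rows_N` queue names are hypothetical and only for illustration. The Sidekiq calls are guarded so the splitting logic can be run on its own; `set(queue: ...)` and `perform_async` are the standard Sidekiq job methods.

```ruby
# Hypothetical helper: split total_rows into job_count contiguous chunks,
# spreading any remainder over the first few chunks.
def row_chunks(total_rows, job_count)
  base, extra = total_rows.divmod(job_count)
  start = 0
  Array.new(job_count) do |i|
    size = base + (i < extra ? 1 : 0)
    chunk = { offset: start, count: size }
    start += size
    chunk
  end
end

# Hypothetical helper for option 1: map a job index onto one queue per
# Sidekiq process, round-robin.
def queue_name(index, process_count)
  "rows_#{index % process_count}"
end

# Option 4: 64 smaller jobs instead of 8 large ones.
chunks = row_chunks(64_000_000, 64)

# GenerateRowsJob is an assumed Sidekiq job class taking (offset, count);
# nothing is enqueued unless it has been defined elsewhere.
if defined?(GenerateRowsJob)
  chunks.each_with_index do |chunk, i|
    # Option 4 alone: enqueue on the default queue.
    GenerateRowsJob.perform_async(chunk[:offset], chunk[:count])

    # Option 1 instead: pin each job to a per-process queue.
    # GenerateRowsJob.set(queue: queue_name(i, 8))
    #                .perform_async(chunk[:offset], chunk[:count])
  end
end
```

For option 1, each of the 8 processes would be started listening to only its own queue with a single thread, for example `bundle exec sidekiq -q rows_0 -c 1`, so two jobs can never land in the same process at once. With option 4 the processes keep sharing one queue, and the smaller jobs even out the load without any pinning.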