Tags: python, message-queue, python-rq

Python RQ worker: execute tasks in parallel


I just started learning Python RQ and don't understand it well yet.

There is a task_a that takes 3 minutes to finish processing.

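import time

from redis import Redis
from rq.decorators import job

# Assumption: plain RQ with a local Redis server. RQ's own decorator takes a queue name and a connection.
@job('default', connection=Redis())
def task_a():
    time.sleep(180)  # simulate 3 minutes of work
    print('done processing task_a')

def call_3_times():
    # Each .delay() call enqueues one job; a single worker runs them one at a time.
    task_a.delay()
    task_a.delay()
    task_a.delay()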

From what I observed, the task_a jobs are executed one by one from the queue: after the first call finishes, the worker proceeds to the next call, and so on. The total time taken is 3 minutes x 3 = 9 minutes.

How can I make each task_a call in the call_3_times function execute in parallel, so that the total time is less than 9 minutes, say around 3 minutes and 10 seconds? (That's just an example; it might be faster than that.)

I probably need to spawn 3 rq workers. I tried that, and it does run faster, in parallel. But what if I need to call it 2000 times? Should I spawn 2000 rq workers? There must be a better way to do that.


Solution

  • If you need to call the task 2000 times, you can create 2000 jobs in the queue and run only 3 workers, which process the jobs in parallel, 3 at a time, until all jobs are done.

    The number of workers depends on the spec of your server. It's obviously not practical to start 2000 workers in an attempt to run all jobs in parallel at once. If you really need to process thousands of jobs at once, you have two choices:

    1. Distribute the jobs on a farm of workers (multiple servers)
    2. Add concurrency within each worker function, so that each worker spawns new threads or processes to do the actual work.

    Choice #2 depends on what type of work you're doing (I/O bound or CPU bound). If it's I/O bound and thread-safe, use threads inside the worker function; otherwise, use multiprocessing, at the cost of higher resource usage. However, if you have the resources to spawn multiple processes, why not just increase the worker count in the first place, which is less complex?

    So to summarize, the choice depends on your task type. If it's I/O bound, you can use #1 or #2. If it's CPU bound, you are limited to #1, within what your servers' specs allow.
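
As a rough illustration of choice #2 for I/O-bound work, here is a minimal sketch of a job function that fans its work out to threads. It only uses the standard library, so it runs without Redis. The names `process_item` and `task_batch`, the `max_threads` parameter, and the 0.1 second sleep are all made up for this example; the sleep stands in for whatever network or disk wait the real task does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_item(item):
    """Hypothetical I/O-bound step; the sleep stands in for a network or disk wait."""
    time.sleep(0.1)
    return item * 2


def task_batch(items, max_threads=8):
    """Process a batch of items concurrently inside a single worker job.

    Threads only help here because the work is assumed to be I/O bound and
    thread-safe. Results come back in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(process_item, items))


if __name__ == "__main__":
    start = time.perf_counter()
    results = task_batch(range(16))
    elapsed = time.perf_counter() - start
    # With 8 threads the 16 simulated waits overlap instead of running back to back.
    print(results[:4], f"{elapsed:.2f}s")
```

With something like this, each RQ job could cover a whole batch (for example by enqueuing `task_batch` with a list of items), so a few workers handle many items without one worker process per item. For CPU-bound work this would not help much, which is why adding workers or servers is the simpler route there.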