I am using multiprocessing.Pool in Python to schedule around 2500 jobs. I am submitting the jobs like this:
import multiprocessing

pool = multiprocessing.Pool()
jobs = []
for i in range(2500):
    jobs.append(pool.apply_async(...))  # job function and arguments elided
for j in jobs:
    _ = j.get()
The jobs are such that, after some computation, they go to sleep for a long time, waiting for some event to complete. My expectation was that, while they sleep, the other waiting jobs would get scheduled. But that is not happening. The maximum number of jobs scheduled at a single time is around 23 (even though they are all sleeping; ps aux shows state S+), which is more or less the number of cores in the machine. Only after a job finishes and releases a core does another job get scheduled.
I expected all 2500 jobs to be scheduled at once. How do I make Python submit all 2500 jobs at once?
The multiprocessing and threading packages of Python use process/thread pools. By default, the number of processes/threads in a pool depends on the hardware concurrency (i.e. typically the number of hardware threads supported by your processor). You can tune this number, but you should really not create too many threads or processes because they are precious operating system (OS) resources. Note that threads are less expensive than processes on most OSes, but CPython makes threads not very useful (except for I/O latency-bound jobs) because of the global interpreter lock (GIL).

Creating 2500 processes/threads puts a lot of pressure on the OS scheduler and slows down the whole system. OSes are designed so that waiting threads are not expensive, but frequent wake-ups clearly are. Moreover, the number of processes/threads that can be created on a given platform is bounded; as far as I remember, on my old Windows 7 system it was limited to 1024. The biggest problem is that each thread requires a stack, typically initialized to 1-2 MiB, so creating 2500 threads takes 2.5-5.0 GiB of RAM! This is significantly worse for processes. Not to mention that cache misses become more frequent, resulting in slower execution. Put shortly: do not create 2500 threads or processes; it is far too expensive.
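For reference, multiprocessing.Pool defaults to os.cpu_count() workers, which matches the roughly 23 concurrent jobs you observed; the processes argument overrides that default. Here is a minimal sketch (the worker function is a hypothetical stand-in for your job):

import multiprocessing
import os

def worker(i):
    # hypothetical stand-in for the real job
    return i * i

if __name__ == "__main__":
    print(os.cpu_count())  # the default pool size on this machine
    # explicitly request more workers than cores (possible, but costly, as explained above)
    with multiprocessing.Pool(processes=50) as pool:
        results = pool.map(worker, range(2500))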
You do not need threads or processes; you need fibers or, more generally, green threads, such as greenlet or eventlet, as well as gevent coroutines. The last is known to be fast and supports thread pools. Alternatively, you can use the asyncio feature of recent Python versions, which is the standard way to deal with such a problem.
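Here is a minimal asyncio sketch, assuming the long wait can be expressed as an awaitable; asyncio.sleep stands in for whatever event your jobs actually wait on. All 2500 tasks are scheduled at once on a single thread, and a sleeping task yields control to the others:

import asyncio

async def job(i):
    # ... some computation ...
    await asyncio.sleep(60)  # stand-in for waiting on the real event
    return i

async def main():
    # schedule all 2500 jobs concurrently; gather waits for all of them
    return await asyncio.gather(*(job(i) for i in range(2500)))

results = asyncio.run(main())

Note that the computation parts still share one thread, so if they are CPU-heavy you can offload them to a small process pool with loop.run_in_executor while keeping the waits concurrent.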