Here's my situation. The code is almost the same as the example in the docs:
```python
from multiprocessing import Pool
import numpy as np

def grad(x0, y): return 0  # does some computational-heavy work actually

if __name__ == '__main__':
    class UnrollArgs:
        def __init__(self, func):
            self.func = func

        def __call__(self, args):
            return self.func(*args)

    def batch_grad(x0, y, processes=4):
        g = Pool(processes).map(UnrollArgs(grad), [(x0, yi) for yi in y])
        return np.sum([gi for gi in g], axis=0) / len(y)
```
The `y` I pass to `batch_grad` has 50 elements, and `Pool.map` throws an error:
```
error: can't start new thread
```
From searching I learned that this error usually means a program is trying to start too many threads. Maybe it's just me, but I think the documentation on `multiprocessing.Pool` is a little incomplete. In particular, I don't understand how to control the number of threads that get started. The term "thread" isn't even mentioned in the documentation of the `Pool` class.
The integer argument to `multiprocessing.Pool` is the number of processes to start, not threads.

So how can I fix this?
Update: It might be worth noting that the error isn't raised every time I run the code.
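I think the problem stems from spawning many `Pool`s. The error message is confusing, but I think @ChongMa is correct that it comes from the Python interpreter itself being unable to spawn a thread: besides its worker processes, every `Pool` starts a few helper threads in the parent process to hand out tasks and collect results, so pools that are created on every call and never closed can pile up threads until no more can be started. It sounds like my suggestion in the comments may be working for you, so I'm reposting it here as an answer.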
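A quick way to see where the threads come from: the sketch below counts the parent's live threads with `threading.active_count()` before a `Pool` is created, while it is open, and after it has been closed and joined. `square` and `count_threads_around_pool` are hypothetical helpers for this illustration only; the exact counts depend on the Python version, so only the comparison between the three numbers matters.

```python
import threading
from multiprocessing import Pool


def square(v):
    # Hypothetical trivial task so the pool has something to do
    return v * v


def count_threads_around_pool(processes=2):
    """Return the parent's thread count before, during, and after one Pool's lifetime."""
    before = threading.active_count()
    pool = Pool(processes)
    during = threading.active_count()  # now includes the pool's helper threads
    pool.map(square, range(4))
    pool.close()
    pool.join()  # waits for the workers and the helper threads to exit
    after = threading.active_count()
    return before, during, after


if __name__ == '__main__':
    print(count_threads_around_pool())
```

If `close()` and `join()` are skipped, the middle number is what each leaked pool keeps adding to the parent.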
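Try these fixes:

a) Call `Pool.close()` to let each `Pool` know it's not going to get any more work, then `Pool.join()` to wait for it to shut down:

```python
def batch_grad(x0, y, processes=4):
    pool = Pool(processes)
    g = pool.map(UnrollArgs(grad), [(x0, yi) for yi in y])
    pool.close()  # no more tasks will be submitted to this pool
    pool.join()   # wait for the worker processes and helper threads to exit
    return np.sum([gi for gi in g], axis=0) / len(y)
```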
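On Python 3.3 and later, option a) can also be written with the pool as a context manager. The sketch below assumes a hypothetical `grad` that just scales `x0` by `y`, so that it runs standalone, and moves `UnrollArgs` to module level so it can be pickled under any start method. Note that leaving the `with` block calls `terminate()` rather than `close()`, which is fine here because `map()` has already returned every result. Calling `batch_grad` many times in a loop should then leave the parent's thread count where it started.

```python
import threading
from multiprocessing import Pool

import numpy as np


def grad(x0, y):
    # Hypothetical stand-in for the real, expensive gradient
    return np.asarray(x0) * y


class UnrollArgs:
    """Picklable wrapper that turns f((a, b)) into f(a, b)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, args):
        return self.func(*args)


def batch_grad(x0, y, processes=4):
    # The with-block calls pool.terminate() on exit, which is safe here
    # because map() has already collected every result.
    with Pool(processes) as pool:
        g = pool.map(UnrollArgs(grad), [(x0, yi) for yi in y])
    return np.sum(g, axis=0) / len(y)


if __name__ == '__main__':
    baseline = threading.active_count()
    for _ in range(20):
        batch_grad(np.ones(3), np.arange(50))
    # With every pool cleaned up, the count should be back at the baseline
    print(baseline, threading.active_count())
```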
b) Reuse a single `Pool` for all your processing by passing the `Pool` object into your `batch_grad` function instead of a number of processes:

```python
def batch_grad(x0, y, pool=None):
    own_pool = pool is None  # only clean up a pool we created ourselves
    if own_pool:
        pool = Pool(4)
    g = pool.map(UnrollArgs(grad), [(x0, yi) for yi in y])
    if own_pool:
        pool.close()
        pool.join()
    return np.sum([gi for gi in g], axis=0) / len(y)

# then create one Pool and reuse it for every call
if __name__ == '__main__':
    with Pool(4) as p:
        batch_grad(your_x0, your_y, p)
```
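Option b) pays off when `batch_grad` is called many times, for example once per step of an optimization loop. The sketch below swaps `UnrollArgs` for the standard `Pool.starmap()` (Python 3.3+), which unpacks each argument tuple itself, and uses `np.mean`, which is equivalent to summing and dividing by `len(y)`. `grad` and `descend` are hypothetical stand-ins: `descend` is a plain gradient-descent loop that creates one pool up front and reuses it for every step, so only one set of workers and helper threads ever exists.

```python
from multiprocessing import Pool

import numpy as np


def grad(x0, y):
    # Hypothetical stand-in for the real, expensive gradient
    return np.asarray(x0) * y


def batch_grad(x0, y, pool):
    # starmap unpacks each (x0, yi) tuple into grad(x0, yi),
    # so no UnrollArgs wrapper is needed
    g = pool.starmap(grad, [(x0, yi) for yi in y])
    return np.mean(g, axis=0)


def descend(x0, y, steps=10, lr=0.1):
    """Hypothetical gradient-descent loop that reuses one pool for every step."""
    x = np.asarray(x0, dtype=float)
    with Pool(4) as pool:
        for _ in range(steps):
            x = x - lr * batch_grad(x, y, pool)
    return x


if __name__ == '__main__':
    print(descend(np.ones(3), np.ones(5)))
```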
Hopefully this works out for you long term.