I am using Joblib to run parallel jobs in my Python application. During profiling, I noticed that the slowest call was {built-in method time.sleep}. Interestingly, this issue disappears when I remove Joblib parallel processing. Could you explain why {built-in method time.sleep} becomes a bottleneck with Joblib parallel processing?
Here is a simplified version of my code:
from joblib import Parallel, delayed

def my_function(x):
    # Some computation
    return x * x

results = Parallel(n_jobs=2)(delayed(my_function)(i) for i in range(10))
Profiling Output
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 17153  209.317    0.012  209.317    0.012  {built-in method time.sleep}
   588    0.835    0.001    0.835    0.001  {method 'poll' of 'select.poll' objects}
Presumably because the parent process you're profiling is waiting for the child processes, which do the actual work, to become ready, and it does that waiting with time.sleep().
You can verify this with a profiler that shows you the call graph. If my guess is correct, one of the top callers to time.sleep() is somewhere within joblib. My guess is it's this code.
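If you don't have a call-graph viewer handy, the standard library's pstats can list the callers of a given function. Below is a minimal sketch of that check. It does not reproduce joblib's internals: wait_for_worker and fake_work are hypothetical stand-ins for a parent that polls with time.sleep() while something else does the work.

```python
import cProfile
import io
import pstats
import threading
import time


def fake_work():
    # Hypothetical stand-in for real work: block for about 0.2 s without
    # calling time.sleep, so every time.sleep call comes from the polling loop.
    threading.Event().wait(timeout=0.2)


def wait_for_worker(worker):
    # Hypothetical polling loop: the parent sleeps until the worker is done.
    while worker.is_alive():
        time.sleep(0.01)


def profile_callers_of_sleep():
    worker = threading.Thread(target=fake_work)
    worker.start()

    # Profile only the waiting side, as when profiling the parent process.
    profiler = cProfile.Profile()
    profiler.runcall(wait_for_worker, worker)
    worker.join()

    # print_callers takes a regex restriction and lists who called the
    # matching functions; the report is captured in a string here.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.print_callers(r"time\.sleep")
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_callers_of_sleep())
```

In the report, the "was called by..." section for {built-in method time.sleep} names wait_for_worker. Profiling your real Parallel(...) call the same way and seeing a joblib file listed there would support the guess above.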