from math import sqrt
from joblib import Parallel, delayed
import time

if __name__ == '__main__':
    st = time.time()
    # [sqrt(i ** 2) for i in range(100000)]  # this is the non-parallel version
    Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(100000))
    print(time.time() - st)
Now, the non-parallel part runs in 0.4 s while the parallel part runs for 18 s. I am confused: why would this happen?
Parallel processes (which joblib creates) require spawning worker processes and copying data to and from them. Imagine it this way: you have two people who each carry a rock to their house, shine it, then bring it back. That's far slower than one person shining the rocks on the spot. Each sqrt call here is tiny, so nearly all the time is spent in transit (pickling arguments and results between processes) rather than on the actual calculation. You will only benefit from parallel processes for more substantial computational tasks.
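For instance, here is a minimal sketch of when the overhead pays off; slow_op is a hypothetical stand-in for genuinely expensive per-item work, not something from your code, and the exact timings will vary by machine:

import time
from joblib import Parallel, delayed

def slow_op(i):
    time.sleep(0.01)  # simulate substantial per-task work
    return i ** 2

if __name__ == '__main__':
    st = time.time()
    [slow_op(i) for i in range(200)]  # serial: roughly 2 s of simulated work
    print('serial:  ', time.time() - st)

    st = time.time()
    Parallel(n_jobs=2)(delayed(slow_op)(i) for i in range(200))
    print('parallel:', time.time() - st)  # roughly half the serial time, despite overhead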
If you care about speeding up this specific operation: use numpy's vectorized math operations. On my machine, parallel: 1.13 s, serial: 54.6 ms, numpy: 3.74 ms.
import numpy as np

a = np.arange(100000, dtype=np.int64)  # np.int was removed in recent NumPy; use an explicit dtype
np.sqrt(a ** 2)
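If you want to reproduce a comparison like this yourself, here is a rough sketch using timeit (the iteration count is an arbitrary choice, and your numbers will differ):

import timeit

setup = 'import numpy as np; from math import sqrt; a = np.arange(100000, dtype=np.int64)'
print(timeit.timeit('[sqrt(i ** 2) for i in range(100000)]', setup=setup, number=10))  # serial list comprehension
print(timeit.timeit('np.sqrt(a ** 2)', setup=setup, number=10))  # vectorized numpy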
Don't worry about libraries like Cython or Numba; they won't speed up this already performant operation.