Tags: pandas, macos, python-multiprocessing, python-3.8

Multiprocessing with pandas data frames hangs at queue.get() and pool.map()


I am new to multiprocessing. I am using pandas and Python 3.8 on macOS with 8 cores for data analysis. Without multiprocessing, my current program takes 71 seconds for a 60000x60000 data frame. I would like to speed it up and use it on larger data frames.

I followed some online guides to write a simple function that prints a number, but the program hangs when the pool runs.

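import time
import multiprocessing as mp

import numpy as np

q = mp.Queue()  # not used in this snippet

def func(x):
  # each worker receives one element of x
  print(x)

def main():
  start = time.time()
  pool = mp.Pool(processes=mp.cpu_count() - 1)
  x = np.arange(0, 100, 1)
  odor_presence = pool.map(func, x)
  pool.close()  # must be called before join()
  pool.join()
  print('Execution time: ', time.time() - start)

if __name__ == '__main__':
  main()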

Solution

  • After digging around for a while, I realized the cause was my Anaconda environment: it would not let me use multiprocessing. The same code works perfectly on a system without Anaconda.

    I also found this question, which describes the same problem: "Multiprocessing program has AttributeError in Anaconda notebook".
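
For anyone hitting the same hang, here is a minimal sketch of a layout that usually avoids it. It has not been tested against the exact Anaconda setup in the question. It assumes the code is saved as a .py file and run from a terminal rather than from a notebook cell, because worker processes need to be able to import the function they are asked to run. The names square and run_pool are made up for illustration.

```python
import multiprocessing as mp
import time


def square(x):
    # Worker function: defined at module top level so child processes can import it.
    return x * x


def run_pool(values, processes=2):
    # map() blocks until every result is collected; the context manager
    # then shuts the pool down on exit.
    with mp.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # With the "spawn" start method (the macOS default since Python 3.8),
    # each worker re-imports this file, so pool creation sits behind this guard.
    start = time.time()
    results = run_pool(range(10))
    print(results)
    print("Execution time:", time.time() - start)
```

If the function has to stay in a notebook, a common variant is to move it into its own .py module and import it from the notebook, so that the workers can find it by name.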