
Python, issue with multiprocessing library


I have a program in which I would like to run one of the functions in parallel for several arguments.

The program has the following structure:

# import statements

def function1():
    ...  # do something

def function2():
    ...  # do something

def main():
    function1()

I found several examples online of how to use the multiprocessing library, such as the following general template:

import multiprocessing

def worker(num):
    print('Worker:', num)

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()

From what I understand, worker() is the function that is meant to run in parallel. But I am not sure where or how to use the if __name__ == '__main__': block in my code.

At the moment that block is inside my main(), and when I run the program the worker function is not executed multiple times; instead, main() gets executed multiple times.

So where is the proper place to put the if __name__ == '__main__': block?


Solution

  • Combining the two examples you provided, it would look like this:

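    import multiprocessing
    
    def worker(num):
        print('Worker:', num)
    
    def main():
        processes = []
        for i in range(5):
            p = multiprocessing.Process(target=worker, args=(i,))
            p.start()  # start every process first so they run concurrently
            processes.append(p)
        for p in processes:
            p.join()  # then wait for all of them to finish
    
    if __name__ == '__main__':
        main()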
    

    Replace worker with function1, i.e. whichever function you'd like to parallelise.

    The key part is calling main() from inside the if __name__ == '__main__': block, and that block has to sit at the top level of the module, not inside main() or any other function. In this simple example you could just as easily put the body of main() directly under if __name__ == '__main__':.
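Applied to the program layout from the question, a minimal sketch might look like this. It assumes function1 can be changed to take one argument; that parameter, the body of function1 and the argument list are hypothetical placeholders. The guard sits at module level and is the only place main() is called.

```python
import multiprocessing


def function1(arg):
    # Hypothetical body standing in for the real "do something"
    result = arg * 10
    print('function1 got', arg, 'and computed', result)
    return result


def function2():
    # Placeholder for the rest of the program; runs only in the parent
    print('function2 running in the parent process')


def main():
    args = [1, 2, 3, 4]  # hypothetical inputs to process in parallel
    processes = [multiprocessing.Process(target=function1, args=(a,)) for a in args]
    for p in processes:
        p.start()  # launch every child before waiting on any of them
    for p in processes:
        p.join()   # block until all children have finished
    function2()


# Module level, outside every function. Under the spawn start method, child
# processes re-import this file with a different __name__, so they skip it.
if __name__ == '__main__':
    main()
```

Starting all processes before joining any of them is what lets them overlap; joining inside the same loop would run them one after another.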

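    If you're never going to import anything from this file and child processes are created with the fork start method (historically the default on Linux), you can get away without the if __name__ == '__main__': part; in that case it only matters if you want to import functions from this script into other scripts or an interactive session without running the code in main(). With the spawn start method, however (the default on Windows, and on macOS since Python 3.8), every child process re-imports the main module. Any code outside the guard runs again in each child: that is why main() appears to execute several times when it is called at module level, and if that unguarded code itself starts processes, multiprocessing raises a RuntimeError. See What does if __name__ == "__main__": do?.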
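To see which of these cases applies on a given machine, the multiprocessing module can report its start methods. This sketch only inspects them and starts no processes; describe_start_methods is a hypothetical helper name.

```python
import multiprocessing


def describe_start_methods():
    """Return the current start method and every method this platform supports."""
    current = multiprocessing.get_start_method()
    available = multiprocessing.get_all_start_methods()
    return current, available


if __name__ == '__main__':
    current, available = describe_start_methods()
    print('Current start method:', current)  # e.g. 'fork', 'spawn' or 'forkserver'
    print('Available start methods:', available)
```

Calling multiprocessing.set_start_method('spawn') once at the start of the guarded block, before any other multiprocessing call, makes a Linux run behave like Windows. That is a quick way to check that a script is safe on every platform.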

    So, when child processes are created with fork, the simplest usage would be:

    import multiprocessing
    
    def worker(num):
        print('Worker:', num)
    
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    

    Edit: multiprocessing pool example

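    import multiprocessing
    
    def worker(num):
        # print('Worker:', num)
        return num
    
    if __name__ == '__main__':
        # One worker process per CPU; leaving the with block shuts the pool down
        with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
            # imap yields results lazily in input order, so consume them inside the block
            result = pool.imap(worker, range(5))
            print(list(result))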
    

    Prints:

    [0, 1, 2, 3, 4]
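The question asks about running one function for several arguments. If that function takes more than one parameter, Pool.starmap unpacks each tuple into positional arguments. A sketch, where power and run_all are hypothetical names:

```python
import multiprocessing


def power(base, exponent):
    # Hypothetical two-parameter worker standing in for function1
    return base ** exponent


def run_all(pairs):
    # starmap calls power(*pair) for every pair and keeps the input order
    with multiprocessing.Pool() as pool:
        return pool.starmap(power, pairs)


if __name__ == '__main__':
    print(run_all([(2, 3), (3, 2), (10, 0)]))  # [8, 9, 1]
```

Pool() with no argument chooses a worker count from the machine's CPU count on its own, so passing multiprocessing.cpu_count() explicitly is optional.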
    

    See also Python multiprocessing.Pool: when to use apply, apply_async or map? for more detailed explanations.
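For a quick side-by-side of the three calls named in that question, here is a sketch using a hypothetical square worker:

```python
import multiprocessing


def square(num):
    # Hypothetical worker used to compare the three Pool calls
    return num * num


if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        # apply: a single call that blocks until its result is ready
        print(pool.apply(square, (3,)))  # 9

        # apply_async: returns an AsyncResult at once; get() waits for the value
        pending = [pool.apply_async(square, (i,)) for i in range(5)]
        print([r.get() for r in pending])  # [0, 1, 4, 9, 16]

        # map: one call over a whole iterable, results in input order
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```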