python multithreading function multiprocessing program-entry-point

Python, issue with multiprocessing library

I have a program in which I would like to run one of the functions in parallel for several arguments.

The program is in the following format:

# import statements

def function1():
    pass  # do something

def function2():
    pass  # do something

def main():
    function1()

I found several examples online of how to use the multiprocessing library, such as the following general template:

import multiprocessing

def worker(num):
    print('Worker:', num)
    return

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()

From what I understand, worker() is the function that is meant to run in parallel. But I am not sure where or how to use the (if __name__ == '__main__':) block in my code.

As of now, that block is inside my main(), and when I run the program the worker function is not executed multiple times; instead, my main() gets executed multiple times.

So where is the proper place to put the (if __name__ == '__main__':) block?

Solution

Blending the two examples you provide, the program would look like this:

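import multiprocessing

def worker(num):
    print('Worker:', num)

def main():
    # Start every process before joining any of them. Calling join()
    # right after start() would run the workers one at a time.
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()
        processes.append(p)

    # Wait for all workers to finish.
    for p in processes:
        p.join()

if __name__ == '__main__':
    main()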

Replace worker with function1, or whichever function you'd like to parallelise.

The key part is that the if __name__ == '__main__': block sits at module level (not inside main()) and calls main() from there. In this simple example you could just as easily put the body of main() directly under if __name__ == '__main__':.

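The guard does two jobs. It lets you import functions from this script into other scripts or an interactive session without running the code in main(). With multiprocessing it is also required wherever child processes are spawned rather than forked (the default on Windows, and on macOS since Python 3.8): each child re-imports the main module, so any unguarded top-level code runs again in every child. That is the likely reason your main gets executed multiple times. Only where processes are forked (the traditional default on Linux), and only if you never import from this file, can you leave the guard out. See What does if __name__ == "__main__": do?.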
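To make the role of the guard concrete, here is a minimal sketch, assuming Python 3. The top-level print and the returned exit codes are illustrative additions, not part of the examples above. When child processes are spawned, the "module loaded" line is printed again by every child, with a __name__ other than '__main__'; when they are forked, it is printed only once.

```python
import multiprocessing

# Top-level code: runs in the parent, and runs again in each child whenever
# the child has to re-import this module (the "spawn" start method).
print('module loaded, __name__ =', __name__)


def worker(num):
    print('Worker:', num)


def main():
    processes = [multiprocessing.Process(target=worker, args=(i,))
                 for i in range(3)]
    for p in processes:
        p.start()          # launch all children first
    for p in processes:
        p.join()           # then wait for all of them
    # Exit code 0 means the child finished without an error.
    return [p.exitcode for p in processes]


if __name__ == '__main__':
    # Only the parent enters this block, so main() is never re-run by a child.
    main()
```

If the call to main() were moved out of the guard to module level, every spawned child would call main() again on import instead of just running worker.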

So, where child processes are forked rather than spawned, the simplest usage would be:

import multiprocessing

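def worker(num):
    print('Worker:', num)

# No guard here: safe only where child processes are forked, not spawned.
processes = [multiprocessing.Process(target=worker, args=(i,)) for i in range(5)]
for p in processes:
    p.start()
for p in processes:
    p.join()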

Edit: multiprocessing pool example

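import multiprocessing

def worker(num):
    # print('Worker:', num)
    return num

if __name__ == '__main__':
    pool = multiprocessing.Pool(multiprocessing.cpu_count())

    # imap returns an iterator that yields results in input order.
    result = pool.imap(worker, range(5))

    print(list(result))

    # Stop accepting work and wait for the worker processes to exit.
    pool.close()
    pool.join()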

Prints:

[0, 1, 2, 3, 4]

See also Python multiprocessing.Pool: when to use apply, apply_async or map? for more detailed explanations.
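As a rough illustration of how those Pool methods differ, here is a minimal sketch, assuming Python 3 (where Pool can be used as a context manager). The names square and run_in_pool are hypothetical helpers made up for this example.

```python
import multiprocessing


def square(num):
    return num * num


def run_in_pool(values, processes=2):
    with multiprocessing.Pool(processes) as pool:
        # map: blocks until every result is ready, returns an ordered list.
        mapped = pool.map(square, values)

        # imap: returns an iterator yielding results in input order;
        # it is consumed here, while the pool is still alive.
        iterated = list(pool.imap(square, values))

        # apply_async: submits one call at a time and returns immediately;
        # get() blocks until that particular result is available.
        pending = [pool.apply_async(square, (v,)) for v in values]
        fetched = [r.get() for r in pending]
    return mapped, iterated, fetched


if __name__ == '__main__':
    print(run_in_pool(range(5)))
```

All three calls produce the same values here; they differ in when they block and in how the results are handed back.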