Tags: python, windows-xp, multiprocess

Python multiprocessing caller (as well as callee) invoked multiple times on Windows XP


Possible Duplicate:
Multiprocessing launching too many instances of Python VM

I'm trying to use Python's multiprocessing module to parallelize web fetching, but I'm finding that the application that calls multiprocessing gets instantiated multiple times, not just the function I want called. This is a problem because the caller depends on a library that is slow to instantiate, so I lose most of the performance gained from parallelism.

What am I doing wrong or how is this avoided?

my_app.py:

from url_fetcher import url_fetch, parallel_fetch
import my_slow_stuff

if __name__ == '__main__':
    import datetime
    urls = ['http://www.microsoft.com'] * 20
    results = parallel_fetch(urls, fn=url_fetch)
    print([x[:20] for x in results])

my_slow_stuff.py:

class MySlowStuff(object):
    import time
    print('doing slow stuff')
    time.sleep(0)
    print('done slow stuff')

url_fetcher.py:

import multiprocessing
import urllib

def url_fetch(url):
    #return urllib.urlopen(url).read()
    return url

def parallel_fetch(urls, fn):
    PROCESSES = 10
    CHUNK_SIZE = 1
    pool = multiprocessing.Pool(PROCESSES)
    results = pool.imap(fn, urls, CHUNK_SIZE)
    return results

if __name__ == '__main__':
    import datetime
    urls = ['http://www.microsoft.com'] * 20
    results = parallel_fetch(urls, fn=url_fetch)
    print([x[:20] for x in results])

partial output:

$ python my_app.py
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff

...


Solution

  • The Python multiprocessing module behaves differently on Windows because os.fork() is not available on that platform, so each child process starts a fresh interpreter and re-imports the main module. In particular, the documentation says:

    Safe importing of main module

    Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

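    Here, the body of the module-level class MySlowStuff runs every time my_slow_stuff is imported, and on Windows every newly started child process re-imports the main module and, with it, my_slow_stuff. To fix this, MySlowStuff should be defined (and my_slow_stuff imported) only when __name__ == '__main__'.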

    See section 16.6.3.2, Windows, of the multiprocessing documentation for more details.
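
The fix described above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: it collapses the question's three files into one for brevity, url_fetch is the same stand-in as in the question, and load_slow_stuff is a hypothetical helper that takes the place of the slow library setup. The point is that the slow work runs only under the __name__ == '__main__' guard, so worker processes that re-import the file skip it.

```python
import multiprocessing


def url_fetch(url):
    # Stand-in for a real download, as in the question's url_fetcher.py.
    return url


def parallel_fetch(urls, fn, processes=4):
    # Workers only need module-level functions; nothing slow runs at import time.
    pool = multiprocessing.Pool(processes)
    try:
        return pool.map(fn, urls)
    finally:
        pool.close()
        pool.join()


def load_slow_stuff():
    # Hypothetical helper standing in for the slow library setup.
    print('doing slow stuff')
    print('done slow stuff')
    return 'slow stuff ready'


if __name__ == '__main__':
    # Runs once, in the parent only. On Windows each worker re-imports this
    # file under a different module name, so the guard is false there.
    slow_stuff = load_slow_stuff()
    urls = ['http://www.microsoft.com'] * 20
    results = parallel_fetch(urls, fn=url_fetch)
    print([x[:20] for x in results])
```

With this layout the two "slow stuff" lines should appear once per run rather than once per worker. If the slow setup has to live in a separate module, as in the question, importing that module inside the guard should have the same effect.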