Tags: python-2.7, multiprocessing, pathos

Pathos multiprocessing can't find packages or functions used inside a class


I want to do multiprocessing inside a class, and it seems that only pathos.multiprocessing can help me with that. However, when I implement it, the worker processes can't find the packages I import in the main module.

from pathos.multiprocessing import ProcessingPool;
import time
import sys;
import datetime


class tester:
    def __init__(self):
        self.pool=ProcessingPool(2);

    def func(self,msg):
        print (str(datetime.datetime.now()));
        for i in xrange(1):
            print msg
            sys.stdout.flush();
        time.sleep(2)    

#----------------------------------------------------------------------
    def worker(self):
        """"""
        pool=self.pool
        for i in xrange(10):
               msg = "hello %d" %(i)
               pool.map(self.func,[i])
        pool.close()
        pool.join()
        time.sleep(40)



if __name__ == "__main__":
    print datetime.datetime.now();
    t=tester()
    t.worker()
    time.sleep(60);
    print "Sub-process(es) done."

The error is `NameError: global name 'datetime' is not defined`. But the same call works fine in the main function! My system is Windows 7.


Solution

  • I'm the author of pathos. If you execute your code on non-Windows systems, it works fine -- even from the interpreter. (It also works as-is from a file.)

    >>> from pathos.multiprocessing import ProcessingPool;
    >>> import time
    >>> import sys;
    >>> import datetime
    >>> class tester:
    ...     def __init__(self):
    ...         self.pool=ProcessingPool(2);
    ...     def func(self,msg):
    ...         print (str(datetime.datetime.now()));
    ...         for i in xrange(1):
    ...             print msg
    ...             sys.stdout.flush();
    ...         time.sleep(2)    
    ...     def worker(self):
    ...         """"""
    ...         pool=self.pool
    ...         for i in xrange(10):
    ...                msg = "hello %d" %(i)
    ...                pool.map(self.func,[i])
    ...         pool.close()
    ...         pool.join()
    ...         time.sleep(40)
    ... 
    >>> datetime.datetime.now()
    datetime.datetime(2015, 10, 21, 19, 24, 16, 131225)
    >>> t = tester()
    >>> t.worker()
    2015-10-21 19:24:25.927781
    0
    2015-10-21 19:24:27.933611
    1
    2015-10-21 19:24:29.938630
    2
    2015-10-21 19:24:31.942376
    3
    2015-10-21 19:24:33.946052
    4
    2015-10-21 19:24:35.949965
    5
    2015-10-21 19:24:37.953877
    6
    2015-10-21 19:24:39.957770
    7
    2015-10-21 19:24:41.961704
    8
    2015-10-21 19:24:43.965193
    9
    >>>
    

    The issue is that multiprocessing is fundamentally different on Windows: Windows doesn't have a true fork, and thus isn't as flexible as systems that do. multiprocessing uses a forking pickler that, under the covers, spawns a fresh subprocess on Windows, while non-Windows systems can utilize memory shared with the child processes at fork time.
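
    The spawn-versus-fork difference can be sketched with the stdlib multiprocessing in Python 3, which lets you force Windows-style `"spawn"` behavior on any platform (the function name here is illustrative):

    ```python
    import multiprocessing as mp
    import os

    def child_pid(tag):
        # Runs in the worker. Under "spawn" -- the only mode Windows has --
        # the worker is a fresh interpreter that re-imports this module, so
        # everything the function needs must be resolvable at module level.
        return (tag, os.getpid())

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")  # force Windows-style behavior anywhere
        with ctx.Pool(2) as pool:
            results = pool.map(child_pid, ["a", "b"])
        print(results)  # worker PIDs differ from the parent's os.getpid()
    ```

    Under `"fork"` the child inherits the parent's memory (including imported modules); under `"spawn"` it does not, which is why imports that only exist in the parent's `__main__` namespace go missing on Windows.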

    dill has a `check` and a `copy` method, each of which does a sequential `loads(dumps(object))` on some object; `copy` does the round-trip in the current process's memory, while `check` does it in a subprocess (as is done on Windows in multiprocessing). Here's the `check` method on a Mac, so apparently that's not the issue.

    >>> import dill
    >>> dill.check(t.func)
    <bound method tester.func of <__main__.tester instance at 0x1051c7998>>
    

    The other thing you need to do on Windows is to use freeze_support at the beginning of __main__ (i.e. as the first line of __main__). It's unnecessary on non-Windows systems, but pretty much necessary on Windows. Here's the doc.

    >>> import pathos
    >>> print pathos.multiprocessing.freeze_support.__doc__
    
        Check whether this is a fake forked process in a frozen executable.
        If so then run code specified by commandline and exit.
    
    >>>
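
    A minimal sketch of the placement, using the stdlib multiprocessing (the doc above shows pathos.multiprocessing exposes the same `freeze_support`); the `square` function is illustrative:

    ```python
    from multiprocessing import Pool, freeze_support

    def square(x):
        return x * x

    if __name__ == "__main__":
        freeze_support()  # first line under __main__: a no-op elsewhere,
                          # but needed for frozen executables on Windows
        pool = Pool(2)
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
        pool.close()
        pool.join()
    ```

    Putting `freeze_support()` anywhere other than the top of `__main__` defeats its purpose: it must run before any other multiprocessing machinery starts in the (possibly fake-forked) child.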