Tags: python-2.7, multiprocessing, pathos

Pathos multiprocessing can't find packages or functions used inside a class


I want to do multiprocessing inside a class, and it seems that only pathos.multiprocessing can help me with that. However, when I implement it, the worker processes can't find the packages I import in the main module.

from pathos.multiprocessing import ProcessingPool;
import time
import sys;
import datetime


class tester:
    def __init__(self):
        self.pool=ProcessingPool(2);

    def func(self,msg):
        print (str(datetime.datetime.now()));
        for i in xrange(1):
            print msg
            sys.stdout.flush();
        time.sleep(2)    

#----------------------------------------------------------------------
    def worker(self):
        """"""
        pool=self.pool
        for i in xrange(10):
               msg = "hello %d" %(i)
               pool.map(self.func,[i])
        pool.close()
        pool.join()
        time.sleep(40)



if __name__ == "__main__":
    print datetime.datetime.now();
    t=tester()
    t.worker()
    time.sleep(60);
    print "Sub-process(es) done."

The error is `NameError: global name 'datetime' is not defined`. But the same call works fine in the main function! My system is Windows 7.


Solution

  • I'm the author of pathos. If you execute your code on non-Windows systems, it works fine -- even from the interpreter. (It also works as-is from a file.)

    >>> from pathos.multiprocessing import ProcessingPool;
    >>> import time
    >>> import sys;
    >>> import datetime
    >>> class tester:
    ...     def __init__(self):
    ...         self.pool=ProcessingPool(2);
    ...     def func(self,msg):
    ...         print (str(datetime.datetime.now()));
    ...         for i in xrange(1):
    ...             print msg
    ...             sys.stdout.flush();
    ...         time.sleep(2)    
    ...     def worker(self):
    ...         """"""
    ...         pool=self.pool
    ...         for i in xrange(10):
    ...                msg = "hello %d" %(i)
    ...                pool.map(self.func,[i])
    ...         pool.close()
    ...         pool.join()
    ...         time.sleep(40)
    ... 
    >>> datetime.datetime.now()
    datetime.datetime(2015, 10, 21, 19, 24, 16, 131225)
    >>> t = tester()
    >>> t.worker()
    2015-10-21 19:24:25.927781
    0
    2015-10-21 19:24:27.933611
    1
    2015-10-21 19:24:29.938630
    2
    2015-10-21 19:24:31.942376
    3
    2015-10-21 19:24:33.946052
    4
    2015-10-21 19:24:35.949965
    5
    2015-10-21 19:24:37.953877
    6
    2015-10-21 19:24:39.957770
    7
    2015-10-21 19:24:41.961704
    8
    2015-10-21 19:24:43.965193
    9
    >>>
    

    The issue is that multiprocessing is fundamentally different on Windows: Windows doesn't have a true fork, and thus isn't as flexible as systems that do. multiprocessing uses a forking pickler that, under the covers, spawns a fresh subprocess on Windows, while non-Windows systems can utilize memory shared with the child processes at fork time.
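
    The spawn-versus-fork difference can be sketched with the stdlib multiprocessing in Python 3, which lets you force Windows-style `"spawn"` behavior on any platform (the function name here is illustrative):

    ```python
    import multiprocessing as mp
    import os

    def child_pid(tag):
        # Runs in the worker. Under "spawn" -- the only mode Windows has --
        # the worker is a fresh interpreter that re-imports this module, so
        # everything the function needs must be resolvable at module level.
        return (tag, os.getpid())

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")  # force Windows-style behavior anywhere
        with ctx.Pool(2) as pool:
            results = pool.map(child_pid, ["a", "b"])
        print(results)  # worker PIDs differ from the parent's os.getpid()
    ```

    Under `"fork"` the child inherits the parent's memory (including imported modules); under `"spawn"` it does not, which is why imports that only exist in the parent's `__main__` namespace go missing on Windows.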

    dill has a `check` and a `copy` method, each of which does a sequential `loads(dumps(object))` on some object; `copy` does the round-trip in the current process's memory, while `check` does it in a subprocess (as is done on Windows in multiprocessing). Here's the `check` method on a Mac, so apparently that's not the issue.

    >>> import dill
    >>> dill.check(t.func)
    <bound method tester.func of <__main__.tester instance at 0x1051c7998>>
    

    The other thing you need to do on Windows is to use freeze_support at the beginning of __main__ (i.e. as the first line of __main__). It's unnecessary on non-Windows systems, but pretty much necessary on Windows. Here's the doc.

    >>> import pathos
    >>> print pathos.multiprocessing.freeze_support.__doc__
    
        Check whether this is a fake forked process in a frozen executable.
        If so then run code specified by commandline and exit.
    
    >>>
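
    A minimal sketch of the placement, using the stdlib multiprocessing (the doc above shows pathos.multiprocessing exposes the same `freeze_support`); the `square` function is illustrative:

    ```python
    from multiprocessing import Pool, freeze_support

    def square(x):
        return x * x

    if __name__ == "__main__":
        freeze_support()  # first line under __main__: a no-op elsewhere,
                          # but needed for frozen executables on Windows
        pool = Pool(2)
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
        pool.close()
        pool.join()
    ```

    Putting `freeze_support()` anywhere other than the top of `__main__` defeats its purpose: it must run before any other multiprocessing machinery starts in the (possibly fake-forked) child.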