I'm trying to use Parallel Python (pp) to do some distributed benchmarking (essentially, coordinate and run some code on a set of machines from a central server). The code was working perfectly fine until I moved the functionality to a separate package. Since then, I keep getting ImportError: No module named some.module.pp_test.
My question is actually two-fold: has anyone ever come across this problem with pp, and if so, how do I solve it? I tried using dill (import dill), but it didn't help. Also, is there a good replacement for Parallel Python that doesn't require any additional infrastructure?
The exact error I get is:
    RUNNING TEST
    Waiting for hosts to finish booting....A fatal error has occured during the function execution
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/ppworker.py", line 86, in run
        __args = pickle.loads(__sargs)
    ImportError: No module named some.module.pp_test
    Caught exception in the run phase 'NoneType' object is not iterable
    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        p.ping_pong()
      File "/home/ubuntu/workspace/pp-test/some/module/pp_test.py", line 5, in ping_pong
        a_test.run()
      File "/home/ubuntu/workspace/pp-test/some/module/pp_test.py", line 27, in run
        pong, hostname = ping()
    TypeError: 'NoneType' object is not iterable
The code is structured this way:
    pp-test/
        test.py
        some/
            __init__.py
            module/
                __init__.py
                pp_test.py
test.py is implemented as:
    from some.module.pp_test import MWE
    p = MWE()
    p.ping_pong()
While pp_test.py is:
    class MWE():
        def ping_pong(self):
            print "RUNNING TEST "
            a_test = PPTester()
            a_test.run()

    import pp
    import time
    from sys import stdout, exit

    class PPTester(object):
        def run(self):
            try:
                ppservers = ('10.10.10.10', )
                time.sleep(5)
                job_server = pp.Server(0, ppservers=ppservers)
                stdout.write("Waiting for hosts to finish booting...")
                while len(job_server.get_active_nodes()) - 1 < len(ppservers):
                    stdout.write(".")
                    stdout.flush()
                    time.sleep(1)
                ppmodules = ()
                pings = [(server, job_server.submit(self.run_pong, modules=ppmodules)) for server in ppservers]
                for server, ping in pings:
                    pong, hostname = ping()
                    print "Host ", hostname, " is alive!"
                print "All servers booted up, starting benchmarks..."
                job_server.print_stats()
            except Exception as e:
                print "Caught exception in the run phase", e
                raise
                pass

        def run_pong(self):
            import subprocess
            p = subprocess.Popen("hostname", stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
            (output, err) = p.communicate()
            p_status = p.wait()
            return "pong ", output
dill won't work with pp out of the box, because pp doesn't serialize the Python objects; instead, pp extracts the object's source code (like the inspect module in the standard Python library).
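The traceback in the question fails inside pickle.loads(__sargs) on the worker, which suggests that the function's source text arrived fine but the pickled call arguments (here, most likely the instance that run_pong is bound to) refer to a module the worker cannot import. The following is a minimal stdlib-only sketch of that failure mode, written for Python 3. It does not use pp at all: the module name remote_only_mod and the class Tester are hypothetical stand-ins for some/module/pp_test.py, and the "worker" is simulated in-process by making the module unimportable.

```python
import importlib
import inspect
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for some/module/pp_test.py; the names are illustrative only.
MODULE_NAME = "remote_only_mod"
MODULE_SOURCE = (
    "class Tester(object):\n"
    "    def run_pong(self):\n"
    "        return 'pong'\n"
)


def simulate_worker_unpickle():
    """Return (method_source, error_message) for an object whose module the 'worker' lacks."""
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, MODULE_NAME + ".py"), "w") as handle:
        handle.write(MODULE_SOURCE)

    # "Client" side: the module is importable here.
    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    module = importlib.import_module(MODULE_NAME)
    # Source extraction yields plain text that can be shipped without any import.
    method_source = inspect.getsource(module.Tester.run_pong)
    # Pickle stores the instance's class by reference to its module path.
    payload = pickle.dumps(module.Tester())

    # Simulated "worker" side: same bytes, but the module can no longer be imported.
    sys.path.remove(workdir)
    del sys.modules[MODULE_NAME]
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
    except ImportError as exc:  # ModuleNotFoundError on Python 3
        return method_source, str(exc)
    return method_source, None


if __name__ == "__main__":
    source_text, error = simulate_worker_unpickle()
    print(source_text)
    print("worker-side error:", error)
```

If this reading of the traceback is right, the error would go away either when the package is importable on every worker or when the submitted callable does not drag a package-defined instance along as an argument.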
To enable pp to use dill (actually dill.source, which is inspect augmented by dill), you have to use a fork of pp called ppft. ppft installs as pp (i.e. it imports with import pp), but it has much stronger source inspection, so you can "serialize" most Python objects and have ppft track down their dependencies automatically.
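As a rough sketch of what a job submission could look like once ppft is installed, assuming it keeps the Server(...), submit(func, args, depfuncs, modules) and destroy() calls of the original pp: the helper names get_hostname and run_hostname_job are made up for illustration, the code is written for Python 3, and it falls back to a plain local call when no pp module can be imported so that it still runs without ppft.

```python
import socket

try:
    import pp  # ppft installs under this name, per the answer above
except ImportError:
    pp = None  # assumption: fall back to a local call so the sketch still runs


def get_hostname():
    """Report the name of the machine that executes this function."""
    return socket.gethostname()


def run_hostname_job(ppservers=()):
    """Submit get_hostname as a job and return its result (hypothetical helper)."""
    if pp is None:
        return get_hostname()
    # One local worker; pass remote hosts via ppservers to reach other machines.
    job_server = pp.Server(ncpus=1, ppservers=ppservers)
    try:
        # modules=("socket",) asks the worker to import socket before running the job.
        job = job_server.submit(get_hostname, (), (), ("socket",))
        return job()  # blocks until the job has finished
    finally:
        job_server.destroy()


if __name__ == "__main__":
    print("Host", run_hostname_job(), "is alive!")
```

The sketch submits a module-level function rather than a bound method such as self.run_pong, so no instance of a package-defined class has to travel to the worker as an argument.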
Get ppft here: https://github.com/uqfoundation
ppft is also pip-installable and Python 3.x compatible.