I'm evaluating dill and I want to know if this scenario is handled. I have a case where I successfully import a module in a python process. Can I use dill to serialize and then load that module in a different process that has a different sys.path which doesn't include that module? Right now I get import failures but maybe I'm doing something wrong.
Here's an example. I run this script where the foo.py module's path is in my sys.path:
% cat dill_dump.py
import dill
import foo
myFile = "./foo.pkl"
fh = open(myFile, 'wb')
dill.dump(foo, fh)
fh.close()
Now, I run this script where I do not have foo.py's directory in my PYTHONPATH:
% cat dill_load.py
import dill
myFile = "./foo.pkl"
fh = open(myFile, 'rb')
foo = dill.load(fh)
fh.close()
print foo
It fails with this stack trace:
Traceback (most recent call last):
File "dill_load.py", line 4, in <module>
foo = dill.load(fh)
File "/home/b/lib/python/dill-0.2.4-py2.6.egg/dill/dill.py", line 199, in load
obj = pik.load()
File "/rel/lang/python/2.6.4-8/lib/python2.6/pickle.py", line 858, in load
dispatch[key](self)
File "/rel/lang/python/2.6.4-8/lib/python2.6/pickle.py", line 1133, in load_reduce
value = func(*args)
File "/home/b/lib/python/dill-0.2.4-py2.6.egg/dill/dill.py", line 678, in _import_module
return __import__(import_name)
ImportError: No module named foo
So, if I need to have the same python path between the two processes, then what's the point of serializing a python module? Or in other words, is there any advantage to loading foo via dill over just having an "import foo" call?
That's an interesting failure. Notice that if you do dill.dumps(foo), you will get the contents of the module foo… the part that fails is using Python's built-in import hook (__import__) to do little more than register the module into sys.modules. It should be possible to work around that and modify dill so that the module could be imported even when it is not found on the PYTHONPATH. However, I do think it's proper that the module has to be found on the PYTHONPATH… that is what is expected of a module… so I'm not sure if it's a good idea. But it might be...
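The sys.modules behavior described above can be seen without dill at all: __import__ consults sys.modules before it searches sys.path, so pre-registering a module object makes the import succeed even when nothing named foo is importable. A minimal standard-library sketch (the stub module here is a stand-in for what dill reconstructs; nothing is read from disk):

```python
import sys
import types

# With no foo.py on sys.path, a bare __import__ fails.
try:
    __import__("foo")
except ImportError:
    pass  # expected: "No module named foo"

# __import__ checks sys.modules first, so pre-registering a module
# object satisfies it without touching the filesystem.
stub = types.ModuleType("foo")
sys.modules["foo"] = stub

mod = __import__("foo")
assert mod is stub  # the registered object is returned as-is
```

This is the hook a workaround in dill could use: register the reconstructed module in sys.modules before (or instead of) calling __import__.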
As noted above, for a file foo.py with contents:

hello = "hello world, I am foo"
>>> import dill
>>> import foo
>>> dill.dumps(foo)
'\x80\x02cdill.dill\n_import_module\nq\x00U\x03fooq\x01\x85q\x02Rq\x03}q\x04(U\x08__name__q\x05h\x01U\x08__file__q\x06U\x06foo.pyq\x07U\x05helloq\x08U\x15hello world, I am fooq\tU\x07__doc__q\nNU\x0b__package__q\x0bNub.'
You can see the contents of the file are preserved in the pickle.
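You can verify this directly from the payload: the module's attribute values are embedded as string opcodes in the pickle stream. The bytes literal below is just the dumps() output reproduced from above, and pickletools (standard library) only disassembles the stream; it never imports dill:

```python
import pickletools

# the dill.dumps(foo) payload reproduced from above
payload = (b'\x80\x02cdill.dill\n_import_module\nq\x00U\x03fooq\x01\x85q\x02'
           b'Rq\x03}q\x04(U\x08__name__q\x05h\x01U\x08__file__q\x06U\x06foo.pyq\x07'
           b'U\x05helloq\x08U\x15hello world, I am fooq\tU\x07__doc__q\nN'
           b'U\x0b__package__q\x0bNub.')

# the module contents travel inside the pickle...
assert b'hello world, I am foo' in payload

# ...and disassembly shows the dill.dill._import_module GLOBAL that
# later fails when foo is not importable on the loading side
pickletools.dis(payload)
```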
The primary reason to use dill with modules is that dill can record dynamic modifications to modules. For example, adding a function or other object:
>>> import foo
>>> import dill
>>> foo.a = 100
>>> with open('foo.pkl', 'w') as f:
... dill.dump(foo, f)
...
>>>
Then restarting… (with foo in the PYTHONPATH):
Python 2.7.10 (default, May 25 2015, 13:16:30)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('foo.pkl', 'r') as f:
... foo = dill.load(f)
...
>>> foo.hello
'hello world, I am foo'
>>> foo.a
100
>>>
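The dynamic-modification behavior in the transcript above can be approximated with nothing but the standard library, which makes it clear what dill is saving on your behalf. This is a sketch of the idea, not dill's actual implementation — the real difference is that dill also re-registers the module via __import__, which is why the module still has to be importable on the loading side:

```python
import pickle
import types

# build a module and modify it dynamically, as in the session above
foo = types.ModuleType("foo")
foo.hello = "hello world, I am foo"
foo.a = 100  # a dynamic addition that a plain "import foo" would never see

# roughly what dill records: the module's (non-dunder) attribute dict
state = {k: v for k, v in vars(foo).items() if not k.startswith("__")}
payload = pickle.dumps(state)

# "loading" side: rebuild a module object and restore the saved state
restored = types.ModuleType("foo")
restored.__dict__.update(pickle.loads(payload))

assert restored.hello == "hello world, I am foo"
assert restored.a == 100
```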
I've added this as a bug report / feature request: https://github.com/uqfoundation/dill/issues/123