(It is possible to directly jump to the question, further down, and to skip the introduction.)
There is a common difficulty with pickling Python objects from user-defined classes:
# This is program dumper.py
import pickle

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(C(), f)
Trying to get the object back from another program, loader.py, with
# This is program loader.py
import pickle

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)
results in
AttributeError: 'module' object has no attribute 'C'
This happens because the class is pickled by name (class "C" of the module being run as the main program), and the loader.py program does not know anything about C. A common solution consists in importing the class with
import pickle
from dumper import C  # Objects of class C can now be unpickled

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)
However, this solution has a few drawbacks, including the fact that all the classes referenced by the pickled objects have to be imported (there can be many); furthermore, the local namespace becomes polluted by names from the dumper.py program.
Now, a solution to this consists of fully qualifying objects prior to pickling:
# New dumper.py program:
import pickle
import dumper  # This is this very program!

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified class
Unpickling with the original loader.py program above now works directly (no need to do from dumper import C).
Question: Now, other classes from dumper.py seem to be automatically fully qualified upon pickling, and I would love to know how this works, and whether this is a reliable, documented behavior:
import pickle
import dumper  # This is this very program!

class D(object):  # New class!
    pass

class C(object):
    def __init__(self):
        self.d = D()  # *NOT* fully qualified

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified pickle class
Now, unpickling with the original loader.py program also works (no need to do from dumper import C); print obj.d gives a fully qualified class, which I find surprising:
<dumper.D object at 0x122e130>
This behavior is very convenient, since only the top, pickled object has to be fully qualified with the module name (dumper.C()). But is this behavior reliable and documented? How come classes are pickled by name ("D"), yet unpickling decides that the pickled self.d attribute is of class dumper.D (and not some local D class)?
PS: The question, refined: I just noticed a few interesting details that might point to an answer to this question:
In the pickling program dumper.py, print self.d prints <__main__.D object at 0x2af450> with the first dumper.py program (the one without import dumper). On the other hand, doing import dumper and creating the object with dumper.C() in dumper.py makes print self.d print <dumper.D object at 0x2af450>: the self.d attribute is automatically qualified by Python! So, it appears that the pickle module has no role in the nice unpickling behavior described above.
The question is thus really: why does Python convert D() into the fully qualified dumper.D in the second case? Is this documented somewhere?
Here is what happens: when importing dumper (or doing from dumper import C) from within dumper.py, the whole program is parsed and executed again (this can be seen by inserting a print in the module). This behavior is expected, because dumper is not a module that was already loaded: it is not in sys.modules (__main__ is considered loaded, however).
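The double execution can be checked with a small experiment. This is only a sketch under assumptions: it writes a stripped-down, hypothetical dumper.py (not the program from the question) to a temporary directory and runs it in a separate interpreter, so that the script really is the main program.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for dumper.py: it reports the name it is running
# under and whether 'dumper' is already in sys.modules, then imports itself.
SCRIPT = (
    "import sys\n"
    "print(__name__, 'dumper' in sys.modules)\n"
    "import dumper\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dumper.py")
    with open(path, "w") as f:
        f.write(SCRIPT)
    # Run the file as a script, so that it is loaded as __main__ first.
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, check=True
    )

lines = result.stdout.splitlines()
print(lines)
```

The print statement should run twice: once under the name __main__ (with dumper absent from sys.modules), and once under the name dumper, triggered by the import.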
As illustrated in Mark's answer, importing a module naturally qualifies all the names defined in the module, so that self.d = D() is interpreted as being of class dumper.D when re-evaluating file dumper.py (this is equivalent to parsing common.py, in Mark's answer).
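The qualification effect can also be reproduced in isolation. The sketch below is not the original pair of programs: it executes the same source twice under two hypothetical module names, fake_main and fake_dumper (standing in for __main__ and dumper), and the helper load_as is made up for the illustration.

```python
import pickle
import sys
import types

# Source shared by both hypothetical modules; it mirrors classes C and D.
SOURCE = """
class D(object):
    pass

class C(object):
    def __init__(self):
        self.d = D()  # D is looked up in the globals of the executing module
"""

def load_as(module_name):
    """Execute SOURCE in a fresh module registered under module_name."""
    module = types.ModuleType(module_name)
    sys.modules[module_name] = module  # lets pickle find the module by name
    exec(SOURCE, module.__dict__)
    return module

fake_main = load_as("fake_main")      # stands in for __main__
fake_dumper = load_as("fake_dumper")  # stands in for the imported dumper

obj = fake_dumper.C()
d_module = type(obj.d).__module__          # module recorded on the D class
same_class = fake_main.D is fake_dumper.D  # two executions, two classes
payload = pickle.dumps(obj, protocol=0)    # protocol 0 is a text format
names_in_payload = (b"fake_dumper" in payload, b"fake_main" in payload)
print(d_module, same_class, names_in_payload)
```

Each execution of the source creates its own C and D classes, tagged with the name of the module they were created in; pickle then simply records that module name along with the class name.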
Thus, the import dumper (or from dumper import C) trick is explained, and pickling fully qualifies not only class C but also class D. This makes unpickling by an external program easier!
This also shows that import dumper done in dumper.py forces the Python interpreter to parse and execute the program twice, which is neither efficient nor elegant. Pickling classes in a program and unpickling them in another one is therefore probably best done through the approach outlined in Mark's answer: pickled classes should be in a separate module.
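A minimal sketch of that layout follows. The file contents are my own assumption of what such a split could look like (only the name common.py comes from Mark's answer); the three files are written to a temporary directory and run in separate interpreters so that the example is self-contained.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical file contents: the classes live in common.py only.
FILES = {
    "common.py": """\
class D(object):
    pass

class C(object):
    def __init__(self):
        self.d = D()
""",
    "dumper.py": """\
import pickle
import common

with open('obj.pickle', 'wb') as f:
    pickle.dump(common.C(), f)
""",
    "loader.py": """\
import pickle

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)
print(type(obj).__module__, type(obj.d).__module__)
""",
}

def run(script, cwd):
    """Run a script in its own interpreter and return its standard output."""
    result = subprocess.run(
        [sys.executable, script], cwd=cwd,
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

with tempfile.TemporaryDirectory() as tmp:
    for name, source in FILES.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(source)
    run("dumper.py", tmp)                  # writes obj.pickle
    loader_output = run("loader.py", tmp)  # reads it back

print(loader_output)
```

Here loader.py does not import common explicitly; unpickling imports it by name, so common.py only has to be importable from the loading program.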