I'm using Python 3.3. This problem may not exist with Python 2.x's pickle protocol, but I haven't verified that.
Suppose I've created a dict subclass that counts every time a key is updated. Something like this:
class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __setitem__(self, key, value):
        print(key, value, self.__dict__)
        if key == 'bar':
            self.counter += 1
        super(Foo, self).__setitem__(key, value)
You might use it like this:
>>> f = Foo()
>>> assert f.counter == 0
>>> f['bar'] = 'baz'
... logging output...
>>> assert f.counter == 1
Now let's pickle and unpickle it:
>>> import pickle
>>> f_str = pickle.dumps(f)
>>> f_new = pickle.loads(f_str)
bar baz {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 133, in __setitem__
    self.counter += 1
AttributeError: 'Foo' object has no attribute 'counter'
I think the print() in __setitem__ shows the problem: pickle.loads attempts to write the dictionary's keys before it writes the object's attributes... at least I think that's what's happening. It's pretty easy to verify if you remove the self.counter reference in Foo.__setitem__():
>>> f_mod = ModifiedFoo()
>>> f_mod['bar'] = 'baz'
>>> f_mod_str = pickle.dumps(f_mod)
>>> f_mod_new = pickle.loads(f_mod_str)
bar baz {}
>>> assert f_mod_new.counter == 0
>>>
Is this just a byproduct of the pickle protocol? I've tried variations on __setstate__ to let it unpickle correctly, but as far as I can tell, it hits the __setitem__ error before __setstate__ is even called. Is there any way I can modify this object to allow unpickling?
As stated in the pickle documentation:
"When a pickled class instance is unpickled, its __init__() method is normally not invoked."
In your case you do want __init__ to be invoked. However, since your class is a new-style class, you cannot use __getinitargs__ (which isn't supported in Python 3 anyway). You could try writing custom __getstate__ and __setstate__ methods:
class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __getstate__(self):
        return (self.counter, dict(self))

    def __setstate__(self, state):
        self.counter, data = state
        self.update(data)  # will *not* call __setitem__

    def __setitem__(self, key, value):
        self.counter += 1
        super(Foo, self).__setitem__(key, value)
However, this still doesn't work. Because you are subclassing dict, and dict subclasses get special handling when pickled, __getstate__ is called, but the stored items are restored through __setitem__ before __setstate__ gets a chance to run, so the AttributeError is raised first.
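To see where the items come from, it can help to look at what a dict subclass reduces to by default. The snippet below is only a sketch: it uses a trimmed-down Foo (no custom __getstate__) and calls __reduce_ex__(2) directly instead of pickling. The dict items travel in their own slot, separate from the state, and the pickle documentation says such items are stored using obj[key] = value. As far as I can tell, CPython's unpickler applies them before the state, which is why __setitem__ runs on an instance that has no counter yet.

```python
class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __setitem__(self, key, value):
        self.counter += 1
        super().__setitem__(key, value)


f = Foo()
f['bar'] = 'baz'

# Default reduction for protocol 2: a 5-tuple of
# (callable, args, state, list items, dict items).
reduced = f.__reduce_ex__(2)
constructor, args, state, list_items, dict_items = reduced
dict_items = list(dict_items)  # the last slot is an iterator over (key, value) pairs

print(args)        # arguments passed to the reconstructing callable
print(state)       # instance attributes, restored as the object's state
print(dict_items)  # key/value pairs, stored via obj[key] = value
```

Note that __init__ appears nowhere in that tuple, which matches the documentation quote above.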
You can work around this by defining the __reduce__ method:
class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __getstate__(self):
        return (self.counter, dict(self))

    def __setstate__(self, state):
        self.counter, data = state
        self.update(data)  # will *not* call __setitem__

    def __reduce__(self):
        # (callable, args, state): unpickling calls Foo(), then __setstate__(state)
        return (Foo, (), self.__getstate__())

    def __setitem__(self, key, value):
        self.counter += 1
        super(Foo, self).__setitem__(key, value)
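A quick round trip to check the __reduce__ version. This is just a sketch: the class is repeated so the snippet runs on its own, and the keys and values are made up for the example.

```python
import pickle


class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __getstate__(self):
        return (self.counter, dict(self))

    def __setstate__(self, state):
        self.counter, data = state
        self.update(data)  # dict.update bypasses the overridden __setitem__

    def __reduce__(self):
        # Rebuild with Foo() so __init__ runs, then hand the state to __setstate__.
        return (Foo, (), self.__getstate__())

    def __setitem__(self, key, value):
        self.counter += 1
        super(Foo, self).__setitem__(key, value)


f = Foo()
f['bar'] = 'baz'
f['spam'] = 'eggs'

f_new = pickle.loads(pickle.dumps(f))
print(type(f_new).__name__, dict(f_new), f_new.counter)
```

Because __reduce__ returns only three elements, no dict items are replayed through __setitem__ on load. The contents come back through __setstate__, so the counter is restored to its pickled value instead of being incremented again.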