
How can I unpickle a subclass of 'dict' that validates with __setitem__ in python3?


I'm using Python 3.3. This problem may not exist with Python 2.x's pickle protocols, but I haven't verified that.

Suppose I've created a dict subclass that counts every time a key is updated. Something like this:

class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __setitem__(self, key, value):
        print(key, value, self.__dict__)
        if key == 'bar':
            self.counter += 1
        super(Foo, self).__setitem__(key, value)

You might use it like this:

>>> f = Foo()
>>> assert f.counter == 0
>>> f['bar'] = 'baz'
... logging output...        
>>> assert f.counter == 1

Now let's pickle and unpickle it:

>>> import pickle
>>> f_str = pickle.dumps(f)
>>> f_new = pickle.loads(f_str)
bar baz {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 133, in __setitem__
    self.counter += 1
AttributeError: 'Foo' object has no attribute 'counter'

I think the print() in __setitem__ shows the problem: pickle.loads tries to set the dictionary's keys before it restores the object's attributes (self.__dict__ is still empty when __setitem__ runs). It's easy to verify with a ModifiedFoo that is identical to Foo except that the self.counter reference in __setitem__() is removed:

>>> f_mod = ModifiedFoo()
>>> f_mod['bar'] = 'baz'
>>> f_mod_str = pickle.dumps(f_mod)
>>> f_mod_new = pickle.loads(f_mod_str)
bar baz {}
>>> assert f_mod_new.counter == 0
>>>

Is this just a byproduct of the pickle protocol? I've tried variations on __setstate__ to let it unpickle correctly, but as far as I can tell, it hits the __setitem__ error before __setstate__ is even called. Is there any way I can modify this object to allow unpickling?


Solution

  • As stated in the pickle documentation:

    When a pickled class instance is unpickled, its __init__() method is normally not invoked.

    In your case you do want __init__ to be invoked. However, since your class is a new-style class, you cannot use __getinitargs__ (which isn't supported in Python 3 anyway). You could try writing custom __getstate__ and __setstate__ methods:

    class Foo(dict):
        def __init__(self):
            self.counter = 0
        def __getstate__(self):
            return (self.counter, dict(self))
        def __setstate__(self, state):
            self.counter, data = state
            self.update(data)  # will *not* call __setitem__
    
        def __setitem__(self, key, value):
            self.counter += 1
            super(Foo, self).__setitem__(key, value)
    

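    However, this still doesn't work. Because you are subclassing dict, the default reduction also pickles the dictionary items separately from the state, and on loading those items are restored through __setitem__ before the state is applied. So __getstate__ is called when pickling, but unpickling fails inside __setitem__ before __setstate__ is ever reached.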
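
To see the ordering for yourself, you can inspect what the default reduction produces for a dict subclass and where each part lands in the pickle stream. This is only a minimal sketch: `Plain` is a hypothetical class (not from the question) with one attribute and no pickle hooks, and protocol 2 is chosen just to keep the opcode stream simple.

```python
import pickle
import pickletools


class Plain(dict):
    """Hypothetical dict subclass: one instance attribute, no pickle hooks."""

    def __init__(self):
        self.counter = 0


p = Plain()
p['bar'] = 'baz'

# The default reduction is (callable, args, state, listitems, dictitems).
reduced = p.__reduce_ex__(2)
state = reduced[2]         # the instance attributes
items = list(reduced[4])   # the dictionary contents, kept separate from state
print(state)   # {'counter': 0}
print(items)   # [('bar', 'baz')]

# Walk the opcode arguments and compare where the dict key and the
# attribute name first appear in the stream.
data = pickle.dumps(p, protocol=2)
args = [arg for _op, arg, _pos in pickletools.genops(data)]
items_first = args.index('bar') < args.index('counter')
print(items_first)   # True: the items are written (and replayed) before the state
```

Since the unpickler replays the stream in order, any __setitem__ override runs while the new instance still has an empty __dict__.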

    You can work around this by defining the __reduce__ method:

    class Foo(dict):
        def __init__(self):
            self.counter = 0
        def __getstate__(self):
            return (self.counter, dict(self))
        def __setstate__(self, state):
            self.counter, data = state
            self.update(data)
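        def __reduce__(self):
            # Rebuild by calling Foo() (so __init__ runs), then hand the state
            # to __setstate__; no items are replayed through __setitem__.
            return (Foo, (), self.__getstate__())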
    
        def __setitem__(self, key, value):
            self.counter += 1
            super(Foo, self).__setitem__(key, value)
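
A quick round trip shows the effect. The sketch below repeats the class above so that it runs on its own; the sample keys and values are made up for illustration.

```python
import pickle


class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __getstate__(self):
        return (self.counter, dict(self))

    def __setstate__(self, state):
        self.counter, data = state
        self.update(data)  # dict.update bypasses the overridden __setitem__

    def __reduce__(self):
        # Unpickling calls Foo() first, then __setstate__ with this state.
        return (Foo, (), self.__getstate__())

    def __setitem__(self, key, value):
        self.counter += 1
        super(Foo, self).__setitem__(key, value)


f = Foo()
f['bar'] = 'baz'
f['bar'] = 'qux'   # second assignment, so counter is now 2

restored = pickle.loads(pickle.dumps(f))
print(type(restored).__name__, restored, restored.counter)
```

The counter survives the round trip without being incremented again, because the items come back through update() rather than __setitem__. One design note: __reduce__ names Foo explicitly, so a subclass of Foo would unpickle as a plain Foo. Returning type(self) in its place is a possible adjustment if subclasses matter.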