python, pickle

Preventing fields from being pickled


I have a class like this:

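class Something(object):

    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None  # in-memory cache, filled lazily

    @property
    def thing(self):
        if self._cached_thing is None:
            # Thing is the model class being looked up (not shown here).
            self._cached_thing = Thing.objects.get(id=self._thing_id)
        return self._cached_thing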

When pickling objects like this, I'd like to prevent pickling of the _cached_thing field, as it's volatile and purely an in-memory implementation detail.

Is there a way to tell pickle that I only want a subset of my fields to be pickled?


Solution

  • Pickle can be customized in three ways, as described in the docs.

    • Provide __getstate__ and __setstate__ methods.
    • Provide __getnewargs__/__getnewargs_ex__ (and a __new__ that accepts those arguments).
    • Provide __reduce__ (returning a callable, plus its arguments, that rebuilds the object).

    The first is usually the simplest:

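    class Something(object):

        def __init__(self, thing_id):
            self._thing_id = thing_id
            self._cached_thing = None

        def __getstate__(self):
            # Pickle only the id; the cache is deliberately left out.
            # (If the state is falsy, __setstate__ is skipped on unpickling.)
            return self._thing_id

        def __setstate__(self, thing_id):
            self._thing_id = thing_id
            self._cached_thing = None  # start with an empty cache

        # etc.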
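
For comparison, here is a minimal sketch of the third option, __reduce__. It assumes the constructor takes the id as its only argument (the question does not show this), and the string assigned to the cache is just a stand-in for a loaded Thing.

```python
import pickle


class Something(object):

    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None  # in-memory only, never pickled

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling Something(thing_id).
        # Only the id travels in the pickle; __init__ recreates an empty cache.
        return (self.__class__, (self._thing_id,))


original = Something(42)
original._cached_thing = "expensive object"  # stand-in for a loaded Thing

restored = pickle.loads(pickle.dumps(original))
print(restored._thing_id)      # 42
print(restored._cached_thing)  # None
```

Unlike the __getstate__ route, this one does run __init__ on unpickling, which may or may not be what you want if the constructor has side effects.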
    

    If you want something more generic that pickles every attribute (including those set by a subclass, or added dynamically after creation, etc.) except the ones on your blacklist, note that by default "the instance's __dict__ is pickled", so just filter that:

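    _blacklist = ['_cached_thing']
    def __getstate__(self):
        # Pickle everything in __dict__ except the blacklisted names.
        return {k: v for k, v in self.__dict__.items() if k not in self._blacklist}
    def __setstate__(self, state):
        self.__dict__.update(state)
        # __init__ is not called on unpickling, so recreate the skipped attributes.
        for k in self._blacklist:
            self.__dict__.setdefault(k, None)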
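
To see the filtering in action, here is a rough round-trip sketch. The constructor signature, the `extra` attribute, and the string used as a cached value are illustrative assumptions, not part of the question. The sketch resets excluded attributes to None in __setstate__, since pickle does not call __init__ when unpickling.

```python
import pickle


class Something(object):
    _blacklist = ['_cached_thing']

    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __getstate__(self):
        # Everything in __dict__ except the blacklisted names.
        return {k: v for k, v in self.__dict__.items()
                if k not in self._blacklist}

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate skipped attributes so later lookups do not fail.
        for k in self._blacklist:
            self.__dict__.setdefault(k, None)


obj = Something(7)
obj._cached_thing = "expensive object"  # stand-in for a loaded Thing
obj.extra = "set dynamically"           # unknown to the class, still pickled

state = obj.__getstate__()
clone = pickle.loads(pickle.dumps(obj))

print(sorted(state))        # ['_thing_id', 'extra']
print(clone._cached_thing)  # None
```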
    

    And please see gnibbler's comment on the question: if you're doing something generic, you should seriously consider coming up with some kind of naming convention instead of putting a blacklist in each class. Any reader who knows or learns the convention will immediately know which attributes are "cache" values rather than part of the "real" value, it'll be more obvious how things work, there's less work for you to do in each class, and there are fewer places to screw things up with a typo…
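
One possible shape for such a convention, as a sketch only: a mixin that skips every attribute whose name starts with `_cached_`. The mixin name `SkipCachedMixin` and the prefix are hypothetical choices made for this example; any consistent prefix would do.

```python
import pickle


class SkipCachedMixin(object):
    """Hypothetical mixin: attributes named _cached_* are never pickled."""

    _CACHE_PREFIX = '_cached_'

    def __getstate__(self):
        # Keep every attribute except those following the cache naming rule.
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith(self._CACHE_PREFIX)}

    def __setstate__(self, state):
        self.__dict__.update(state)


class Something(SkipCachedMixin):

    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None


obj = Something(3)
obj._cached_thing = "expensive object"  # stand-in for a loaded Thing
clone = pickle.loads(pickle.dumps(obj))
print(clone.__dict__)  # {'_thing_id': 3}
```

Note that the unpickled object has no `_cached_thing` attribute at all, so code that reads a cache attribute needs to tolerate its absence, for example with `getattr(self, '_cached_thing', None)`.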