python c++ oop pickle boost-python

What does __getstate_manages_dict__ do?


I've been working on multiprocessing and C++ extensions and I don't quite get the __getstate_manages_dict__ attribute (I know how to use it, but I'm not really sure how it works). The Boost.Python documentation for pickle support says this:

The author of a Boost.Python extension class might provide a __getstate__ method without considering the possibilities that:

  • his class is used in Python as a base class. Most likely the __dict__ of instances of the derived class needs to be pickled in order to restore the instances correctly.
  • the user adds items to the instance's __dict__ directly. Again, the __dict__ of the instance then needs to be pickled.

To alert the user to this highly unobvious problem, a safety guard is provided. If __getstate__ is defined and the instance's __dict__ is not empty, Boost.Python tests if the class has an attribute __getstate_manages_dict__. An exception is raised if this attribute is not defined:

I've seen some examples where the object's __dict__ is returned in __getstate__ and then updated in __setstate__. What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object? Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?

Thanks


Solution

  • "I know how to use it, but I'm not really sure how it works"

    It's a boolean flag that is not set by default, which Boost.Python treats as false. Setting it signals to Boost.Python that your __getstate__/__setstate__ implementation handles the instance's __dict__, so that the information won't be lost in the pickling process.

    The idea is that Boost.Python can't actually determine whether the code is written properly, so instead you are made to jump through this extra hoop so that, if you are unaware of the problem, you see an error message. As the documentation puts it, this is "to alert the user to this highly unobvious problem".

    It's not doing anything magical. It's just there to confirm that you "read the warranty", so to speak.
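
    Conceptually, the check behaves like the pure-Python emulation below. This is a sketch, not Boost.Python's source: the classes Guarded and Managed are hypothetical, the "_name" slot merely stands in for state held on the C++ side, and the error wording is illustrative. Only the attribute name and the conditions (a __getstate__ is defined, the instance's __dict__ is non-empty, the flag is missing) come from the documentation quoted in the question.

```python
import pickle


class Guarded:
    """Hypothetical stand-in for a wrapped C++ class (not Boost.Python code)."""

    # "_name" lives in a slot to mimic C++-side state kept outside __dict__;
    # "__dict__" is listed so instances can still accept extra attributes.
    __slots__ = ("_name", "__dict__")

    def __init__(self, name):
        self._name = name

    def __getstate__(self):
        # Hard-coded state: knows about _name only and ignores __dict__.
        return self._name

    def __setstate__(self, state):
        self._name = state

    def __reduce__(self):
        # Emulated safety guard: refuse to pickle when __dict__ holds data
        # and the class has not declared that __getstate__ takes care of it.
        manages = getattr(type(self), "__getstate_manages_dict__", False)
        if self.__dict__ and not manages:
            # Illustrative message, not guaranteed to match the library's.
            raise RuntimeError("__getstate_manages_dict__ not set")
        return (type(self), (self._name,), self.__getstate__())


class Managed(Guarded):
    # The "I read the warranty" flag: promises that __dict__ is handled.
    __getstate_manages_dict__ = True

    def __getstate__(self):
        return (self._name, dict(self.__dict__))

    def __setstate__(self, state):
        self._name, extra = state
        self.__dict__.update(extra)


plain = Guarded("a")
pickle.loads(pickle.dumps(plain))      # empty __dict__: the guard stays silent

plain.note = "added later"             # now __dict__ is non-empty
try:
    pickle.dumps(plain)
    guard_fired = False
except RuntimeError:
    guard_fired = True

managed = Managed("b")
managed.note = "kept"
restored = pickle.loads(pickle.dumps(managed))
print(guard_fired, restored.note)
```

    Note that the flag itself changes nothing about what gets pickled. It only silences the guard, so the __getstate__/__setstate__ pair still has to do the real work.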

    "The Boost.Python documentation for pickle support says this:"

    This is just explaining the reasons why it's important to consider the __dict__ contents - even if you don't want to pickle all the attributes that you set explicitly in the __init__ (for example, because the class holds a reference to a large resource that you intend to load in some other way, or a results cache, or...). In short, instances of your class might contain information that you didn't expect them to contain, and that your __getstate__ implementation won't know how to handle, unless it takes the instance's __dict__ into account.
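
    The failure mode is easy to reproduce in plain Python, which has no such guard and therefore loses the data silently. In the sketch below, Account and Savings are hypothetical classes invented for illustration; the point is only that a __getstate__ with hard-coded fields knows nothing about attributes added by a subclass or by the user.

```python
import pickle


class Account:
    """Hypothetical class whose __getstate__ only knows its own fields."""

    def __init__(self, owner):
        self.owner = owner
        self.cache = {}                  # deliberately not pickled

    def __getstate__(self):
        # Hard-coded: anything not listed here is dropped.
        return {"owner": self.owner}

    def __setstate__(self, state):
        self.owner = state["owner"]
        self.cache = {}


class Savings(Account):
    """Derived class that adds an attribute the base class never heard of."""

    def __init__(self, owner, rate):
        super().__init__(owner)
        self.rate = rate


acct = Savings("ann", 0.02)
acct.nickname = "rainy day"              # attribute added directly by the user

clone = pickle.loads(pickle.dumps(acct))
print(clone.owner)                       # the hard-coded field survives
print(hasattr(clone, "rate"))            # False: subclass attribute lost
print(hasattr(clone, "nickname"))        # False: ad-hoc attribute lost
```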

    Hence the "practical advice" offered: "If __getstate__ is required, include the instance's __dict__ in the Python object that is returned."
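
    A minimal pure-Python sketch of that advice follows. It is not taken from the Boost.Python examples; Account and Savings are hypothetical names, and the "cache" attribute stands in for whatever you deliberately want to leave out. The state starts from a copy of the instance's __dict__, so attributes that the class author never anticipated come along automatically.

```python
import pickle


class Account:
    """Hypothetical class that pickles its __dict__ minus one excluded entry."""

    def __init__(self, owner):
        self.owner = owner
        self.cache = {"expensive": "result"}   # should not be stored

    def __getstate__(self):
        state = dict(self.__dict__)      # start from everything on the instance
        state.pop("cache", None)         # then drop what should not be stored
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)      # restore every pickled attribute
        self.cache = {}                  # rebuild the excluded piece


class Savings(Account):
    def __init__(self, owner, rate):
        super().__init__(owner)
        self.rate = rate


acct = Savings("ann", 0.02)
acct.nickname = "rainy day"              # attribute added directly by the user

clone = pickle.loads(pickle.dumps(acct))
print(clone.owner, clone.rate, clone.nickname, clone.cache)
```

    In a Boost.Python extension class the same idea applies, with the extra step of setting __getstate_manages_dict__ so that the guard knows the __dict__ is accounted for.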

    "What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object?"

    It's the __dict__ attribute of the instance that __getstate__ was called upon. That could be an instance of a derived class, if that class doesn't override the methods. Or it could be an instance of the base class, which may or may not have had extra attributes added outside the class implementation.

    "Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?"

    See above. When you get the attributes (so that the pickle file can be written), you need to make sure that you actually get all the necessary attributes, or else they'll be missing upon restoration. Hard-coded logic can miss some.