
PyYAML - Serialize classes (types / class references) that are members of objects


Short version: How can one serialize a class (a class reference, i.e. not an instance) that is a member of an object (see the example below)?

Long version:

I have been using the answer to this question in my work: How can I ignore a member when serializing an object with PyYAML?

So, my current implementation is this:

import logging
from copy import copy

import yaml

class SecretYamlObject(yaml.YAMLObject):
    """Helper class for YAML serialization.
    Source: https://stackoverflow.com/questions/22773612/how-can-i-ignore-a-member-when-serializing-an-object-with-pyyaml """

    hidden_fields = []  # names of instance attributes that must not be dumped

    def __init__(self, *args, **kwargs):
        # Default behavior: delegate to __setstate__, which subclasses define
        self.__setstate__(kwargs)

    @classmethod
    def to_yaml(cls, dumper, data):
        # Work on a shallow copy so the live object keeps its hidden attributes
        new_data = copy(data)
        for item in cls.hidden_fields:
            if item in new_data.__dict__:
                del new_data.__dict__[item]
        return dumper.represent_yaml_object(cls.yaml_tag, new_data, cls,
                                            flow_style=cls.yaml_flow_style)

So far, this has been working fine for me because until now I have only needed to hide loggers:

class EventManager(SecretYamlObject):
    yaml_tag = u"!EventManager"
    hidden_fields = ["logger"]

    def __setstate__(self, kw):  # for (de)serialization
        self.logger = logging.getLogger(__name__)
        self.listeners = kw.get("listeners", {})
        #...

    def __init__(self, *args, **kwargs):
        self.__setstate__(kwargs)
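A possibly simpler way to hide fields such as the logger, sketched here as an assumption rather than as the question's code: the default YAMLObject.to_yaml calls the dumper's represent_yaml_object, which dumps whatever __getstate__ returns when the object defines it, so the hidden fields can be filtered there instead of copying the object in to_yaml. On loading, PyYAML creates the instance with __new__ and passes the loaded mapping to __setstate__ if the class defines one, which is where the logger gets rebuilt. HiddenFieldsYamlObject and EventManagerSketch are hypothetical names, and both register with yaml.SafeLoader/yaml.SafeDumper so the result can be read with yaml.safe_load().

```python
import logging

import yaml


class HiddenFieldsYamlObject(yaml.YAMLObject):
    """Hypothetical base class: attributes named in hidden_fields are not dumped."""
    # Register subclasses with the safe loader/dumper instead of the unsafe defaults
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper
    hidden_fields = ()

    def __getstate__(self):
        # represent_yaml_object dumps this dict instead of self.__dict__
        return {k: v for k, v in self.__dict__.items()
                if k not in self.hidden_fields}


class EventManagerSketch(HiddenFieldsYamlObject):
    yaml_tag = "!EventManagerSketch"
    hidden_fields = ("logger",)

    def __init__(self, listeners=None):
        self.__setstate__({"listeners": listeners or {}})

    def __setstate__(self, state):
        # Called by the loader with the dumped mapping; rebuild hidden fields here
        self.logger = logging.getLogger(__name__)
        self.listeners = state.get("listeners", {})


manager = EventManagerSketch({"start": []})
text = yaml.safe_dump(manager)
restored = yaml.safe_load(text)
print(text)
print(restored.listeners, restored.logger.name)
```

Note that copy and pickle also use __getstate__, so they would drop the logger as well; in this sketch that is harmless because __setstate__ recreates it.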

However, a different problem appears when I try to serialize objects that hold a class reference: if Q derives directly from object this works fine, but if it derives from yaml.YAMLObject, dumping fails with "can't pickle int objects". See this example:

class Q(SecretYamlObject):  # works fine if Q derives from plain "object" instead
    pass

class A(SecretYamlObject):
    yaml_tag = u"!Aobj"
    my_q = Q

    def __init__(self, oth_q):
        self.att = "att"
        self.oth_q = oth_q

class B(SecretYamlObject):
    yaml_tag = u"!Bobj"
    my_q = Q
    hidden_fields = ["my_q"]

    def __init__(self, oth_q):
        self.att = "att"
        self.oth_q = oth_q

class C(SecretYamlObject):
    yaml_tag = u"!Cobj"
    my_q = Q
    hidden_fields = ["my_q"]

    def __init__(self, *args, **kwargs):
        self.__setstate__(kwargs)

    def __setstate__(self, kw):
        self.att = "att"
        self.my_q = Q
        self.oth_q = kw.get("oth_q", None)

# Older PyYAML versions used the full (unsafe) yaml.Loader when load() got no
# Loader argument; newer versions require the Loader to be passed explicitly
a = A(Q)
a2 = yaml.load(yaml.dump(a), Loader=yaml.Loader)

b = B(Q)
b2 = yaml.load(yaml.dump(b), Loader=yaml.Loader)

c = C(my_q=Q)
c2 = yaml.load(yaml.dump(c), Loader=yaml.Loader)
print(c2.my_q)
print(c2.oth_q)

A and B give "can't pickle int objects" errors, while C doesn't initialize oth_q (because there is no information about it).
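For comparison, a minimal sketch of the case described as working: when the referenced class derives directly from object, PyYAML's default (full) Dumper writes the class reference as a !!python/name: tag holding its import path, and the full yaml.Loader turns it back into the same class object. PlainQ and Holder are hypothetical names. Whether a class whose type is yaml.YAMLObject's metaclass is handled the same way may depend on the PyYAML version, and resolving !!python/name: tags requires the unsafe loader, so this is only appropriate for trusted input.

```python
import yaml


class PlainQ:
    """Hypothetical class that derives directly from object."""


class Holder(yaml.YAMLObject):
    """Hypothetical holder of a class reference, similar to A above."""
    yaml_tag = "!Holder"

    def __init__(self, oth_q):
        self.oth_q = oth_q


text = yaml.dump(Holder(PlainQ))
print(text)  # the class is stored by its import path, e.g. __main__.PlainQ

# !!python/name: tags are only resolved by the full (unsafe) loader
restored = yaml.load(text, Loader=yaml.Loader)
print(restored.oth_q is PlainQ)
```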

Question: How can I preserve the information about which class reference is held?

(I need to hold the class reference to be able to make objects of that type - an alternative to this might work too)


Solution

  • When loading dumped YAML, you normally don't need to preserve the information about which class needs to be instantiated. That is what the tag information stored in the file (e.g. !XObj) is for.

    If you hide a reference to an object of a certain class by not dumping the attribute that refers to it, and then run into problems instantiating that object when loading (because you don't know its class), you are doing something wrong. In that case you should hide the internals of the referenced object, not the attribute that references the object. You could, for example, dump the referenced object as !XObj null.

    By hiding the internals, you still have the appropriate tag when loading, pointing to the right class to create an object from. You'll have to decide what your program does with the internals of that object, based on the limited (null) information.

    Warning: you should seriously reconsider using yaml.YAMLObject the way you do. You are using load(), which is documented as unsafe, and if you cannot guarantee 100% control of your YAML input, now and at any time in the future, you might lose the content of your drive, the secrecy of the objects you try to hide, or worse. You should be using safe_load(), or move away from a library like PyYAML, which defaults to being unsafe.
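The suggestion to hide the referenced object's internals rather than the attribute can be sketched as follows, as an illustration and not a definitive implementation. Opaque and SafeHolder are hypothetical classes. Opaque dumps itself as a tagged placeholder scalar, so the YAML keeps the tag (and therefore the class) but none of the internals, and loading rebuilds it with default internals. Both classes register with yaml.SafeLoader and yaml.SafeDumper, so yaml.safe_load() can be used instead of load().

```python
import yaml


class Opaque(yaml.YAMLObject):
    """Hypothetical referenced object whose internals should not be dumped."""
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper
    yaml_tag = "!Opaque"

    def __init__(self, secret=None):
        self.secret = secret

    @classmethod
    def to_yaml(cls, dumper, data):
        # Emit only the tag plus a placeholder scalar; PyYAML may quote it ('null')
        return dumper.represent_scalar(cls.yaml_tag, "null")

    @classmethod
    def from_yaml(cls, loader, node):
        # The tag alone selects this class; the internals get default values
        return cls()


class SafeHolder(yaml.YAMLObject):
    """Hypothetical object that references an Opaque instance."""
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper
    yaml_tag = "!SafeHolder"

    def __init__(self, ref):
        self.ref = ref


text = yaml.safe_dump(SafeHolder(Opaque(secret="top secret")))
print(text)

restored = yaml.safe_load(text)
print(type(restored.ref).__name__, restored.ref.secret)

# A hand-written "!Opaque null" loads the same way
print(type(yaml.safe_load("!Opaque null")).__name__)
```

Because from_yaml ignores the node's value, the placeholder could just as well be an empty mapping; what matters is that the tag survives.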