Tags: python, yaml, pyyaml, ruamel.yaml

YAML - Dumping a nested object without types/tags


I'm trying to dump some Python objects out into YAML.

Currently, regardless of YAML library (pyyaml, oyaml, or ruamel) I'm having an issue where calling .dump(MyObject) gives me correct YAML, but seems to add a lot of metadata about the Python objects that I don't want, in a form that looks like:

!!python/object:MyObject and other similar strings.

I do not need to be able to rebuild the objects from the YAML, so I am fine with this metadata being removed completely.

Other questions on SO indicate that the common solution to this is to use safe_dump instead of dump.

However, safe_dump does not seem to work for nested objects (or objects at all), as it throws this error:

yaml.representer.RepresenterError: ('cannot represent an object', MyObject)

I see that the common workaround here is to manually specify Representers for the objects that I am trying to dump. My issue is that my objects come from generated code that I don't control. I will also be dumping a variety of different objects.

Bottom line: Is there a way to dump nested objects using .dump, but where the metadata isn't added?


Solution

  • Although "correct YAML" is not really accurate, and would be better phrased as "YAML output that looks the way you want, except for the tag information", it fortunately does indicate how you want your YAML to look, since there are infinitely many ways to dump objects.

    If you dump an object using ruamel.yaml:

    import sys
    import ruamel.yaml
    
    class MyObject:
        def __init__(self, a, b):
            self.a = a
            self.b = b
            self.c = [a, b]
    
    data = dict(x=MyObject(42, -1))
    
    
    yaml = ruamel.yaml.YAML(typ='unsafe')
    yaml.dump(data, sys.stdout)
    

    this gives:

    x: !!python/object:__main__.MyObject
      a: 42
      b: -1
      c: [42, -1]
    

    You have a tag !!python/object:__main__.MyObject (yours might differ depending on where the class is defined, etc.) and each attribute of the instance is dumped as a key of a mapping.

    There are several ways to get rid of the tag in that dump:

    Registering classes

    Add a classmethod named to_yaml() to each of your classes and register those classes. You have to do this for every class, but doing so allows you to use the safe dumper. An example of how to do this can be found in the documentation.

    Post-process

    It is fairly easy to post-process the output and remove the tags. For objects, the tag always occurs on the line before the mapping, so you can delete everything from !!python to the end of the line:

    def strip_python_tags(s):
        result = []
        for line in s.splitlines():
            idx = line.find("!!python/")
            if idx > -1:
                line = line[:idx]
            result.append(line)
        return '\n'.join(result)
    
    yaml.encoding = None
    yaml.dump(data, sys.stdout, transform=strip_python_tags)
    

    and that gives:

    x: 
      a: 42
      b: -1
      c: [42, -1]
    

    As anchors are dumped before the tag, this "stripping from !!python until the end of the line" also works when you dump objects that have multiple references.
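The point about anchors can be checked on a plain string, without running a dumper at all. In the sketch below the input text is hand-written to resemble a dump of one object referenced twice (the anchor name `id001` is made up for the illustration), and `strip_python_tags` is repeated so the block runs on its own.

```python
def strip_python_tags(s):
    # Cut each line at the first "!!python/" and keep what precedes it.
    result = []
    for line in s.splitlines():
        idx = line.find("!!python/")
        if idx > -1:
            line = line[:idx]
        result.append(line)
    return '\n'.join(result)


# Hand-written sample: the anchor comes before the tag, the alias has no tag.
dumped = (
    "- &id001 !!python/object:__main__.MyObject\n"
    "  a: 42\n"
    "  b: -1\n"
    "- *id001\n"
)

stripped = strip_python_tags(dumped)
print(stripped)
```

The anchor `&id001` and the alias `*id001` both survive, so the shared reference is still expressed in the result. Note that the function cuts at any occurrence of `!!python/`, so a string value that happens to contain that text would be truncated as well.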

    Change the dumper

    You can also change the unsafe dumper's routine for mappings so that it recognises the tag used for objects and replaces it with the "normal" tag for dict/mapping (for which a tag is normally not output):

    yaml.representer.org_represent_mapping = yaml.representer.represent_mapping
    
    def my_represent_mapping(tag, mapping, flow_style=None):
        if tag.startswith("tag:yaml.org,2002:python/object"):
            tag = u'tag:yaml.org,2002:map'
        return yaml.representer.org_represent_mapping(tag, mapping, flow_style=flow_style)
    
    yaml.representer.represent_mapping = my_represent_mapping
    
    yaml.dump(data, sys.stdout)
    

    and that gives once more:

    x:
      a: 42
      b: -1
      c: [42, -1]
    

    These last two methods work, without any extra effort, for instances of any Python class that you define.