I'm trying to dump some Python objects out into YAML.
Currently, regardless of YAML library (pyyaml
, oyaml
, or ruamel
) I'm having an issue where calling .dump(MyObject)
gives me correct YAML, but seems to add a lot of metadata about the Python objects that I don't want, in a form that looks like:
!!python/object:MyObject
and other similar strings.
I do not need to be able to rebuild the objects from the YAML, so I am fine for this metadata to be removed completely
Other questions on SO indicate that the common solution to this is to use safe_dump
instead of dump
.
However, safe_dump
does not seem to work for nested objects (or objects at all), as it throws this error:
yaml.representer.RepresenterError: ('cannot represent an object', MyObject)
I see that the common workaround here is to manually specify Representers for the objects that I am trying to dump. My issue here is that my Objects are generated code that I don't have control over. I will also be dumping a variety of different objects.
Bottom line: Is there a way to dump nested objects using .dump
, but where the metadata isn't added?
Although the words "correct YAML" are not really accurate, and would be better phrased as "YAML output looking like you want it, except for the tag information", this fortunately gives some information on how you want your YAML to look, as there are an infinite number of ways to dump objects.
If you dump an object using ruamel.yaml
:
import sys
import ruamel.yaml
class MyObject:
def __init__(self, a, b):
self.a = a
self.b = b
self.c = [a, b]
data = dict(x=MyObject(42, -1))
yaml = ruamel.yaml.YAML(typ='unsafe')
yaml.dump(data, sys.stdout)
this gives:
x: !!python/object:__main__.MyObject
a: 42
b: -1
c: [42, -1]
You have a tag !!python/object:__main__.MyObject
(yours might differ depending on where the
class is defined, etc.) and each of the attributes of the class are dumped as keys of a mapping.
There are multiple ways on how to get rid of the tag in that dump:
Add a classmethod
named to_yaml()
, to each of your classes and
register those classes. You have to do this for each of your classes,
but doing so allows you to use the safe-dumper. An example on how to
do this can be found in the
documentation
It is fairly easy to postprocess the output and remove the tags, which for objects always occur on the line
before the mapping, and you can delete from !!python
until the end-of-line
def strip_python_tags(s):
result = []
for line in s.splitlines():
idx = line.find("!!python/")
if idx > -1:
line = line[:idx]
result.append(line)
return '\n'.join(result)
yaml.encoding = None
yaml.dump(data, sys.stdout, transform=strip_python_tags)
and that gives:
x:
a: 42
b: -1
c: [42, -1]
As achors are dumped before the tag, this "stripping from !!python
until end-of-the line", also works when you dump object that have
multiple references.
You can also change the unsafe dumper routine for mappings to recognise the tag used for objects and change the tag to the "normal" one for dict/mapping (for which normally a tag is not output )
yaml.representer.org_represent_mapping = yaml.representer.represent_mapping
def my_represent_mapping(tag, mapping, flow_style=None):
if tag.startswith("tag:yaml.org,2002:python/object"):
tag = u'tag:yaml.org,2002:map'
return yaml.representer.org_represent_mapping(tag, mapping, flow_style=flow_style)
yaml.representer.represent_mapping = my_represent_mapping
yaml.dump(data, sys.stdout)
and that gives once more:
x:
a: 42
b: -1
c: [42, -1]
These last two methods work for all instances of all Python classes that you define without extra work.