Tags: python, yaml, defaultdict, ruamel.yaml

Safe dumping and loading of defaultdict with ruamel.yaml


I'm trying to (de)serialize classes that have collections.defaultdict attributes with ruamel.yaml in Python (3.6+ in my case).

Here is a minimal example that I would like to get working:

from collections import defaultdict
import ruamel.yaml
from pathlib import Path

class Foo:
    def __init__(self):
        self.x = defaultdict()


YAML = ruamel.yaml.YAML(typ="safe")
YAML.register_class(Foo)
YAML.register_class(defaultdict)

fp = Path("./test.yaml")
YAML.dump(Foo(), fp)
YAML.load(fp)

But this fails with:

AttributeError: 'collections.defaultdict' object has no attribute '__dict__'

Any ideas that would not require writing custom code for every "Foo-like" class? I was hoping I could add a different representer for defaultdict objects, but my attempts have been in vain so far.

Full traceback:

Traceback (most recent call last):
  File "./tests/test_yaml.py", line 18, in <module>
    YAML.dump(Foo(), fp)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 439, in dump
    return self.dump_all([data], stream, _kw, transform=transform)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 453, in dump_all
    self._context_manager.dump(data)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 801, in dump
    self._yaml.representer.represent(data)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 81, in represent
    node = self.represent_data(data)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 108, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 638, in t_y
    tag, data, cls, flow_style=representer.default_flow_style
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 384, in represent_yaml_object
    return self.represent_mapping(tag, state, flow_style=flow_style)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 218, in represent_mapping
    node_value = self.represent_data(item_value)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 108, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\main.py", line 638, in t_y
    tag, data, cls, flow_style=representer.default_flow_style
  File "C:\miniconda-windows\envs\ratio\lib\site-packages\ruamel\yaml\representer.py", line 383, in represent_yaml_object
    state = data.__dict__.copy()
AttributeError: 'collections.defaultdict' object has no attribute '__dict__'
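Reading the traceback: represent_yaml_object runs twice, first for Foo and then for the nested defaultdict, because register_class(defaultdict) sends the defaultdict through the same generic class path, which copies data.__dict__. The standard-library snippet below (no ruamel.yaml involved) is only meant to illustrate why the second call fails: an ordinary class instance has an instance __dict__, while a defaultdict does not.

```python
from collections import defaultdict


class Foo:
    def __init__(self):
        self.x = defaultdict()


foo = Foo()

# An ordinary class instance stores its attributes in an instance __dict__,
# so copying it (what the generic class representer does) works.
state = foo.__dict__.copy()
print(state)  # {'x': defaultdict(None, {})}

# defaultdict is a C-implemented dict subclass without an instance __dict__,
# so the same attribute access raises AttributeError.
try:
    foo.x.__dict__
    message = None
except AttributeError as exc:
    message = str(exc)
print(message)
```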

Solution

  • There is now a package, ruamel.yaml.pytypes, that supports dumping defaultdict instances. Note that if you provide a function as the default_factory, you need to specify typ='unsafe', as otherwise the factory function cannot be represented.

    After installing ruamel.yaml.pytypes and ruamel.yaml in your virtualenv, you can do:

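    from collections import defaultdict
    import io

    import ruamel.yaml

    # 'unsafe' is needed so the factory function can be dumped by name;
    # 'pytypes' is the plug-in that adds defaultdict support
    yaml = ruamel.yaml.YAML(typ=['unsafe', 'pytypes'])
    yaml.default_flow_style = False
    buf = io.StringIO()

    def factory():
        import datetime
        return datetime.datetime.now()

    data = defaultdict(factory)

    x = data[4]  # missing key: calls factory() and stores the result under 4
    data[2] = 42
    yaml.dump(data, buf)
    print(buf.getvalue(), end='')
    d = yaml.load(buf.getvalue())
    assert data == d
    assert data.default_factory == d.default_factory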
    

    The above will print the following (your datetime will be different):

    !defaultdict
    - !!python/name:__main__.factory 
    - 2: 42
      4: 2019-08-19 13:06:05.129019
    

    (and neither assert will raise an exception)


    See the edit history for "manual" ways to achieve similar results.