Tags: python, pyyaml

How to use custom dictionary class while loading yaml?


I am currently loading YAML like this:

 import yaml
 # safe_load(): in PyYAML 6, plain yaml.load() without a Loader argument is an error
 data = yaml.safe_load('''level0:
                  stuff: string0
                  level1:
                      stuff: string1
                      level2: ...''')

The code above creates nested dictionaries. Instead of nested dictionaries, I want nested FancyDict instances:

 import collections.abc

 class FancyDict(collections.abc.MutableMapping):
     def __init__(self, *args, **kwargs):
         for name in kwargs:
             setattr(self, name, kwargs[name])

The Constructors, Representers, Resolvers section of the PyYAML documentation doesn't seem to cover this case: I want to override the class used for all mappings globally, not only for specially tagged ones.

I just need a hook that is called when an object (node?) is created/finalized.
Is there an easy way to do this, or should I just traverse the nested dictionaries that yaml.load() returns and fix them myself?
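The fallback asked about here, post-processing what the loader returns, can be sketched as follows. This is an illustration under assumptions, not the asker's code: the complete, dict-backed FancyDict (read-only attribute access standing in for the setattr() in the question) and the to_fancy() helper are both hypothetical.

```python
import collections.abc


class FancyDict(collections.abc.MutableMapping):
    """Hypothetical complete FancyDict: items live in an internal dict and
    string keys can also be read as attributes (data.level0.stuff)."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # MutableMapping.update -> __setitem__

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __getattr__(self, name):
        # Only called when normal lookup fails; skip private names so a
        # missing _data (e.g. during copying) cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


def to_fancy(obj):
    """Hypothetical helper: recursively replace every dict, including
    dicts nested inside lists, with a FancyDict."""
    if isinstance(obj, dict):
        return FancyDict((key, to_fancy(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [to_fancy(item) for item in obj]
    return obj


# The structure yaml.safe_load() returns for the question's document
# (with the "..." placeholder level left out).
loaded = {"level0": {"stuff": "string0", "level1": {"stuff": "string1"}}}
data = to_fancy(loaded)
print(data.level0.level1.stuff)  # string1
```

A plain traversal like this turns each YAML alias into a separate FancyDict copy, and it would recurse forever on a self-referencing document.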


Solution

  • There is no such hook; the type that is constructed is hard-coded in constructor.BaseConstructor.construct_mapping().

    The way to solve this is to make your own constructor, build your own loader on top of it, and pass that loader to load() via the Loader argument:

    import collections.abc
    import ruamel.yaml as yaml  # old-style API; yaml.load() is deprecated in later ruamel.yaml releases

    yaml_str = """\
    level0:
      stuff: string0
      level1:
        stuff: string1
        level2: ...
    """

    from ruamel.yaml.reader import Reader
    from ruamel.yaml.scanner import Scanner
    from ruamel.yaml.parser import Parser
    from ruamel.yaml.composer import Composer
    from ruamel.yaml.constructor import SafeConstructor
    from ruamel.yaml.resolver import Resolver


    class FancyDict(collections.abc.MutableMapping):
        def __init__(self, *args, **kwargs):
            for name in kwargs:
                setattr(self, name, kwargs[name])

        # provide the missing __getitem__, __setitem__, __delitem__, __iter__, and __len__.

    class MyConstructor(SafeConstructor):
        def construct_yaml_map(self, node):
            # SafeConstructor.construct_yaml_map() creates a plain dict and copies the
            # construct_mapping() result into it, so create a FancyDict here instead.
            data = FancyDict()
            yield data  # yield first, so aliases/recursive structures can refer to it
            data.update(self.construct_mapping(node))


    # The tag -> constructor table stores plain functions, so the override only
    # takes effect once it is registered for the default mapping tag.
    MyConstructor.add_constructor(u'tag:yaml.org,2002:map', MyConstructor.construct_yaml_map)


    class MyLoader(Reader, Scanner, Parser, Composer, MyConstructor, Resolver):
        def __init__(self, stream, version=None):
            Reader.__init__(self, stream)
            Scanner.__init__(self)
            Parser.__init__(self)
            Composer.__init__(self)
            MyConstructor.__init__(self)
            Resolver.__init__(self)


    data = yaml.load(yaml_str, Loader=MyLoader)
    

    When you run this you'll get an error that FancyDict is an abstract class that cannot be instantiated:

    TypeError: Can't instantiate abstract class FancyDict with abstract methods __delitem__, __getitem__, __iter__, __len__, __setitem__

    I guess your real FancyDict has those implemented.


    ruamel.yaml is a YAML library that supports YAML 1.2 (I recommend using it, but then I am the author of the package). PyYAML only supports (most of) YAML 1.1. More problematically, it has different constructor.py files for Python 2 and Python 3, so you might not be able to drop the above code into PyYAML unchanged.
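For PyYAML (the library in the question's tags), the same idea can be sketched as below, assuming Python 3 and a hypothetical, dict-backed stand-in for the real FancyDict. Instead of overriding construct_mapping(), it registers a replacement constructor for the default mapping tag tag:yaml.org,2002:map: in PyYAML's SafeConstructor, construct_yaml_map() creates the container ({}) and copies the construct_mapping() result into it, and the tag-to-constructor table stores functions, so the override has to go through add_constructor(). FancyLoader and construct_fancy_map() are illustrative names.

```python
import collections.abc

import yaml  # PyYAML


class FancyDict(collections.abc.MutableMapping):
    """Hypothetical stand-in for the real FancyDict: a dict-backed mapping."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class FancyLoader(yaml.SafeLoader):
    """SafeLoader subclass whose mappings are built as FancyDict."""


def construct_fancy_map(loader, node):
    # Same shape as SafeConstructor.construct_yaml_map: yield the empty
    # container first (so aliases and recursive structures can point at it),
    # then fill it from construct_mapping(), which still returns a plain dict.
    data = FancyDict()
    yield data
    data.update(loader.construct_mapping(node))


# Untagged mappings resolve to this tag, so every mapping in the document is
# affected. add_constructor() copies the table for FancyLoader first, so
# yaml.SafeLoader (and yaml.safe_load) keep producing plain dicts.
FancyLoader.add_constructor("tag:yaml.org,2002:map", construct_fancy_map)

doc = """\
level0:
  stuff: string0
  level1:
    stuff: string1
"""

data = yaml.load(doc, Loader=FancyLoader)
print(type(data["level0"]["level1"]).__name__)  # FancyDict
```

Because FancyDict is filled through update(), the real class needs a working __setitem__ for this to work.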