Is there a way to construct an object using PyYAML construct_mapping after all nodes complete loading?


I am trying to load a YAML document in Python that creates a custom Python object. The object needs to be constructed from dicts and lists that are unpacked inside __init__. However, it seems that the construct_mapping function does not construct the entire tree of nested sequences (lists) and dicts.
Consider the following:

import yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l = l
        self.d = d

def foo_constructor(loader, node):
    values = loader.construct_mapping(node)
    s = values["s"]
    d = values["d"]
    l = values["l"]
    return Foo(s, d, l)
yaml.add_constructor(u'!Foo', foo_constructor)

f = yaml.load('''
--- !Foo
s: 1
l: [1, 2]
d: {try: this}''')

print(f)
# prints: 'Foo(1, {'try': 'this'}, [1, 2])'

This works fine because f holds the references to the l and d objects, which are actually filled with data after the Foo object is created.
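
A rough model of this behaviour in plain Python may make it easier to follow. It does not use PyYAML at all: build_list, Holder, Unpacker and the pending queue are hypothetical names, and the queue is only a simplified stand-in for a loader that hands out empty containers first and fills them later.

```python
def build_list(items, pending):
    """Return an empty list now and queue the work that fills it later."""
    data = []
    pending.append(lambda: data.extend(items))
    return data


class Holder(object):
    def __init__(self, l):
        # Only stores a reference, so it sees the list once it is filled.
        self.l = l


class Unpacker(object):
    def __init__(self, l):
        # Needs the contents immediately, so an empty list fails here.
        self.l1, self.l2 = l


pending = []
lst = build_list([1, 2], pending)   # still empty at this point

h = Holder(lst)                     # fine: keeps the reference

try:
    Unpacker(lst)                   # fails: nothing to unpack yet
    unpack_error = None
except ValueError as exc:
    unpack_error = exc

# The deferred work runs only after the objects were created.
for fill in pending:
    fill()

print(h.l)
print(type(unpack_error).__name__)
```

An object that merely holds the reference ends up with the data, while one that reads the contents during __init__ sees an empty container.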

Now, let's do something a smidgen more complicated:

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        # assume two-value list for l
        self.l1, self.l2 = l
        self.d = d

Now we get the following error:

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    d: {try: this}''')
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 43, in construct_document
    data = self.construct_object(node)
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "test.py", line 19, in foo_constructor
    return Foo(s, d, l)
  File "test.py", line 7, in __init__
    self.l1, self.l2 = l
ValueError: need more than 0 values to unpack

This is because the YAML constructor starts at the outer layer of nesting and constructs the object before all of its nested nodes are finished. Is there a way to reverse the order and start with the most deeply nested objects first? Alternatively, is there a way to make construction happen only after the node's child objects have been loaded?


Solution

  • Well, what do you know. The solution I found was so simple, yet not so well documented.

    The Loader class documentation shows the construct_mapping method taking only a single parameter (node). However, while considering writing my own constructor, I checked out the source, and the answer was right there! The method also takes a parameter deep (default False). With deep=True, the nested nodes are fully constructed before the mapping is returned.

    def construct_mapping(self, node, deep=False):
        #...
    

    So, the correct constructor method to use is

    def foo_constructor(loader, node):
        values = loader.construct_mapping(node, deep=True)
        #...
    

    I guess PyYAML could use some additional documentation, but I'm grateful that the option already exists.
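
For completeness, here is a sketch of the question's second Foo class combined with the deep=True constructor. Two things here are my own assumptions rather than part of the original code: the constructor is registered on yaml.SafeLoader and that loader is passed to yaml.load explicitly (newer PyYAML releases expect an explicit Loader), and l and d are passed by keyword so their order cannot be mixed up.

```python
import yaml


class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        # assume two-value list for l
        self.l1, self.l2 = l
        self.d = d


def foo_constructor(loader, node):
    # deep=True builds the nested list and dict before returning,
    # so Foo.__init__ can unpack them right away.
    values = loader.construct_mapping(node, deep=True)
    return Foo(values["s"], l=values["l"], d=values["d"])


# Assumption: register on SafeLoader and load with the same loader.
yaml.add_constructor(u'!Foo', foo_constructor, Loader=yaml.SafeLoader)

f = yaml.load('''
--- !Foo
s: 1
l: [1, 2]
d: {try: this}''', Loader=yaml.SafeLoader)

print(f.s, f.l1, f.l2, f.d)
```

With deep=False (the default) the same script should fail with the unpacking ValueError from the question, because the list would still be empty when __init__ runs.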