
Using a generic constructor to read a YAML file


I need to read an AWS CloudFormation file in Python using the PyYAML package. A GitHub issue about this (linked below) seems to have been resolved using this code snippet:

def generic_constructor(loader, tag, node):
    classname = node.__class__.__name__
    if (classname == 'SequenceNode'):
        return loader.construct_sequence(node)
    elif (classname == 'MappingNode'):
        return loader.construct_mapping(node)
    else:
        return loader.construct_scalar(node)

yaml.add_multi_constructor('', generic_constructor)

How do I use this code to read a YAML file?

https://github.com/yaml/pyyaml/issues/169

The issue has been closed on GitHub, so I assume this code correctly reads the YAML file linked by the issue's reporter.


The answer is correct. In short, the code that works looks like this:

import yaml

def generic_constructor(loader, tag, node):
    # Build plain Python data from any tagged node; the tag itself is ignored.
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    elif isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    else:
        return loader.construct_scalar(node)

yaml.add_multi_constructor("", generic_constructor, Loader=yaml.SafeLoader)

with open("mytest.yaml") as f:
    data = yaml.safe_load(f)
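
To see what the loaded data looks like, here is a minimal sketch that parses a small CloudFormation-style document from a string instead of a file. The template text and the `CfnLoader` subclass are made up for illustration; registering the constructor on a subclass rather than on `yaml.SafeLoader` itself is just one way to keep the change from affecting other `safe_load` calls in the same process.

```python
import yaml


def generic_constructor(loader, tag_suffix, node):
    # Turn any tagged node into plain Python data; the tag is discarded.
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    return loader.construct_scalar(node)


class CfnLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that accepts unknown tags."""


# The empty prefix matches every tag that has no exact constructor,
# so standard YAML types (str, int, map, seq, ...) are still handled normally.
yaml.add_multi_constructor("", generic_constructor, Loader=CfnLoader)

# Illustrative template, not taken from the linked issue.
template = """
Resources:
  Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "${AWS::StackName}-data"
Outputs:
  BucketRef:
    Value: !Ref Bucket
  BucketArn:
    Value: !GetAtt [Bucket, Arn]
"""

data = yaml.load(template, Loader=CfnLoader)

print(data["Outputs"]["BucketRef"]["Value"])  # Bucket
print(data["Outputs"]["BucketArn"]["Value"])  # ['Bucket', 'Arn']
```

Note that the tag information is lost with this approach: `!Ref Bucket` comes back as the plain string `Bucket`, so you cannot tell afterwards which intrinsic function was used. If you need that, the constructor would have to wrap the value together with its tag.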

Solution

  • The general idea is that your code imports PyYAML using:

    import yaml
    

    and after that you invoke the snippet, which modifies the default loader. Since the default loader that PyYAML uses has changed since that issue was closed, you are better off specifying e.g. the SafeLoader explicitly:

    yaml.add_multi_constructor('', generic_constructor, Loader=yaml.SafeLoader)
    

    and then use data = yaml.safe_load(open_file_or_string) to load the data.

    It is probably easier to use ruamel.yaml (disclaimer: I am the author of that package), which by default can handle special tags (including those of AWS). You should, however, specify YAML 1.1, a version that has been outdated for ten years but is what AWS expects and the only one PyYAML supports.

    from ruamel.yaml import YAML
    
    yaml = YAML()
    yaml.version = (1, 1)
    data = yaml.load(x)
    

    where x can be a pathlib.Path() instance, an opened file or a string.
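
Coming back to the PyYAML route, the following sketch illustrates why the `Loader` argument matters. It assumes a one-line document with a `!Ref` tag (made up for this example). Registering the constructor without `Loader=` leaves `SafeLoader` untouched, so `yaml.safe_load` still rejects the tag; only after registering on `yaml.SafeLoader` does the load succeed.

```python
import yaml
from yaml.constructor import ConstructorError


def generic_constructor(loader, tag_suffix, node):
    # Same generic constructor as in the question.
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    return loader.construct_scalar(node)


doc = "Value: !Ref Bucket"  # illustrative input

# Registered without Loader=: this changes PyYAML's default loader(s),
# but not SafeLoader, which is what safe_load uses.
yaml.add_multi_constructor("", generic_constructor)
try:
    yaml.safe_load(doc)
    safe_load_failed = False
except ConstructorError:
    safe_load_failed = True

# Registered explicitly on SafeLoader: safe_load now accepts the tag.
# This is a process-wide change to SafeLoader.
yaml.add_multi_constructor("", generic_constructor, Loader=yaml.SafeLoader)
data = yaml.safe_load(doc)

print(safe_load_failed)  # True
print(data)              # {'Value': 'Bucket'}
```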