How to tag nodes implicitly in yaml (PyYAML)


Consider this yaml file:

!my-type
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43

There is one top level object that contains a collection of dictionaries, and I can load it fine with PyYAML. Now, I want to use a proper class instead of these item dictionaries:

!my-type
name: My type
items:
  - !my-type-item
    name: First item
    number: 42
  - !my-type-item
    name: Second item
    number: 43

But this syntax is cumbersome and redundant, since all items in this collection are of the same type. And it gets very ugly when there are hundreds of these items. Is it possible to tag these items implicitly?

I considered using yaml.add_path_resolver but this API does not seem to be public or stable.


Solution

  • The YAML spec says

    Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node and (3) the content (and hence the kind) of the node.

    which means that assigning tags to these items based on their position in the document is in accordance with the spec. I guess this is what add_path_resolver tries to implement.

    The problem here is that Python does not have classes with declared, typed fields. Languages that have those can inspect them and load data with the proper type implicitly (done by SnakeYAML, go-yaml et al.). With PyYAML, to do this you'll need to implement a custom constructor, e.g.:

    import yaml
    
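    def get_value(node, name):
        # Return the value node stored under key `name` in a mapping node.
        assert isinstance(node, yaml.MappingNode)
        for key, value in node.value:
            assert isinstance(key, yaml.ScalarNode)
            if key.value == name:
                return value
        # Fail loudly instead of silently returning None for a missing key.
        raise KeyError(name)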
    
    class MyTypeItem:
        def __init__(self, name, number):
            self.name, self.number = name, number
    
        @classmethod
        def from_yaml(cls, loader, node):
            name = get_value(node, "name")
            assert isinstance(name, yaml.ScalarNode)
    
            number = get_value(node, "number")
            assert isinstance(number, yaml.ScalarNode)
    
            # .value is the raw scalar text, so convert the number by hand.
            return cls(name.value, int(number.value))
    
        def __repr__(self):
            return f"MyTypeItem(name={self.name}, number={self.number})"
    
    class MyType(yaml.YAMLObject):
        yaml_tag = "!my-type"
    
        def __init__(self, name, items):
            self.name, self.items = name, items
    
        @classmethod
        def from_yaml(cls, loader, node):
            name = get_value(node, "name")
            assert isinstance(name, yaml.ScalarNode)
    
            items = get_value(node, "items")
            assert isinstance(items, yaml.SequenceNode)
    
            # Build each item explicitly; the item nodes carry no tag of their own.
            return cls(name.value,
                    [MyTypeItem.from_yaml(loader, n) for n in items.value])
    
        def __repr__(self):
            return f"MyType(name={self.name}, items={self.items})"
    
    # Named `document` rather than `input` to avoid shadowing the builtin.
    document = """
    !my-type
    name: My type
    items:
      - name: First item
        number: 42
      - name: Second item
        number: 43
    """
    
    print(yaml.load(document, Loader=yaml.FullLoader))
    

    This gives you:

    MyType(name=My type, items=[MyTypeItem(name=First item, number=42), MyTypeItem(name=Second item, number=43)])
    

    Only the uppermost class derives from yaml.YAMLObject and has a yaml_tag, so that PyYAML can implicitly use it for the root item. MyTypeItem.from_yaml is called explicitly from MyType and thus doesn't need to be registered with PyYAML (though you can register it as well if you also want to load files that contain such an item directly).
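
If you do want that optional registration, here is a minimal sketch. It assumes the item class gets the tag !my-type-item (the tag used in the question) and that yaml.add_constructor with SafeLoader is acceptable for your setup. The from_yaml here is a simplified variant based on construct_mapping, written only for illustration; it is not the constructor shown above.

```python
import yaml

class MyTypeItem:
    def __init__(self, name, number):
        self.name, self.number = name, number

    @classmethod
    def from_yaml(cls, loader, node):
        # construct_mapping builds a plain dict from the mapping node;
        # with SafeLoader the plain scalar 42 is already resolved to an int.
        fields = loader.construct_mapping(node, deep=True)
        return cls(fields["name"], fields["number"])

    def __repr__(self):
        return f"MyTypeItem(name={self.name}, number={self.number})"

# Register the constructor for the (assumed) tag on SafeLoader only.
yaml.add_constructor("!my-type-item", MyTypeItem.from_yaml, Loader=yaml.SafeLoader)

# A document whose root node is a single tagged item.
item = yaml.load("!my-type-item\nname: First item\nnumber: 42\n",
                 Loader=yaml.SafeLoader)
print(item)
```

Note that add_constructor changes the loader class itself, so the tag stays registered for every later load with that loader in the same process.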

    You need to do conversions to non-string values manually (as shown with int(number.value)) since .value of any scalar node is always a string.
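
A small sketch of that last point, assuming SafeLoader: the scalar node keeps the raw text in .value, while the tag the resolver assigned to it is stored separately in .tag. As a possible alternative to a manual int(...), the loader's own construct_object can be asked to build the value from the node. get_single_node and construct_object are methods of PyYAML's loader classes, but they are not prominently documented, so treat this as an illustration rather than a recommended API.

```python
import yaml

loader = yaml.SafeLoader("number: 42")
try:
    root = loader.get_single_node()         # MappingNode for the whole document
    _key_node, value_node = root.value[0]   # first (key node, value node) pair
    raw = value_node.value                  # raw scalar text as stored on the node
    tag = value_node.tag                    # tag resolved for the plain scalar
    converted = loader.construct_object(value_node)  # let PyYAML convert it
finally:
    loader.dispose()

print(type(raw).__name__, repr(raw))
print(tag)
print(type(converted).__name__, converted)
```

Inside a from_yaml classmethod the same call would be loader.construct_object(number), which trades the explicit int(...) for whatever type PyYAML resolves from the scalar's content.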