Consider this YAML file:
!my-type
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43
There is one top-level object that contains a collection of dictionaries, and I can load it fine with PyYAML. Now I want to use a proper class instead of these item dictionaries:
!my-type
name: My type
items:
  - !my-type-item
    name: First item
    number: 42
  - !my-type-item
    name: Second item
    number: 43
But this syntax is cumbersome and redundant, since all items in this collection are of the same type. And it gets very ugly when there are hundreds of these items. Is it possible to tag these items implicitly?
I considered using yaml.add_path_resolver, but this API does not seem to be public or stable.
The YAML spec says:
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node and (3) the content (and hence the kind) of the node.
This means that resolving a tag from the node's path is in accordance with the spec. I guess this is what add_path_resolver tries to implement.
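For reference, here is a minimal sketch of what the path-based approach could look like. It assumes that add_path_resolver behaves as it does in current PyYAML releases, which, as the question notes, may not be a stable API. The names ItemLoader, Item, construct_item and construct_root are hypothetical and only serve the illustration. The path ["items", None] is meant to match every entry of a sequence stored under the top-level key items.

```python
import yaml


class ItemLoader(yaml.SafeLoader):
    """Hypothetical loader subclass, so the registrations stay local to it."""


class Item:
    """Hypothetical stand-in for the item class."""

    def __init__(self, name, number):
        self.name, self.number = name, number


def construct_item(loader, node):
    # Build the plain mapping first, then wrap it in the class.
    fields = loader.construct_mapping(node, deep=True)
    return Item(fields["name"], fields["number"])


def construct_root(loader, node):
    # Keep the root as a plain dict to keep the sketch short.
    return loader.construct_mapping(node, deep=True)


ItemLoader.add_constructor("!my-type", construct_root)
ItemLoader.add_constructor("!my-type-item", construct_item)
# Assumption: untagged mappings at <root>/items/<any index> get this tag.
ItemLoader.add_path_resolver("!my-type-item", ["items", None], dict)

document = """\
!my-type
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43
"""

root = yaml.load(document, Loader=ItemLoader)
print([(item.name, item.number) for item in root["items"]])
```

Registering on a loader subclass rather than through the module-level yaml.add_path_resolver keeps the resolver from affecting other YAML loading in the same process.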
The problem here is that Python does not have classes with declared, typed fields. Languages that have those can inspect them and load data with the proper type implicitly (done by SnakeYAML, go-yaml et al.). With PyYAML, to do this you'll need to implement a custom constructor, e.g.:
import yaml


def get_value(node, name):
    # Return the value node stored under the scalar key `name` of a mapping node.
    # Falls through and returns None if the key is absent; the callers' asserts catch that.
    assert isinstance(node, yaml.MappingNode)
    for key, value in node.value:
        assert isinstance(key, yaml.ScalarNode)
        if key.value == name:
            return value


class MyTypeItem:
    def __init__(self, name, number):
        self.name, self.number = name, number

    @classmethod
    def from_yaml(cls, loader, node):
        name = get_value(node, "name")
        assert isinstance(name, yaml.ScalarNode)
        number = get_value(node, "number")
        assert isinstance(number, yaml.ScalarNode)
        # Scalar node values are strings, so convert the number explicitly.
        return cls(name.value, int(number.value))

    def __repr__(self):
        return f"MyTypeItem(name={self.name}, number={self.number})"


class MyType(yaml.YAMLObject):
    yaml_tag = "!my-type"

    def __init__(self, name, items):
        self.name, self.items = name, items

    @classmethod
    def from_yaml(cls, loader, node):
        name = get_value(node, "name")
        assert isinstance(name, yaml.ScalarNode)
        items = get_value(node, "items")
        assert isinstance(items, yaml.SequenceNode)
        # Build every entry of the sequence as a MyTypeItem, no item tag needed.
        return cls(name.value,
                   [MyTypeItem.from_yaml(loader, n) for n in items.value])

    def __repr__(self):
        return f"MyType(name={self.name}, items={self.items})"


document = """
!my-type
name: My type
items:
  - name: First item
    number: 42
  - name: Second item
    number: 43
"""

print(yaml.load(document, yaml.FullLoader))
This gives you:
MyType(name=My type, items=[MyTypeItem(name=First item, number=42), MyTypeItem(name=Second item, number=43)])
Only the uppermost class derives from yaml.YAMLObject and has a yaml_tag, so that PyYAML can implicitly use it for the root item. MyTypeItem.from_yaml is called explicitly from MyType and thus doesn't need to register with PyYAML (you can still register it if you also want to load files that contain such an item directly).
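If you do want to load a file whose root is a single item, one way to register the item class is a constructor on a loader subclass. This is only a sketch: Item, ItemLoader and construct_item are hypothetical names, and it uses construct_mapping instead of the manual node walking shown above.

```python
import yaml


class Item:
    """Hypothetical stand-in for MyTypeItem."""

    def __init__(self, name, number):
        self.name, self.number = name, number


def construct_item(loader, node):
    # construct_mapping applies the usual implicit typing, so 42 arrives as an int.
    fields = loader.construct_mapping(node)
    return Item(fields["name"], fields["number"])


class ItemLoader(yaml.SafeLoader):
    """Loader subclass so the registration does not leak into other loaders."""


ItemLoader.add_constructor("!my-type-item", construct_item)

item = yaml.load("!my-type-item\nname: First item\nnumber: 42\n", Loader=ItemLoader)
print(item.name, item.number)
```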
You need to convert non-string values manually (as shown with int(number.value)), since the .value of any scalar node is always a string.
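To see this in isolation, the sketch below composes a one-line document and inspects the scalar node. It also shows, as an alternative to converting by hand, that the loader's construct_object builds the value according to the tag the loader resolved for the node. The variable names are only for illustration.

```python
import yaml

loader = yaml.SafeLoader("number: 42")
try:
    root = loader.get_single_node()          # MappingNode for the document
    key_node, value_node = root.value[0]     # first (key, value) pair
    raw = value_node.value                   # the text as written in the file
    tag = value_node.tag                     # tag resolved by the loader
    converted = loader.construct_object(value_node)  # built according to that tag
finally:
    loader.dispose()

print(repr(raw), tag, repr(converted))
```

Inside a from_yaml method, calling loader.construct_object(node) on a scalar node would follow the same route, at the cost of accepting whatever type the loader resolves instead of enforcing int yourself.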