
Accept anchor (alias) in constructor in PyYAML


I need to create a custom constructor for a tag. The tag should accept lists as well as aliases to anchored lists.

Here is an example of how I want to use my tag:

original: &value [1, 2, 3]
processed: !mytag *value

So I create a basic constructor for !mytag which returns the input sequence:

import yaml

def my_constructor(loader, node):
    return loader.construct_sequence(node)

yaml.Loader.add_constructor('!mytag', my_constructor)
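For comparison, the same constructor has no trouble when the tag sits on a literal sequence instead of on an alias. A minimal sketch; it registers the constructor on a hypothetical MyLoader subclass of yaml.SafeLoader rather than on the global yaml.Loader:

```python
import yaml

# Hypothetical Loader subclass, so the global yaml.Loader is left untouched
class MyLoader(yaml.SafeLoader):
    pass

def my_constructor(loader, node):
    return loader.construct_sequence(node)

MyLoader.add_constructor('!mytag', my_constructor)

# A literal flow sequence carrying the tag loads without problems
data = yaml.load('processed: !mytag [1, 2, 3]', Loader=MyLoader)
print(data)  # {'processed': [1, 2, 3]}
```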

But when I try to load the YAML source above, I get an error:

>>> source = '''original: &value [1, 2, 3]
processed: !mytag *value'''

>>> yaml.load(source, yaml.Loader)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in t
  File "/usr/local/lib/python3.7/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.7/site-packages/yaml/constructor.py", line 41, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python3.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/local/lib/python3.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/local/lib/python3.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python3.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/local/lib/python3.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/local/lib/python3.7/site-packages/yaml/parser.py", line 439, in parse_block_mapping_key
    "expected <block end>, but found %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a block mapping
  in "test.yml", line 1, column 1
expected <block end>, but found '<alias>'
  in "test.yml", line 2, column 19

Strangely, it works if I wrap the alias in square brackets:

>>> source = '''original: &value [1, 2, 3]
processed: !mytag [*value]'''

>>> yaml.load(source, yaml.Loader)
{'original': [1, 2, 3], 'processed': [[1, 2, 3]]}
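One way to see what the brackets change is to stop before construction and inspect the node graph with yaml.compose. A small sketch (using yaml.SafeLoader for composing is an assumption; any PyYAML loader composes the same graph here): the tag lands on a new wrapper sequence, and the alias inside it is the very same node object as the anchored one. Putting the tag directly on the alias is rejected by the parser.

```python
import yaml

source = '''original: &value [1, 2, 3]
processed: !mytag [*value]'''

# compose() parses into nodes but does not construct Python objects
root = yaml.compose(source, Loader=yaml.SafeLoader)
(orig_key, orig_node), (proc_key, proc_node) = root.value

print(orig_node.tag)  # tag:yaml.org,2002:seq (resolved implicitly)
print(proc_node.tag)  # !mytag (on the wrapper sequence)
# The alias inside the wrapper resolves to the same node object as the anchor
print(proc_node.value[0] is orig_node)  # True

# A tag placed directly on an alias does not even parse
problem = None
try:
    yaml.compose('''original: &value [1, 2, 3]
processed: !mytag *value''', Loader=yaml.SafeLoader)
except yaml.parser.ParserError as exc:
    problem = exc.problem
print('ParserError:', problem)
```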

But that is not what I want: I need to pass the original list to the constructor, not a list nested inside another list.

Update: the nested list doesn't work either. If I return it, the result does contain the original list, but if I try to access it from inside the constructor, the inner list is still empty at that stage:

>>> source = '''original: &value [1, 2, 3]
... processed: !mytag [*value]'''
>>>
>>> def my_constructor(loader, node):
...     print(loader.construct_sequence(node))
...     return loader.construct_sequence(node)
...
>>> yaml.Loader.add_constructor('!mytag', my_constructor)
>>>
>>> yaml.load(source, yaml.Loader)
[[]]  # <--- this is the printed value
{'original': [1, 2, 3], 'processed': [[1, 2, 3]]}  # <--- this is the returned value
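The empty inner list is consistent with how PyYAML builds collections in two steps: the default sequence constructor first hands back an empty list object and fills it in later. Because original is constructed before processed, the alias gives the constructor that already created, still empty list. A small sketch that snapshots what the constructor sees (MyLoader and the seen list are hypothetical helpers for illustration) shows the same list being filled in afterwards:

```python
import yaml

class MyLoader(yaml.SafeLoader):  # hypothetical subclass, leaves yaml.Loader alone
    pass

seen = []  # hypothetical: records what the constructor sees during loading

def my_constructor(loader, node):
    result = loader.construct_sequence(node)  # shallow construction
    seen.append([list(item) for item in result])  # copy taken at this moment
    return result

MyLoader.add_constructor('!mytag', my_constructor)

source = '''original: &value [1, 2, 3]
processed: !mytag [*value]'''
data = yaml.load(source, Loader=MyLoader)

print(seen[0])            # [[]]: inner list still empty inside the constructor
print(data['processed'])  # [[1, 2, 3]]: filled in later by PyYAML
print(data['processed'][0] is data['original'])  # True: the same list object
```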

Does anybody have an idea how to do that?

Python 3.7.6, PyYAML 5.3


Solution

  • The additional sequence is exactly what you want.

    Keep in mind that YAML tags describe the type of a node; they are not processing instructions. Aliases refer to existing nodes, which already have a type even if they don't have an explicit tag (e.g. your original sequence is tagged as !!seq under the YAML core schema). This is also why !mytag *value fails to parse: an alias node cannot carry properties such as a tag of its own.

    So if you want your tag to mean "take an existing node and transform it in some way", what you are describing is a function call. A function call is a structure of its own that only refers to its input as a parameter. Therefore, to model it properly, you need to place your parameter inside a structure, and a sequence is the easiest way to do that. You could also do it with a mapping:

    original: &value [1, 2, 3]
    processed: !mytag {input: *value}
    

    but that's more verbose.

    In your constructor, you then extract the parameter from the surrounding structure and do your processing on it.

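    Edit: Here's a proof of concept that accesses the referenced list. You have to navigate into the outer node manually because of how PyYAML constructs collections: the anchored sequence has already been created as a still-empty list that PyYAML fills in later, and asking the loader to construct that node again just returns the cached, empty list. Calling construct_sequence directly on the inner node with deep=True builds a fresh, fully populated list from its child nodes instead.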

    import yaml
    
    source = '''original: &value [1, 2, 3]
    processed: !mytag [*value]'''
    
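    def my_constructor(loader, node):
        # node is the wrapper sequence [*value] that carries the !mytag tag
        assert isinstance(node, yaml.SequenceNode)
        # node.value[0] is the aliased node; deep=True builds its items right away
        param = loader.construct_sequence(node.value[0], deep=True)
        print(param)
        # do something with the param here
        return param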
    
    yaml.Loader.add_constructor('!mytag', my_constructor)
    
    print(yaml.load(source, yaml.Loader))
    

    Output:

    [1, 2, 3]
    {'original': [1, 2, 3], 'processed': [1, 2, 3]}
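Following the approach above, one constructor can accept the parameter wrapped either in a sequence (!mytag [*value]) or in a mapping (!mytag {input: *value}), with either an alias or a literal list inside, which covers both cases from the question. A minimal sketch, assuming a hypothetical MyLoader subclass, input as the mapping key, and placeholder processing (multiplying by 10) standing in for the real work; like the proof of concept, it rebuilds the inner list with deep=True:

```python
import yaml

class MyLoader(yaml.SafeLoader):  # hypothetical subclass, leaves yaml.Loader alone
    pass

def _param_node(loader, node):
    """Return the node holding the single parameter of a !mytag call."""
    if isinstance(node, yaml.SequenceNode) and len(node.value) == 1:
        return node.value[0]                     # !mytag [param]
    if isinstance(node, yaml.MappingNode):
        for key_node, value_node in node.value:  # !mytag {input: param}
            if loader.construct_scalar(key_node) == 'input':
                return value_node
    raise yaml.constructor.ConstructorError(
        None, None, 'expected [param] or {input: param}', node.start_mark)

def my_constructor(loader, node):
    param_node = _param_node(loader, node)
    # Rebuild the list from its child nodes; deep=True finishes them right away
    param = loader.construct_sequence(param_node, deep=True)
    return [x * 10 for x in param]  # placeholder processing for illustration

MyLoader.add_constructor('!mytag', my_constructor)

source = '''original: &value [1, 2, 3]
as_seq: !mytag [*value]
as_map: !mytag {input: *value}
literal: !mytag [[4, 5]]'''
data = yaml.load(source, Loader=MyLoader)
print(data)
```

Wrapping the parameter keeps the tag on a node of its own, so the alias stays a plain reference and the constructor decides how to unwrap it.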