YAML merge key with multiple levels: does ruamel.yaml produce incorrect output, or do I misunderstand YAML?


Recently, while editing a fairly complex YAML config, I needed a somewhat tricky merge key operation, and I noticed that my favorite tool, ruamel.yaml, produces illogical results. I know that merge keys are deprecated, but as long as the 1.3 spec has not been released I have to keep using them. I filed a ticket, but the author marked it as invalid and stated that I misunderstand YAML.

Here is an example of YAML to test the merge:

tag1: &tag1
  subtag1:
    subsubtag1:
    subsubtag2:
       ssstag31:
       - var1
       - var2
       ssstag32:
       - var1
       - var2

tag2: 
  <<: *tag1
  subtag1:
    subsubtag2:
       ssstag31:
       - var3
       - var4

I expect that it will first merge the tag1 anchor into tag2 and then replace subtag1 with the new data, so tag2 will look like this:

tag2:
  subtag1:
    subsubtag2:
      ssstag31:
      - var3
      - var4
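
To make that expectation concrete, here is a minimal sketch in plain Python (no YAML library involved). It assumes a merge key behaves like a shallow dict update in which keys written explicitly in the mapping win; the helper name `shallow_merge` is made up for illustration.

```python
def shallow_merge(merged, explicit):
    """Hypothetical helper: copy the merged mapping, then let explicit keys override it."""
    result = dict(merged)      # top-level copy only, nested mappings are not combined
    result.update(explicit)    # an explicit key replaces the merged value wholesale
    return result


# Plain-dict stand-in for the mapping anchored as &tag1.
tag1 = {
    "subtag1": {
        "subsubtag1": None,
        "subsubtag2": {
            "ssstag31": ["var1", "var2"],
            "ssstag32": ["var1", "var2"],
        },
    },
}

# The keys written directly under tag2, next to the merge key.
explicit_tag2 = {"subtag1": {"subsubtag2": {"ssstag31": ["var3", "var4"]}}}

tag2 = shallow_merge(tag1, explicit_tag2)
print(tag2)
```

Because the update is shallow, the whole of subtag1 is replaced, so subsubtag1 and ssstag32 do not survive in tag2.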

Unfortunately, ruamel.yaml does the merge but doesn't replace the data, so tag2 is identical to tag1.

It is easy to test with a trivial Python program, which produces the results I expect:

import yaml

class NoAliasDumper(yaml.SafeDumper):
    def ignore_aliases(self, data):
        return True

with open("example.yaml") as f:
    y = yaml.safe_load(f)
with open(r'merged.yaml', 'w') as file:
    yaml.dump(y, file, Dumper=NoAliasDumper)

Please advise where I went wrong, given that the Python yaml module (PyYAML) does the right merge and ruamel.yaml doesn't. What is the correct result of the merge? The bug must be either in PyYAML or in ruamel.yaml.

P.S. By the way, it's funny to try this snippet in online utilities, which handle it with varying degrees of success.


Solution

  • I am not sure what you mean by "ruamel.yaml unfortunately does merge tag1 anchor to tag2".

    import sys
    import ruamel.yaml
    from pathlib import Path
    
    
    file_in = Path('expand.yaml')
    yaml = ruamel.yaml.YAML()
    data = yaml.load(file_in)
    yaml.dump(data, sys.stdout)
    

    This gives exactly the original input:

    tag1: &tag1
      subtag1: 42
      subtag2: baz
    tag2:
      <<: *tag1
      subtag1: 18
      subtag3: *tag1
    

    So it preserves both the aliases and the merge key. (I am using a smaller example than yours, but a more complete one: not all the keys of the merged mapping are "covered" by other keys, and the anchor is still referenced even if the merge key is removed.)

    You can ignore aliases in ruamel.yaml, but the effect is not really useful.

    yaml = ruamel.yaml.YAML()
    yaml.representer.ignore_aliases = lambda x: True
    data = yaml.load(file_in)
    yaml.dump(data, sys.stdout)
    

    which gives:

    tag1:
      subtag1: 42
      subtag2: baz
    tag2:
      <<:
        subtag1: 42
        subtag2: baz
      subtag1: 18
      subtag3:
        subtag1: 42
        subtag2: baz
    

    IIRC the merge-expand option of the yaml utility (as provided by the package ruamel.yaml.cmd) was made before the ruamel.yaml package could preserve merges. That option relies on the mapping_flattener of the SafeLoader (the RoundTripLoader's doesn't flatten, in order not to lose the merge key information). But either the improvements on the PyYAML original (which handles duplicate keys incorrectly) or the interaction between the aliases and merge keys caused it to stop functioning properly.

    Unfortunately you cannot use PyYAML's flatten_mapping, as it errors with the less than useful message:

    expected a mapping or list of mappings for merging, but found mapping

    But you can do:

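    import sys
    import ruamel.yaml
    from pathlib import Path
    # ConstructorError is raised inside flatten_mapping below
    from ruamel.yaml.constructor import ConstructorError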
    
    def flatten_mapping(self, node):
        merge = []
        index = 0
        while index < len(node.value):
            key_node, value_node = node.value[index]
            if key_node.tag == 'tag:yaml.org,2002:merge':
                del node.value[index]
                if isinstance(value_node, ruamel.yaml.nodes.MappingNode):
                    self.flatten_mapping(value_node)
                    merge.extend(value_node.value)
                elif isinstance(value_node, ruamel.yaml.nodes.SequenceNode):
                    submerge = []
                    for subnode in value_node.value:
                        if not isinstance(subnode, ruamel.yaml.nodes.MappingNode):
                            raise ConstructorError(
                                'while constructing a mapping',
                                node.start_mark,
                                f'expected a mapping for merging, but found {subnode.id!s}',
                                subnode.start_mark,
                            )
                        self.flatten_mapping(subnode)
                        submerge.append(subnode.value)
                    submerge.reverse()
                    for value in submerge:
                        merge.extend(value)
                else:
                    raise ConstructorError(
                        'while constructing a mapping',
                        node.start_mark,
                        'expected a mapping or list of mappings for merging, '
                        f'but found {value_node.id!s}',
                        value_node.start_mark,
                    )
            elif key_node.tag == 'tag:yaml.org,2002:value':
                key_node.tag = 'tag:yaml.org,2002:str'
                index += 1
            else:
                index += 1
        if bool(merge):
            values = [k[0].value for k in node.value]
            for k in merge:
                if k[0].value in values:
                    continue
                node.value.append(k)
    
    file_in = Path('expand.yaml')
    yaml = ruamel.yaml.YAML()
    # Using PyYAML's flattener doesn't work 
    # import yaml as pyyaml
    # yaml.Constructor.flatten_mapping = pyyaml.constructor.SafeConstructor.flatten_mapping 
    yaml.Constructor.flatten_mapping = flatten_mapping
    # uncomment next line if you don't want aliases
    # yaml.representer.ignore_aliases = lambda x: True
    data = yaml.load(file_in)
    yaml.dump(data, sys.stdout)
    

    which I think gives what you want:

    tag1: &tag1
      subtag1: 42
      subtag2: baz
    tag2:
      subtag1: 18
      subtag3: *tag1
      subtag2: baz
    

    So your ticket was not invalid.
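
As a plain-Python illustration of the last step of the flatten_mapping replacement above (not ruamel.yaml code: the function name `flatten_pairs` and the tuple representation are assumptions made for this sketch), explicit keys keep their position, and merged pairs are appended only when their key is not already present:

```python
def flatten_pairs(pairs, merge_key="<<"):
    """Hypothetical stand-in for the tail of flatten_mapping, working on plain tuples."""
    # Keys written explicitly in the mapping keep their original order.
    explicit = [(k, v) for k, v in pairs if k != merge_key]
    # Pairs contributed by the merge key (a single merged mapping in this sketch).
    merged = [kv for k, v in pairs if k == merge_key for kv in v]
    seen = {k for k, _ in explicit}
    # Append a merged pair only if no explicit key of the same name exists.
    return explicit + [(k, v) for k, v in merged if k not in seen]


tag1 = [("subtag1", 42), ("subtag2", "baz")]
tag2 = [("<<", tag1), ("subtag1", 18), ("subtag3", tag1)]

flat = flatten_pairs(tag2)
print([k for k, _ in flat])
```

This mirrors why subtag2 ends up after subtag3 in the output shown above: it is appended after the explicit keys, while the merged subtag1: 42 is dropped in favor of the explicit subtag1: 18.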