
How do I dump yaml without evaluating anchor values?


I have a yaml file as

anchor1: &anchor1
  resource_class: small

anchor2: &anchor2
  hello: world

anchor3: &anchor3
  hello: world

root1:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3

  nested2:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3

  nested3:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3


root2:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world2
      - *anchor2
      - *anchor3

...

I want to pull out the value of nested1 into a separate file without evaluating all the anchors.

from pathlib import Path
from ruamel.yaml import YAML

yaml = YAML()
with open(Path('in_file')) as f:
    data = yaml.load(f)

with open(Path('out_file'), 'w') as f:
    yaml.dump(data['root1']['nested1'], f)

The output I want when dumping is

<<: *anchor1
some_list:
  - item1:
      hello: world
  - *anchor2
  - *anchor3

I understand it is invalid yaml, as the anchor definitions are not present.

The main problem I run into is that the moment I grab a value from the root config, it has already been processed.

For example, if I load and dump my in_file, it works as expected, but if I take the data and get a value out, data['root1'], it has already processed the anchors.

I suspect that's because the anchor definitions are not part of data['root1'] but I'm not sure how to work around that.


Solution

  • If you are working with files containing YAML documents, use the officially recommended extension for such files, which has been .yaml since at least September 2006.

    Then you should consider passing pathlib.Path() instances when loading, instead of providing a stream:

    data = yaml.load(Path('in_file.yaml'))
    

    and likewise when dumping:

    yaml.dump(data, Path('out_file.yaml'))
    

    (although that output is arguably not a file containing a YAML document, since the anchors its aliases refer to are missing). Your original use of yaml.dump() was not going to work, as you opened the file for reading only. Your updated version opens the output with 'w', but yaml.dump() writes a (UTF-8) binary stream, so use 'wb'.


    Although it is possible to hook into the representer to skip output until a certain point, it is much easier to do the selection as post-processing, using the transform parameter of the dump method:

    import ruamel.yaml
    from pathlib import Path
    
    in_file = Path('in_file.yaml')
    out_file = Path('out_file.yaml')
    
    class SelectKey:
        """this assumes mappings for all levels of keys, mappings indented by indent spaces"""
        def __init__(self, *keys, indent=2):
            self.keys = keys
            self.indent = indent
    
        def __call__(self, s):
            """
            s will contain the full YAML output, process it line by line to find key
            """
            processing = [False] * len(self.keys)
            result = ""
            level = 0
            for line in s.splitlines(True):
                dedented_line = line[level*self.indent:]
                if processing[level]:
                    # value lines sit one level deeper than the selected key;
                    # a shallower non-blank line (e.g. a sibling key) ends the selection
                    if line.strip() and not line.startswith(' ' * (self.indent * (level + 1))):
                        break
                    result += dedented_line[self.indent:]  # the values
                else:
                    key = self.keys[level]
                    if dedented_line.startswith(key) and dedented_line[len(key)] in ' :':
                        processing[level] = True
                        if level + 1 == len(self.keys):
                            pass  # we don't want the key itself, only its value
                        else:
                            level += 1
            return result.rstrip() + '\n' # remove potential empty lines
        
    yaml = ruamel.yaml.YAML()
    yaml.indent(sequence=4, offset=2)
    data = yaml.load(in_file)
    yaml.dump(data, out_file, transform=SelectKey('root1', 'nested1'))
    print(out_file.read_text(), end='')
    

    which gives:

    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3
    

    You need to call yaml.indent() to get your non-standard sequence indentation. Because the selection follows the "path" of keys to a value, you won't get the value of just any key named nested1.
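
To make the "path" behaviour easier to see in isolation, here is a minimal standalone sketch of the same idea that works on plain text and needs only the standard library. The name select_by_path and the SAMPLE text are hypothetical and not part of ruamel.yaml. It assumes block-style mappings at every level of the path, a fixed indent per level, unquoted keys, and at least one key; it does not parse YAML.

```python
def select_by_path(text, *keys, indent=2):
    """Return the block under the given key path, dedented to column 0.

    Hypothetical helper: pure text processing, assuming block-style
    mappings and `indent` spaces per nesting level.
    """
    level = 0        # index into keys of the key we are looking for
    found = False    # True once the last key of the path has been matched
    collected = []
    value_prefix = " " * (indent * len(keys))
    for line in text.splitlines(True):
        stripped = line.strip()
        if found:
            # a non-blank line that is not deep enough ends the value
            if stripped and not line.startswith(value_prefix):
                break
            collected.append(line[len(value_prefix):] if stripped else "\n")
            continue
        if not stripped or stripped.startswith("#"):
            continue
        depth = len(line) - len(line.lstrip(" "))
        if depth < indent * level:
            # we left the parent mapping without finding the next key
            level = depth // indent
        key = keys[level]
        if depth == indent * level and (
            stripped == key + ":" or stripped.startswith(key + ": ")
        ):
            if level + 1 == len(keys):
                found = True
            else:
                level += 1
    body = "".join(collected).rstrip()  # drop trailing blank lines
    return body + "\n" if body else ""


SAMPLE = """\
root1:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2

  nested2:
    hello: other
root2:
  nested1:
    some_list:
      - item1:
          hello: world2
"""

print(select_by_path(SAMPLE, "root1", "nested1"), end="")
print(select_by_path(SAMPLE, "root2", "nested1"), end="")
```

Both calls select a key named nested1, but each returns only the block under its own parent, and a path that does not exist yields an empty string. Since the transform callable receives the full dumped output as a string, a function like this could presumably be wrapped in a lambda and passed as transform in place of the SelectKey instance.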