I have a YAML file like this:
anchor1: &anchor1
  resource_class: small
anchor2: &anchor2
  hello: world
anchor3: &anchor3
  hello: world
root1:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3
  nested2:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3
  nested3:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
      - *anchor3
root2:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world2
      - *anchor2
      - *anchor3
...
I want to pull out the value of nested1 into a separate file without the aliases being resolved:
from pathlib import Path
from ruamel.yaml import YAML

yaml = YAML()
with open(Path('in_file')) as f:
    data = yaml.load(f)
with open(Path('out_file'), 'w') as f:
    yaml.dump(data['root1']['nested1'], f)
The output I want when dumping is:
<<: *anchor1
some_list:
  - item1:
      hello: world
  - *anchor2
  - *anchor3
I understand this is invalid YAML on its own, as the anchor definitions are not present.
The main problem I run into is that the moment I grab a value from the root config, it has already been processed.
For example, if I load and dump my in_file unchanged, it works as expected, but if I take a value out of the data, such as data['root1'], the anchors in it have already been processed.
I suspect that's because the anchor definitions are not part of data['root1'], but I'm not sure how to work around that.
If you are working with files containing YAML documents, use the officially recommended extension for such files, which has been .yaml since at least September 2006.
You should also consider passing pathlib.Path() instances when loading, instead of providing a stream:
data = yaml.load(Path('in_file.yaml'))
and likewise when dumping:
yaml.dump(data, Path('out_file.yaml'))
(although that output might not be considered a file containing a YAML document).
Your original use of yaml.dump() is not going to work, as you opened the file for reading only. Your updated version opens the output with 'w', but yaml.dump() dumps a (UTF-8) binary stream, so use 'wb'.
Although it is possible to hook into the representer to skip output until a certain point, it is much easier to do the selection as post-processing, using the transform parameter of the dump method:
import ruamel.yaml
from pathlib import Path

in_file = Path('in_file.yaml')
out_file = Path('out_file.yaml')

class SelectKey:
    """this assumes mappings for all levels of keys, mappings indented by indent spaces"""
    def __init__(self, *keys, indent=2):
        self.keys = keys
        self.indent = indent

    def __call__(self, s):
        """
        s will contain the full YAML output, process it line by line to find key
        """
        processing = [False] * len(self.keys)
        result = ""
        level = 0
        for line in s.splitlines(True):
            dedented_line = line[level*self.indent:]
            if processing[level]:
                # a non-blank line indented less than the selected value ends it,
                # whether it is a sibling key (nested2) or a new top-level key (root2)
                if line.strip() and not line.startswith(' ' * (self.indent * (level + 1))):
                    break
                dedented_line = dedented_line[self.indent:]  # the values
                result += dedented_line
            else:
                key = self.keys[level]
                if dedented_line.startswith(key) and dedented_line[len(key)] in ' :':
                    processing[level] = True
                    if level + 1 == len(self.keys):
                        pass  # we don't want the key itself, only its value
                    else:
                        level += 1
        return result.rstrip() + '\n'  # remove potential empty lines

yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)
data = yaml.load(in_file)
yaml.dump(data, out_file, transform=SelectKey('root1', 'nested1'))
print(out_file.read_text(), end='')
which gives:
<<: *anchor1
some_list:
  - item1:
      hello: world
  - *anchor2
  - *anchor3
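You need to call yaml.indent() to get your non-standard sequence indentation. As the selection is based on the "path" of keys to a value, you get only the value found under root1 and then nested1, not the value of any other key that happens to be named nested1 (such as the one under root2).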
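To see the path-based selection in isolation, here is a minimal standalone sketch that works on an already-dumped string and needs no YAML library. The name select_by_path and the sample text DUMPED are hypothetical, made up for this illustration; it is a simplified variant of the idea above rather than anything ruamel.yaml provides, and it assumes block-style mappings at every level, each indented by a fixed number of spaces.

```python
def select_by_path(text, keys, indent=2):
    """Return the block stored under the given key path, dedented to column 0.

    Hypothetical helper for illustration: assumes block-style mappings at
    every level, each nested `indent` spaces deeper than its parent.
    """
    lines = text.splitlines(True)
    start, depth = 0, 0
    for key in keys:
        prefix = ' ' * (indent * depth)
        for i in range(start, len(lines)):
            line = lines[i]
            if line.strip() and not line.startswith(prefix):
                raise KeyError(key)  # left the parent block without finding key
            if line.startswith(prefix + key + ':'):
                start, depth = i + 1, depth + 1
                break
        else:
            raise KeyError(key)  # ran out of lines
    prefix = ' ' * (indent * depth)
    picked = []
    for line in lines[start:]:
        if line.strip() and not line.startswith(prefix):
            break  # the first non-blank line indented less ends the block
        picked.append(line[len(prefix):])
    return ''.join(picked).rstrip() + '\n'


# Sample text made up for this sketch; aliases are plain text here,
# so nothing is resolved.
DUMPED = """\
root1:
  nested1:
    <<: *anchor1
    some_list:
      - item1:
          hello: world
      - *anchor2
  nested2:
    <<: *anchor1
root2:
  nested1:
    some_list:
      - item1:
          hello: world2
"""

first = select_by_path(DUMPED, ('root1', 'nested1'))
second = select_by_path(DUMPED, ('root2', 'nested1'))
print(first, end='')
print(second, end='')
```

Both paths end in a key called nested1, but each call returns only the block under its own root, and the sibling nested2 is not included in the first result.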