I'm parsing a YAML file and splitting it into multiple different YAML files. I use PyYAML's constructor to achieve this, but the result is poor.
This is part of my project: I need to parse a YAML file I receive and split it into multiple YAML files based on the value of a key.
The YAML file I receive looks like this:
testname: testname
testall:
    test1:
        name: name1
        location: 0
    test2:
        name: name2
        location: 2
    test3:
        name: name3
        location: 0
    test4:
        name: name4
        location: 2
    ...
locations:
    - 0
    - 2
    - ...
I want to parse it and split it by device (i.e. by location), like the following:
# location0.yaml
testname:test
tests:
    test1:
        name:test1
        location:0
    test3:
        name: test3
        location: 0
# location2.yaml
testname:test
tests:
    test2:
        name:test2
        location:0
    test4:
        name: test4
        location: 0
How can I parse the file and split it into the form above?
Although you can do this with PyYAML, you would have to restrict
yourself to YAML 1.1. For this kind of read-modify-write you should
use ruamel.yaml
(disclaimer: I am the author of that package). Not
only does that support YAML 1.2, it also preserves any comments, tags
and anchor names in case they occur in your source and can preserve
quotes around scalars, literal and folded style, etc. if you need that.
Also note that your output is invalid YAML: you cannot have a multi-line plain (i.e. unquoted) scalar as the key of a (block style) mapping. You would have to write:
"testname:test
tests":
but I assume you meant that to be two keys for the root level mapping:
testname: test
tests:
Assuming your input is in input.yaml:
testname: testname
testall:
    test1:
        name: name1 # this is just the first name
        location: 0
    test2:
        name: "name2" # quotes added for demo purposes
        location: 2
    test3:
        name: name3 # as this has the same location as name1
        location: 0 # these should be together
    test4:
        name: name4 # and this one goes with name2
        location: 2
locations:
    - 0
    - 2
you can do:
from pathlib import Path

import ruamel.yaml

in_file = Path('input.yaml')

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=6, offset=4)  # this matches your input
yaml.preserve_quotes = True
data = yaml.load(in_file)

for loc in data['locations']:
    out_name = f'location{loc}.yaml'
    tests = {}
    d = ruamel.yaml.comments.CommentedMap(dict(testname="test", tests=tests))
    d.yaml_set_start_comment(out_name)
    testall = data['testall']
    for test in testall:
        if loc == testall[test]['location']:
            tests[test] = testall[test]
            tests[test]['location'] = 0
    # since you set location to zero and this affects data, make sure to remove
    # the items. This will prevent things from going wrong in case the
    # locations sequence does have zero, but not as its first number
    for key in tests:
        del testall[key]
    yaml.dump(d, Path(out_name))
which gives location0.yaml:
# location0.yaml
testname: test
tests:
    test1:
        name: name1 # this is just the first name
        location: 0
    test3:
        name: name3 # as this has the same location as name1
        location: 0 # these should be together
and location2.yaml:
# location2.yaml
testname: test
tests:
    test2:
        name: "name2" # quotes added for demo purposes
        location: 0
    test4:
        name: name4 # and this one goes with name2
        location: 0
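The grouping step can also be looked at in isolation, without any YAML library. The following is only a sketch on plain dictionaries, assuming the loaded data behaves like nested dicts; `split_by_location` is a hypothetical helper name and not part of ruamel.yaml. Instead of deleting entries from the input, it copies each matching test before resetting its location, which is another way to avoid the problem described in the code comment above. Whether copying keeps comments and quotes on ruamel.yaml's round-trip objects is not checked here.

```python
from copy import deepcopy


def split_by_location(data):
    """Return {location: document} without modifying `data` (hypothetical helper)."""
    result = {}
    for loc in data["locations"]:
        tests = {}
        for test_name, test in data["testall"].items():
            if test["location"] == loc:
                entry = deepcopy(test)   # copy, so the input keeps its location
                entry["location"] = 0    # same reset as in the loop above
                tests[test_name] = entry
        result[loc] = {"testname": "test", "tests": tests}
    return result


# Plain-dict stand-in for the loaded input.yaml
data = {
    "testname": "testname",
    "testall": {
        "test1": {"name": "name1", "location": 0},
        "test2": {"name": "name2", "location": 2},
        "test3": {"name": "name3", "location": 0},
        "test4": {"name": "name4", "location": 2},
    },
    "locations": [0, 2],
}

docs = split_by_location(data)
for loc, doc in docs.items():
    print(f"location{loc}.yaml ->", list(doc["tests"]))
# location0.yaml -> ['test1', 'test3']
# location2.yaml -> ['test2', 'test4']
```

Because the input is left untouched, the order of the values in locations does not matter in this variant.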