python-3.x yaml pyyaml ruamel.yaml

How can I split YAML into multiple YAML files based on the value of a key with Python?


I'm parsing YAML and breaking it into multiple different YAML files. I use a PyYAML constructor to achieve this, but the result is poor.

This is part of my project: I need to parse a YAML file I receive and split it into multiple different YAML files based on the value of a key.

The YAML file I receive looks like this:

testname: testname
testall:
    test1:
        name: name1
        location: 0
    test2: 
        name: name2
        location: 2
    test3: 
        name: name3
        location: 0
    test4: 
        name: name4
        location: 2
    ...
locations:
    - 0
    - 2
    - ...  

I want to parse it and split it by location (i.e. by device), like the following:

# location0.yaml
testname:test
tests:
    test1:
        name:test1
        location:0
    test3: 
        name: test3
        location: 0
# location2.yaml
testname:test
tests:
    test2:
        name:test2
        location:0
    test4: 
        name: test4
        location: 0

How can I parse it into the form above?


Solution

  • Although you can do this with PyYAML, you would have to restrict yourself to YAML 1.1. For this kind of read-modify-write you should use ruamel.yaml (disclaimer: I am the author of that package). Not only does it support YAML 1.2, it also preserves any comments, tags and anchor names that occur in your source, and it can preserve quotes around scalars, literal and folded style, etc., if you need that.

    Also note that your output is invalid YAML: you cannot have a multi-line plain (i.e. unquoted) scalar as the key of a (block style) mapping. You would have to write:

    "testname:test
    tests":
    

    but I assume you meant that to be two keys for the root level mapping:

    testname: test
    tests:
    

    Assuming your input is in input.yaml:

    testname: testname
    testall:
        test1:
            name: name1    # this is just the first name
            location: 0
        test2: 
            name: "name2"  # quotes added for demo purposes
            location: 2
        test3: 
            name: name3    # as this has the same location as name1 
            location: 0    # these should be together
        test4: 
            name: name4    # and this one goes with name2
            location: 2
    locations:
        - 0
        - 2
    

    you can do:

    from pathlib import Path
    import ruamel.yaml
    from ruamel.yaml.comments import CommentedMap
    
    in_file = Path('input.yaml')
    
    yaml = ruamel.yaml.YAML()
    yaml.indent(mapping=4, sequence=6, offset=4)  # this matches your input
    yaml.preserve_quotes = True
    data = yaml.load(in_file)
    
    for loc in data['locations']:
        out_name = f'location{loc}.yaml'
        tests = {}
        d = CommentedMap(dict(testname="test", tests=tests))
        d.yaml_set_start_comment(out_name)
        testall = data['testall']
        for test in testall:
            if loc == testall[test]['location']:
                tests[test] = testall[test]
                tests[test]['location'] = 0
        # since you set location to zero and this affects data, make sure to remove
        # the items. This will prevent things from going wrong in case the
        # locations sequence does have zero, but not as its first number
        for key in tests:
            del testall[key]
        yaml.dump(d, Path(out_name))
    

    which gives location0.yaml:

    # location0.yaml
    testname: test
    tests:
        test1:
            name: name1    # this is just the first name
            location: 0
        test3:
            name: name3    # as this has the same location as name1 
            location: 0    # these should be together
    

    and location2.yaml:

    # location2.yaml
    testname: test
    tests:
        test2:
            name: "name2"  # quotes added for demo purposes
            location: 0
        test4:
            name: name4    # and this one goes with name2
            location: 0
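
    The grouping step itself does not depend on ruamel.yaml. Below is a minimal sketch, assuming the loaded data can be treated as plain Python dicts and lists, that does the split on copies instead of mutating `testall` and deleting entries afterwards. `split_by_location` is a hypothetical helper name, and the inline `data` dict is a hand-written stand-in for what loading `input.yaml` returns. Because it works on plain dicts, this sketch does not by itself preserve comments or quotes; for that you would still dump each returned document with a round-trip `YAML()` instance as above.

```python
import copy
from collections import defaultdict

# Hypothetical stand-in for the loaded input.yaml, written as plain
# dicts/lists so this sketch needs only the standard library.
data = {
    "testname": "testname",
    "testall": {
        "test1": {"name": "name1", "location": 0},
        "test2": {"name": "name2", "location": 2},
        "test3": {"name": "name3", "location": 0},
        "test4": {"name": "name4", "location": 2},
    },
    "locations": [0, 2],
}


def split_by_location(data, testname="test"):
    """Return {filename: document}, one document per entry in data['locations'].

    Hypothetical helper: groups data['testall'] by each test's 'location'
    value. Every test is deep-copied before 'location' is reset to 0, so the
    input is never mutated and the order of data['locations'] does not matter.
    """
    # First pass: bucket the test names by their original location.
    groups = defaultdict(dict)
    for key, test in data["testall"].items():
        groups[test["location"]][key] = test

    docs = {}
    for loc in data["locations"]:
        tests = {}
        for key, test in groups.get(loc, {}).items():
            item = copy.deepcopy(test)
            item["location"] = 0  # mirrors the answer, which resets location to 0
            tests[key] = item
        docs[f"location{loc}.yaml"] = {"testname": testname, "tests": tests}
    return docs


docs = split_by_location(data)
for name, doc in docs.items():
    print(name, list(doc["tests"]))
```

    Grouping first and copying each entry means a test that was relabeled to location 0 can never be picked up again by a later iteration, which is the situation the `del testall[key]` loop in the answer guards against.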