Tags: python, yaml, ruamel.yaml

How to reorder yaml keys and maintain comments according to a predefined template?


I would like to define a template for the key order and line spacing of a YAML file and apply it to a repository of hundreds of YAML files. In general, I would like to do the following:

  1. Load the existing yaml file
  2. Reorder the keys and values according to the template
  3. Delete any comments that are just a new line
  4. Apply the line spacing from the template
  5. Save the yaml file

I am using Python 3.10 and ruamel.yaml version. At a very basic level, I understand that the mappings ruamel.yaml loads are based on an ordered dictionary, and the accepted answer here seems like a simple way to ensure a specific order of a dictionary's keys, but I don't know how to apply that to the loaded YAML data.

To maintain comments, I presume that the .ca attribute can be copied, although I don't know how to then apply the line spacing rules from the template.

Further complicating the matter is that some keys may themselves have multiple values (I think these would be a CommentedSeq in ruamel.yaml?), each of which should follow the templated order, and the last one will need a blank line after it.

Here is a basic version of the template that should provide an overview of the structure I'm talking about:

template='''
name:
region:
origin:
description:

go_live_date:
status:

governance:
  business_owner:
    am:
    eu:
    ap:
  technical_owner:
    am:
    eu:
    ap:

architecture:
  protocol:
  platform:

environments:
  - name:
    description:
    tier:
    locations:

'''

In the following example, the key order is wrong, some blank lines are missing while others are doubled, and there are some comments:

'''
name: MyApp
description: My wonderful application
origin: internal
governance:
  technical_owner:
    am:
      - Nico Ferrell
    ap:
      - Benedict Berger
      - Elsie Parsons
    eu:
      - Frances Case

  business_owner:
    eu:
      - Audrey Dalton
    am:
      - John Carpenter # to be updated

architecture:
  protocol: [TCP]
  platforms: [python_3_10, java_16]


status: in production
go_live_date: 2024-01-01
environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    locations: [LMN]
    tier: uat
# further environmental details to be added
'''

After applying the template and steps outlined to this example, the resultant file should look like this:

'''
name: MyApp
origin: internal
description: My wonderful application

status: in production
go_live_date: 2024-01-01

governance:
  technical_owner:
    am:
      - Nico Ferrell
    eu:
      - Frances Case
    ap:
      - Benedict Berger
      - Elsie Parsons

  business_owner:
    am:
      - John Carpenter # to be updated
    eu:
      - Audrey Dalton

architecture:
  protocol: [TCP]
  platforms: [python_3_10, java_16]

environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    tier: uat
    locations: [LMN]
# further environmental details to be added

'''

I don't know how to tackle this and would appreciate some help.


Solution

  • You tackle this by writing a program that uses the key order of the template to re-insert the keys of the example in that order. You can use the .insert() method that is available on the CommentedMap instance ruamel.yaml creates when loading a YAML mapping, inserting at position 0 while going through the template keys in reverse order. But you can also go through the template keys in their normal order and pop and re-assign each one: every re-assigned key moves to the back, so after the last template key has been handled, they all follow each other in template order (keys that are not in the template end up in front of them).
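    The pop-and-reassign idea can be tried in isolation on a plain dict, which keeps insertion order in Python 3.7+ just as CommentedMap does. The sketch below is a minimal flat illustration, not the answer's program; reorder_flat and the 'extra' key are hypothetical names made up for this demo. It also shows the side effect that keys the template doesn't mention end up in front.

```python
# Plain dicts keep insertion order (Python 3.7+), so the pop-and-reassign idea
# can be tried without ruamel.yaml. reorder_flat is a hypothetical demo helper.
def reorder_flat(data: dict, template_keys: list) -> dict:
    """Move the keys of `data` into the order given by `template_keys`, in place."""
    for key in template_keys:
        if key in data:                # skip template keys missing from data ('region')
            data[key] = data.pop(key)  # popping then re-assigning appends at the end
    return data


example = {
    "name": "MyApp",
    "description": "My wonderful application",
    "extra": "not in the template",
    "origin": "internal",
}
template_keys = ["name", "region", "origin", "description"]

print(list(reorder_flat(example, template_keys)))
# ['extra', 'name', 'origin', 'description']
```

    If unknown keys should go to the back instead, the insert-at-position-0 variant with reversed template keys is the one to use.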

    To execute that, you can use a function that keeps a path to find the corresponding data structure in the example, or recurse into the template and the example in parallel.

    import sys
    from pathlib import Path
    import ruamel.yaml
    
    yaml = ruamel.yaml.YAML()
    yaml.indent(mapping=2, sequence=4, offset=2)
    yaml.preserve_quotes = True
    template = yaml.load(Path('template.yaml'))
    data = yaml.load(Path('example.yaml'))
    
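    def reorder(t, d):
        """Recursively move the keys of d into the order they have in template t."""
        if isinstance(t, dict) and isinstance(d, dict):
            for k, v in t.items():
                if k not in d:
                    # this handles e.g. the key 'region' that is missing from the example
                    continue
                dv = d.pop(k)
                d[k] = dv  # re-assigning appends the key at the end
                reorder(v, dv)
        elif isinstance(t, list) and isinstance(d, list) and t:
            # assume the template has one element, the example possibly several
            for elem in d:
                reorder(t[0], elem)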
    
    reorder(template, data)
    yaml.dump(data, sys.stdout)
    

    which gives:

    name: MyApp
    origin: internal
    description: My wonderful application
    go_live_date: 2024-01-01
    
    
    status: in production
    governance:
      business_owner:
        am:
          - John Carpenter # to be updated
    
        eu:
          - Audrey Dalton
      technical_owner:
        am:
          - Nico Ferrell
        eu:
          - Frances Case
    
        ap:
          - Benedict Berger
          - Elsie Parsons
    architecture:
      platforms: [python_3_10, java_16]
      protocol: [TCP]
    environments:
      - name: EU Prod
        description: production environment for EMEA
        tier: production
        locations: [ABC, XYZ]
      - name: EU UAT
        description: UAT environment for EMEA
        tier: uat
    # further environmental details to be added
        locations: [LMN]
    

    This gets your keys in the order of the template, but doesn't handle the empty lines. That is on purpose: we only recurse into the data structure of the template, and the newline after "John Carpenter" belongs to a sequence that is not part of the template (as you can check with print(data['governance']['business_owner']['am'].ca)). Because ruamel.yaml currently attaches comments to the last fully parsed node, the comment # further.. is associated with the key 'tier' and properly shifts position during reordering (although that might not be what you want).

    Since ruamel.yaml was conceived to update values in existing YAML (config) files while preserving as much as possible (key order, comments, empty lines), and what you are doing is certainly not close to that, you'll have some work to do for the other steps.

    I would first walk over the resulting example data and print the comments you find:

    def remove_empty_lines(d):
        # for now this only prints the comments attached to each node
        if isinstance(d, dict):
            if d.ca.comment:
                print('comment', d.ca.comment)
            for k, v in d.items():
                if (itemc := d.ca.items.get(k)) is not None:
                    print('itemc', v, itemc)
                remove_empty_lines(v)
        elif isinstance(d, list):
            if d.ca.comment:
                print('lcomment', d.ca.comment)
            for idx, elem in enumerate(d):
                if (itemc := d.ca.items.get(idx)) is not None:
                    print('litemc', elem, itemc)
                remove_empty_lines(elem)
    
    remove_empty_lines(data)
    

    which gives:

    itemc in production [None, [CommentToken('\n\n', line: 22, col: 0)], None, None]
    litemc John Carpenter [CommentToken('# to be updated\n\n', line: 17, col: 23), None, None, None]
    litemc Frances Case [CommentToken('\n\n', line: 11, col: 8), None, None, None]
    itemc uat [None, None, CommentToken('\n# further environmental details to be added\n', line: 35, col: 0), None]
    

    So you will need to inspect those items and update the CommentToken. E.g. by using

    print(dir(data['governance']['business_owner']['am'].ca.items[0][0]))
    print(data['governance']['business_owner']['am'].ca.items[0][0].value)
    

    you'll see that the .value attribute contains the actual comment text, from which you can strip the spurious newlines.
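    The newline stripping can be sketched as a pure string function. This is a minimal sketch under an assumption read off the tokens printed above: the text before the first newline is the end-of-line part (empty, or '# ...'), and every later line is either a full-line comment or an empty line. drop_blank_lines is a hypothetical helper, not part of ruamel.yaml.

```python
def drop_blank_lines(value: str) -> str:
    """Hypothetical helper: remove empty lines from a comment token's text.

    Assumes the text before the first newline is the end-of-line part (empty
    or '# ...') and that every later line is a full-line comment or empty.
    """
    eol_part, newline, rest = value.partition("\n")
    # keep only the later lines that contain something other than whitespace
    kept = [line + "\n" for line in rest.split("\n") if line.strip()]
    return eol_part + newline + "".join(kept)


# the values of the end-of-line tokens printed above
for text in ["# to be updated\n\n", "\n\n", "\n# further environmental details to be added\n"]:
    print(repr(drop_blank_lines(text)))
# '# to be updated\n'
# '\n'
# '\n# further environmental details to be added\n'
```

    You would then assign the result back, e.g. token.value = drop_blank_lines(token.value), assuming .value is writable in your ruamel.yaml version. A result of just '\n' carries no comment at all. The list in position 1 printed for 'status' appears to hold tokens that precede a key, where each '\n' is a blank line of its own, so this helper does not fit those; dropping such tokens entirely is probably simpler.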

    Once that is done, walk over both the template and the data once more, check the template for comments, and insert or update them in the example. Make sure to create new CommentTokens; do not copy them from the template. Examples for that you can find here
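    As an alternative for the spacing step, a different, text-level technique avoids creating CommentTokens altogether: read off from the template which top-level keys are preceded by a blank line, then normalise the blank lines in the text that yaml.dump() produces. This is a sketch under stated assumptions: it only handles top-level keys, it assumes there are no block scalars (| or >) whose blank lines must survive, and top_level_key, keys_preceded_by_blank and apply_spacing are hypothetical names.

```python
from typing import Optional


def top_level_key(line: str) -> Optional[str]:
    """Return the key if `line` starts a top-level mapping entry, else None."""
    if line and not line[0].isspace() and not line.startswith(("#", "-")) and ":" in line:
        return line.split(":", 1)[0]
    return None


def keys_preceded_by_blank(template_text: str) -> set:
    """Collect the top-level keys that the template precedes with a blank line."""
    keys = set()
    blank_seen = False  # was there a blank line since the last non-blank line?
    key_seen = False    # ignore blank lines before the very first key
    for line in template_text.splitlines():
        if not line.strip():
            blank_seen = True
            continue
        key = top_level_key(line)
        if key is not None:
            if blank_seen and key_seen:
                keys.add(key)
            key_seen = True
        blank_seen = False
    return keys


def apply_spacing(yaml_text: str, keys: set) -> str:
    """Drop every blank line, then put exactly one before each key in `keys`."""
    out = []
    for line in yaml_text.splitlines():
        if not line.strip():
            continue  # also removes "comments" that are only newlines
        if out and top_level_key(line) in keys:
            out.append("")
        out.append(line)
    return "\n".join(out) + "\n"


# a trimmed version of the question's template
template = """
name:
origin:
description:

go_live_date:
status:

governance:
  business_owner:
    am:

architecture:
  protocol:
"""

# a trimmed sample in the style of the dump above
dumped = """name: MyApp
origin: internal
description: My wonderful application
go_live_date: 2024-01-01


status: in production
governance:
  business_owner:
    am:
      - John Carpenter # to be updated

architecture:
  protocol: [TCP]
"""

print(apply_spacing(dumped, keys_preceded_by_blank(template)))
```

    To use it on a real file, dump into an io.StringIO and pass its getvalue() to apply_spacing. The blank line the question wants after the last environment is not added here; appending one more "\n" to the result would cover that.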