
ruamel.yaml: How to preserve structure of dict in YAML


I am using ruamel.yaml to edit YAML files and dump them. I need help keeping the structure the same as in the original file.

My YAML file contains the mapping below. I do not modify this part, but after I load the file, edit other parts, and dump it, the layout of this mapping changes:

    parameters: {
      "provision_pipeline": "provision-integrations",
      "enable_sfcd_ds_argo_operator": "false",
      "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
    }

After dumping, it comes out in this format instead:

    parameters: {"provision_pipeline": "provision-integrations", "enable_sfcd_ds_argo_operator": "false",
      "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"}

Code:

    import ruamel.yaml


    def addTargetToBaseIntegFileAndUpdate(deploymentTarget, fi, env, samvmf_repo, folder, pipelineversionintegration, basefile):
        ryaml = ruamel.yaml.YAML()
        ryaml.preserve_quotes = True
        ryaml.default_flow_style = False
        ryaml.indent(mapping=2)

        with open(basefile, "r") as file:
            yamldata = ryaml.load(file)
            deploymentTargets = yamldata["targets"]["stagger_groups"]
            target = ""
            doesFIExist = False
            fi_index = 0

            # find the stagger group that matches the environment
            for index, sg in enumerate(deploymentTargets):
                if sg["name"] == env.lower():
                    target = deploymentTargets[index]
                    # look for an existing instance with the requested name
                    for i, fi_item in enumerate(target["f_instances"]):
                        if fi_item["name"] == fi.lower():
                            fi_index = i
                            doesFIExist = True
                            break
                    if doesFIExist:
                        # instance exists: only add the new domain to it
                        yamldata["targets"]["stagger_groups"][index]["f_instances"][fi_index]["f_domains"].append(deploymentTarget["f_instances"][0]["f_domains"][0])
                    else:
                        # instance is missing: add it as a whole
                        yamldata["targets"]["stagger_groups"][index]["f_instances"].append(deploymentTarget["f_instances"][0])
                    break

        with open(basefile, "w") as fileobj:
            ryaml.dump(yamldata, fileobj)

Solution

  • ruamel.yaml doesn't preserve newlines between the elements of a flow-style mapping. The only setting that affects them is yaml.width, which wraps lines that get too long. E.g. with your input, if you set the width to 40, you'll get:

    parameters: {"provision_pipeline": "provision-integrations",
      "enable_sfcd_ds_argo_operator": "false",
      "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"}
    

    But there is no setting that puts the first key-value pair on a new line, or the closing curly brace on a line of its own.

    Your addition ryaml.default_flow_style = False only affects completely new dicts and lists that you add to the data structure.

    You should consider switching to block style and dropping all non-essential quotes; that makes the YAML both less verbose and more readable. For the program that loads the data this makes no difference, and the conversion is easily done by loading in normal safe mode (which does not set block/flow-style information on the loaded data):

    import sys
    import pathlib
    import ruamel.yaml
    
    basefile = pathlib.Path('input.yaml')
    
    data = ruamel.yaml.YAML(typ='safe').load(basefile)
    yaml = ruamel.yaml.YAML()
    yaml.dump(data, sys.stdout)
    

    which gives:

    parameters:
      provision_pipeline: provision-integrations
      enable_sfcd_ds_argo_operator: 'false'
      clustermanagement_helm_values_path: sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml
    

    The string scalar 'false' needs to get quoted in order not to be confused with the boolean false.

    If the above improvement is unacceptable, e.g. if further processing is done with something other than a full YAML parser, you can post-process the output:

    import sys
    import pathlib
    import ruamel.yaml
    
    basefile = pathlib.Path('input.yaml')
    
    def splitflowmap(s):
        res = []
        for line in s.splitlines():
            if ': {' in line and line[-1] == '}':
                start, rest = line.split(': {', 1)
                start = start + ': {'
                indent = '  '  # two spaces more than the start
                for idx, ch in enumerate(start):
                    if ch != ' ':
                        break
                    indent += ' '
                res.append(start)
                rest = rest[:-1]  # cut off the closing }
                for x in rest.split(', '):  # if you always have quotes it is safer to split on '", "'
                    res.append(f'{indent}{x},')
                res[-1] = res[-1][:-1]  # delete trailing comma
                res.append(f'{indent[2:]}}}')  # re-add the cut-off } on a line of its own
                continue
            res.append(line)
        return '\n'.join(res) + '\n'
    
    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    yaml.width = 2**16
    data = yaml.load(basefile)
    yaml.dump(data, sys.stdout, transform=splitflowmap)
    

    which gives:

    parameters: {
      "provision_pipeline": "provision-integrations",
      "enable_sfcd_ds_argo_operator": "false",
      "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
    }
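One caveat of the post-processing approach is the split on `', '`, as the comment in `splitflowmap` hints. The sketch below uses only string operations and a made-up mapping body whose second value contains a comma; `split_quoted_pairs` is a hypothetical helper, and it assumes every key and value is double-quoted. It is still a heuristic and would break on a value that itself contains `", "`.

```python
def split_quoted_pairs(body):
    """Split the inside of a one-line flow mapping on '", "'.

    Assumes every key and value is double-quoted.
    """
    parts = body.split('", "')
    fixed = []
    for i, part in enumerate(parts):
        # The separator swallowed a closing and an opening quote; restore them.
        if i > 0:
            part = '"' + part
        if i < len(parts) - 1:
            part = part + '"'
        fixed.append(part)
    return fixed


# Hypothetical mapping body; the second value contains ', '.
body = '"name": "demo", "note": "first, second"'

naive = body.split(', ')          # cuts the second value in two
safer = split_quoted_pairs(body)  # keeps each key-value pair whole

print(naive)
print(safer)
```

If all your values are known to be free of commas, the simpler split in `splitflowmap` is enough.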