Tags: python, python-3.x, yaml, pyyaml

Is it possible to preserve YAML block structure when dumping a parsed document?


We use PyYAML to prep config files for different environments. But our YAML blocks lose integrity.

Given input.yml ...

pubkey: |
    -----BEGIN PUBLIC KEY-----
    MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
    QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
    UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
    EsUgJHXcpw7OPxRrCQIDAQAB
    -----END PUBLIC KEY-----

... executing this program using python3 ...

import yaml

with open('input.yml', mode='r') as f:
    parsed = yaml.safe_load(f)

with open('output.yml', mode='w') as f:
    yaml.dump(parsed, f)

... produces this output.yml ...

pubkey: '-----BEGIN PUBLIC KEY-----

    MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq

    QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2

    UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK

    EsUgJHXcpw7OPxRrCQIDAQAB

    -----END PUBLIC KEY-----

    '

Is it possible to preserve the structure of my block using PyYAML?


Solution

  • Yes, that is possible with PyYAML, but you have to provide your own enhanced versions of at least the Scanner, Parser and Constructor used by safe_load and of the Emitter, Serializer and Representer used by dump, plus a specialized string-like class that keeps information about its original formatting.

    This is part of what was added to ruamel.yaml (disclaimer: I am the author of that package) as it was derived from PyYAML. Using ruamel.yaml, the preferred way of doing this is:

    import sys
    import ruamel.yaml
    
    yaml_str = """\
    pubkey: |
        -----BEGIN PUBLIC KEY-----
        MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
        QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
        UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
        EsUgJHXcpw7OPxRrCQIDAQAB
        -----END PUBLIC KEY-----
    """
    yaml = ruamel.yaml.YAML()  # defaults to round-trip
    yaml.indent(mapping=4)
    data = yaml.load(yaml_str)
    yaml.dump(data, sys.stdout)
    

    Or the older, more PyYAML-like style (which restricts the options you can set):

    import sys
    import ruamel.yaml as yaml
    
    yaml_str = """\
    pubkey: |
        -----BEGIN PUBLIC KEY-----
        MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
        QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
        UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
        EsUgJHXcpw7OPxRrCQIDAQAB
        -----END PUBLIC KEY-----
    """
    
    data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
    yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper, indent=4)
    

    Both of which give you:

    pubkey: |
        -----BEGIN PUBLIC KEY-----
        MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
        QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
        UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
        EsUgJHXcpw7OPxRrCQIDAQAB
        -----END PUBLIC KEY-----
    

    at least with Python 2.7 and 3.5+.

    The indent=4 is necessary because the RoundTripDumper defaults to a two-space indent and the original indent of a file is not preserved (not doing so eases re-indenting a YAML file).

    If you cannot switch to ruamel.yaml, you should be able to use its source to extract all the changes needed. If you can switch, you also get its other features, such as preservation of comments and merge key names.
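
    If you have to stay on PyYAML and only need multi-line strings to come out as literal blocks, a much smaller workaround than the full set of changes described above is a custom representer. Note that this is a different technique from round-trip preservation: it does not remember how a scalar was written in the input, it simply emits every string that contains a newline in literal style. The names represent_multiline_str and BlockDumper are hypothetical, and the key is shortened for illustration.

```python
import yaml


def represent_multiline_str(dumper, data):
    """Represent strings that contain a newline in literal block style."""
    style = "|" if "\n" in data else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)


class BlockDumper(yaml.SafeDumper):
    """SafeDumper subclass, so PyYAML's stock dumpers stay unchanged."""


BlockDumper.add_representer(str, represent_multiline_str)

# Shortened stand-in for the key in the question (illustrative only).
source = """\
pubkey: |
    -----BEGIN PUBLIC KEY-----
    EsUgJHXcpw7OPxRrCQIDAQAB
    -----END PUBLIC KEY-----
"""

parsed = yaml.safe_load(source)
dumped = yaml.dump(parsed, Dumper=BlockDumper, indent=4)
print(dumped)
```

    PyYAML's emitter can still fall back to a quoted style for strings it considers unsuitable for block scalars (for example, lines with trailing spaces), so this is best treated as a convenience rather than a guarantee.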