Tags: python, yaml, pretty-print, pyyaml

pretty output with pyyaml


I have a Python project where I'd like to use YAML (PyYAML 3.11), particularly because it is "pretty" and easy for users to edit in a text editor if and when necessary. My problem, though, is that if I bring the YAML into a Python application (as I will need to) and edit the contents (as I will need to), then the newly written document is typically not quite as pretty as what I started with.

The PyYAML documentation is pretty poor; it does not even document the parameters to the dump function. I found http://dpinte.wordpress.com/2008/10/31/pyaml-dump-option/, but I'm still missing the information I need. (I started to look at the source, but it doesn't seem the most inviting. If I don't get the solution here, then that's my only recourse.)
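
For reference, yaml.dump and yaml.safe_dump pass their keyword arguments on to the emitter and serializer. The sketch below exercises the formatting-related ones; it assumes Python 3 and PyYAML 5.1 or later, because sort_keys did not exist in 3.11 (which always sorted keys). The data dict is illustrative.

```python
import yaml

data = {"widget-hint": "filename", "widget-help": "Select a filename"}

# Default behaviour: keys are sorted alphabetically, which is why
# widget-help comes out before widget-hint in the dumps shown further on.
sorted_out = yaml.safe_dump(data, default_flow_style=False)

ordered_out = yaml.safe_dump(
    data,
    default_flow_style=False,  # block style for nested collections
    default_style=None,        # None: the emitter picks plain or quoted per scalar
    indent=4,                  # indentation step for nested mappings
    width=80,                  # preferred line width before long scalars are wrapped
    allow_unicode=True,        # write non-ASCII characters instead of escapes
    sort_keys=False,           # keep insertion order (PyYAML >= 5.1 only)
)
print(sorted_out)
print(ordered_out)
```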

I start with a document that looks like this:

- color green :
     inputs :
        - port thing :
            widget-hint : filename
            widget-help : Select a filename
        - port target_path : 
            widget-hint : path
            value : 'thing' 
     outputs:
        - port value:
             widget-hint : string
     text : |
            I'm lost and I'm found
            and I'm hungry like the wolf.

After loading it into Python (yaml.safe_load(s)), I try a couple of ways of dumping it out:

>>> print yaml.dump( d3, default_flow_style=False, default_style='' )
- color green:
    inputs:
    - port thing:
        widget-help: Select a filename
        widget-hint: filename
    - port target_path:
        value: thing
        widget-hint: path
    outputs:
    - port value:
        widget-hint: string
    text: 'I''m lost and I''m found

      and I''m hungry like the wolf.

      '
>>> print yaml.dump( d3, default_flow_style=False, default_style='|' )
- "color green":
    "inputs":
    - "port thing":
        "widget-help": |-
          Select a filename
        "widget-hint": |-
          filename
    - "port target_path":
        "value": |-
          thing
        "widget-hint": |-
          path
    "outputs":
    - "port value":
        "widget-hint": |-
          string
    "text": |
      I'm lost and I'm found
      and I'm hungry like the wolf.
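
Both results follow from how PyYAML picks a scalar style. A multi-line string cannot be written as a plain scalar, so without an explicit style it falls back to single quotes. default_style, on the other hand, is one style applied to every scalar, keys included, and since a mapping key cannot be a block scalar, the keys fall back to double quotes. A minimal sketch reproducing both effects on small illustrative inputs (assumes Python 3 and a recent PyYAML):

```python
import yaml

# No explicit style: the multi-line value is emitted single-quoted.
multi = yaml.safe_dump({"text": "a\nb\n"}, default_flow_style=False)

# default_style='|' applies to keys too; keys cannot be literal blocks,
# so the emitter falls back to double quotes for them.
out = yaml.safe_dump({"widget-hint": "string"}, default_flow_style=False,
                     default_style="|")
print(multi)
print(out)
```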

Ideally, I would like "short strings" to not use quotes, as in the first result, but multi-line strings to be written as blocks, as in the second result. I guess, fundamentally, I'm trying to avoid an explosion of unnecessary quotes in the file, which I think would make it much more annoying to edit in a text editor.
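
Within PyYAML itself, a commonly used way to get this mix (a different technique from the ruamel.yaml solution in this thread) is a custom representer for str that requests the literal style only when the string contains a newline. A minimal sketch, assuming Python 3 and PyYAML 5.1 or later (for sort_keys); str_presenter is a hypothetical name, and doc is a hand-built stand-in for the loaded document:

```python
import yaml


def str_presenter(dumper, data):
    """Hypothetical helper: literal blocks for multi-line strings only."""
    if "\n" in data:
        # '|' requests a literal block scalar; if the string cannot be a
        # block (e.g. a line ends in a space), PyYAML uses double quotes.
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    # No explicit style: the emitter picks plain where possible.
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


# Register on SafeDumper only, so the default Dumper is left untouched.
yaml.add_representer(str, str_presenter, Dumper=yaml.SafeDumper)

doc = [
    {
        "color green": {
            "inputs": [
                {"port thing": {"widget-hint": "filename",
                                "widget-help": "Select a filename"}},
                {"port target_path": {"widget-hint": "path", "value": "thing"}},
            ],
            "outputs": [{"port value": {"widget-hint": "string"}}],
            "text": "I'm lost and I'm found\nand I'm hungry like the wolf.\n",
        }
    }
]

out = yaml.safe_dump(doc, default_flow_style=False, sort_keys=False)
print(out)
```

Short strings stay unquoted and the text value is written as a `|` block; the indentation, however, is PyYAML's own and not that of the hand-edited original.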

Does anyone have any experience with this?


Solution

  • If you can use ruamel.yaml (disclaimer: I am the author of this enhanced version of PyYAML), you can round-trip the original format (YAML document stored in a file org.yaml):

    import sys
    import ruamel.yaml
    from pathlib import Path
    
    file_org = Path('org.yaml')
        
    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    data = yaml.load(file_org)
    yaml.dump(data, sys.stdout)
    

    which gives:

    - color green:
        inputs:
        - port thing:
            widget-hint: filename
            widget-help: Select a filename
        - port target_path:
            widget-hint: path
            value: 'thing'
        outputs:
        - port value:
            widget-hint: string
        text: |
          I'm lost and I'm found
          and I'm hungry like the wolf.
    

    Your input is inconsistently indented/formatted, and although ruamel.yaml gives you far more control over the output than PyYAML does, you cannot get your exact original back:

    • You sometimes have a space before the value indicator (:), as in color green :, and sometimes you don't, as in outputs:. Apart from special control over root level keys, ruamel.yaml always puts the value indicator directly adjacent to the key.
    • Your root level sequence is indented two columns with an offset of zero for the block sequence indicator (-) (this is the default ruamel.yaml uses). Your other sequences are indented five columns with an offset of three. ruamel.yaml cannot format sequences individually/inconsistently; since your root collection is a sequence, I recommend using the default.
    • Your mappings are sometimes indented three columns (value for key color green) and sometimes two (e.g. value for key port target_path). Again, ruamel.yaml cannot format these individually/inconsistently.
    • Your block style literal scalar is indented more than the standard two spaces, and you don't append a block indentation indicator to the | indicator (e.g. |4), so this extra indentation will be lost.

    As you can see, setting yaml.preserve_quotes keeps the superfluous quotes around 'thing'. As that is not what you want, it is not set in the rest of these examples.

    The following "normalises" all three examples:

    import sys
    import ruamel.yaml
    from pathlib import Path
    LT = ruamel.yaml.scalarstring.LiteralScalarString
    
    file_org = Path('org.yaml')
    file_plain = Path('plain.yaml')
    file_block = Path('block.yaml')
    
    def normalise(d):
        if isinstance(d, dict):
            for k, v in d.items():
                 d[k] = normalise(v)
            return d
        if isinstance(d, list):
            for idx, elem in enumerate(d):
                d[idx] = normalise(elem)
            return d
        if not isinstance(d, str):
            return d
        if '\n' in d:
            if isinstance(d, LT):
                return d     # already a block style literal scalar
            return LT(d)
        return str(d)
    
    yaml = ruamel.yaml.YAML()
    for fn in [file_org, file_plain, file_block]:
        data = normalise(yaml.load(file_org))
        yaml.dump(data, fn)
    
    assert file_org.read_bytes() == file_plain.read_bytes()
    assert file_org.read_bytes() == file_block.read_bytes()
    print(file_block.read_text())
    

    which gives:

    - color green:
        inputs:
        - port thing:
            widget-hint: filename
            widget-help: Select a filename
        - port target_path:
            widget-hint: path
            value: thing
        outputs:
        - port value:
            widget-hint: string
        text: |
          I'm lost and I'm found
          and I'm hungry like the wolf.
    

    So, as you indicated, you get block style literal scalars if a scalar has newlines, and no block style and no quotes if a scalar doesn't have a newline.