I have a Python project where I'd like to use YAML (PyYAML 3.11), particularly because it is "pretty" and easy for users to edit in a text editor if and when necessary. My problem, though, is that if I bring the YAML into a Python application (as I will need to) and edit the contents (as I will need to), then the newly written document is typically not quite as pretty as what I started with.
The PyYAML documentation is pretty poor; it does not even document the parameters of the dump function. I found http://dpinte.wordpress.com/2008/10/31/pyaml-dump-option/, but I'm still missing the information I need. (I started to look at the source, but it doesn't seem the most inviting. If I don't get the solution here, then that's my only recourse.)
I start with a document that looks like this:
```
- color green :
      inputs :
         - port thing :
             widget-hint : filename
             widget-help : Select a filename
         - port target_path :
             widget-hint : path
             value : 'thing'
      outputs:
         - port value:
             widget-hint : string
      text : |
          I'm lost and I'm found
          and I'm hungry like the wolf.
```
After loading it into Python (yaml.safe_load( s )), I tried a couple of ways of dumping it out:
```
>>> print yaml.dump( d3, default_flow_style=False, default_style='' )
- color green:
    inputs:
    - port thing:
        widget-help: Select a filename
        widget-hint: filename
    - port target_path:
        value: thing
        widget-hint: path
    outputs:
    - port value:
        widget-hint: string
    text: 'I''m lost and I''m found

      and I''m hungry like the wolf.

      '
```
```
>>> print yaml.dump( d3, default_flow_style=False, default_style='|' )
- "color green":
    "inputs":
    - "port thing":
        "widget-help": |-
          Select a filename
        "widget-hint": |-
          filename
    - "port target_path":
        "value": |-
          thing
        "widget-hint": |-
          path
    "outputs":
    - "port value":
        "widget-hint": |-
          string
    "text": |
      I'm lost and I'm found
      and I'm hungry like the wolf.
```
Ideally, I would like "short strings" to not use quotes, as in the first result, but I would like multi-line strings to be written as blocks, as in the second result. Fundamentally, I'm trying to avoid an explosion of unnecessary quotes in the file, which I think would make it much more annoying to edit in a text editor.
Does anyone have any experience with this?
If you can use ruamel.yaml (disclaimer: I am the author of this enhanced version of PyYAML), you can round-trip the original format (the YAML document stored in a file org.yaml):
```python
import sys
import ruamel.yaml
from pathlib import Path

file_org = Path('org.yaml')

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(file_org)
yaml.dump(data, sys.stdout)
```
which gives:
```
- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: 'thing'
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.
```
Your input is inconsistently indented/formatted, and although there is far more control over the output in ruamel.yaml than in PyYAML, you cannot get your exact original back:

- you sometimes (e.g. `color green :`) have a space before the value indicator (`:`) and sometimes you don't (`outputs:`). Apart from special control over root level keys, ruamel.yaml always puts the value indicator directly adjacent to the key.
- your root level sequence is indented two with a block sequence indicator (`-`) offset of zero (this is the default ruamel.yaml uses). Others are indented five with three offset. ruamel.yaml cannot format sequences individually/inconsistently; I recommend using the default since your root collection is a sequence.
- your mappings are sometimes indented four (e.g. the value for key `color green`) and sometimes two (e.g. the value for key `port target_path`). Again, ruamel.yaml cannot format these individually/inconsistently.
- your literal block scalar has extra indentation without a block indentation indicator after the `|` indicator (e.g. using `|4`). So this extra indentation will be lost.

As you can see, setting `yaml.preserve_quotes` keeps the superfluous quotes around `'thing'`; as that is not what you want, it is not set in the rest of these examples.
The following "normalises" all three examples:
```python
import sys
import ruamel.yaml
from pathlib import Path

LT = ruamel.yaml.scalarstring.LiteralScalarString

file_org = Path('org.yaml')
file_plain = Path('plain.yaml')
file_block = Path('block.yaml')


def normalise(d):
    # recurse into mappings and sequences, replacing values in place
    if isinstance(d, dict):
        for k, v in d.items():
            d[k] = normalise(v)
        return d
    if isinstance(d, list):
        for idx, elem in enumerate(d):
            d[idx] = normalise(elem)
        return d
    if not isinstance(d, str):
        return d
    if '\n' in d:
        if isinstance(d, LT):
            return d  # already a block style literal scalar
        return LT(d)  # multi-line string: dump as block style literal scalar
    return str(d)  # single-line string: plain str, no quotes preserved


yaml = ruamel.yaml.YAML()
# normalise org.yaml and write the result to each of the three files
for fn in [file_org, file_plain, file_block]:
    data = normalise(yaml.load(file_org))
    yaml.dump(data, fn)
assert file_org.read_bytes() == file_plain.read_bytes()
assert file_org.read_bytes() == file_block.read_bytes()
print(file_block.read_text())
```
which gives:
```
- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: thing
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.
```
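So, as you wanted, you get block style literal scalars if a scalar has newlines, and no block style and no quotes if it doesn't have a newline (quotes are still emitted where a plain scalar would be read back as something other than a string, e.g. `'123'`).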
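If you have to stay with PyYAML, a similar effect can be approximated by registering your own representer for `str` that asks for literal style only when the string contains a newline. A minimal sketch, assuming Python 3; `BlockStyleDumper` and `str_representer` are hypothetical names. Unlike ruamel.yaml this does not round-trip anything: keys come out in PyYAML's sorted order and comments are lost.

```python
import yaml


def str_representer(dumper, data):
    """Hypothetical representer: literal block style for multi-line strings,
    PyYAML's normal choice (plain where possible) for everything else."""
    if '\n' in data:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


class BlockStyleDumper(yaml.SafeDumper):
    """SafeDumper subclass, so the global yaml.SafeDumper is left untouched."""


BlockStyleDumper.add_representer(str, str_representer)

# Part of the question's data, as yaml.safe_load would return it
data = [{'color green': {
    'inputs': [{'port thing': {'widget-hint': 'filename',
                               'widget-help': 'Select a filename'}}],
    'text': "I'm lost and I'm found\nand I'm hungry like the wolf.\n",
}}]

out = yaml.dump(data, Dumper=BlockStyleDumper, default_flow_style=False)
print(out)
```

As far as I know, PyYAML silently falls back to a quoted style when a string cannot be written as a block scalar (for example when a line has trailing spaces), so the requested `|` is a preference rather than a guarantee.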