I have a my_yaml.yml file with the following content:
```yaml
my_yaml:
  person: >
    John|Doe|48,
    Jack|Black|39
  skills:
    - name: superhero
      abilities:
        - swim
        - run
  special_chars:
    - '! | " "'
    - '+ | " "'
    - '\ | " "'
    - 'Á | "A"'
    - 'É | "E"'
    - 'Ű | "U"'
    - 'Û | "U"'
```
I want to load it and then dump it into a my_yaml_new.yml file that has exactly the same format and characters as the original input file. My code is:
```python
import yaml

# without encoding='utf8' I get a "'charmap' codec can't decode byte..." error
with open('my_yaml.yml', encoding='utf8') as f:
    my_yaml = yaml.safe_load(f)
```
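The decode error comes from open() falling back to the platform's preferred encoding when no encoding argument is given; on Windows that is usually an ANSI code page. A minimal sketch, assuming that code page is cp1250 or cp1252 (check locale.getpreferredencoding(False) on the affected machine): the UTF-8 bytes of Á include 0x81, which neither code page defines, so decoding fails with exactly this kind of 'charmap' error.

```python
import locale

# Encoding that open() uses when no encoding= argument is passed
print('default text encoding:', locale.getpreferredencoding(False))

# 'Á' is stored as two bytes in a UTF-8 file
raw = 'Á'.encode('utf-8')
print(raw)  # b'\xc3\x81'


def decodes(data, encoding):
    """Return True if data can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Byte 0x81 is undefined in cp1250 and cp1252 (assumed Windows defaults),
# while UTF-8, which encoding='utf8' selects, decodes it fine.
for enc in ('cp1250', 'cp1252', 'utf-8'):
    print(enc, decodes(raw, enc))
```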
I can dump it to the console, but 1) the order of abilities and name has changed :(
```python
yaml.dump(my_yaml, default_flow_style=False, allow_unicode=True)
```
The result is:

```
'my_yaml:\n  person: >\n    John|Doe|48, Jack|Black|39\n  skills:\n  - abilities:\n    - swim\n    - run\n    name: superhero\n  special_chars:\n  - \'! | " "\'\n  - + | " "\n  - \\ | " "\n  - Á | "A"\n  - É | "E"\n  - Ű | "U"\n  - Û | "U"\n'
```
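Problem 1 is PyYAML's default key sorting: yaml.dump sorts mapping keys alphabetically. A small sketch, assuming PyYAML 5.1 or newer (where dump accepts sort_keys) and Python 3.7+ (where dicts keep insertion order), showing that sort_keys=False keeps name before abilities. It only addresses the ordering, not the other problems.

```python
import yaml  # PyYAML >= 5.1 assumed, for the sort_keys parameter

# Same shape as the 'skills' entry, with keys in the order of the file
data = {'skills': [{'name': 'superhero', 'abilities': ['swim', 'run']}]}

sorted_out = yaml.dump(data, default_flow_style=False)                    # default: keys sorted
ordered_out = yaml.dump(data, default_flow_style=False, sort_keys=False)  # insertion order kept

print(sorted_out)
print(ordered_out)
```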
And when I try to dump it into a file:
```python
with open('my_yaml_new.yml', 'w') as outfile:
    yaml.dump(my_yaml, outfile, default_flow_style=False, allow_unicode=True)
```
2) I get the following error because of the character Û:

```
UnicodeEncodeError: 'charmap' codec can't encode character '\xdb' in position 0: character maps to <undefined>
```
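The write fails for the same reason as the read, in the other direction: the output file is opened without an encoding argument, so the text is encoded with the platform code page. The characters in the file hint at a Central European setup, so this sketch assumes that code page is cp1250 (an assumption, not something the question states). Under that assumption Ű is encodable while Û and Ǘ are not, which matches the error above.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1250 is an assumed Windows default; UTF-8 can encode every character
for ch in ['Á', 'É', 'Ű', 'Û', 'Ǘ']:
    print(ch, 'cp1250:', can_encode(ch, 'cp1250'), 'utf-8:', can_encode(ch, 'utf-8'))

# Reproduce the kind of error from the question
try:
    'Û'.encode('cp1250')
    message = ''
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)
```

Opening the output file with encoding='utf8', as is already done for the input, avoids the code page entirely.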
If I delete this line from the input my_yaml.yml file, the dump above succeeds, but 3) the multiple lines of the person string end up on one line :(
```yaml
my_yaml:
  person: >
    John|Doe|48, Jack|Black|39
  skills:
  - abilities:
    - swim
    - run
    name: superhero
  special_chars:
  - '! | " "'
  - + | " "
  - \ | " "
  - Á | "A"
  - É | "E"
  - Ű | "U"
```
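Problem 3 already happens at load time: in a folded scalar (>) YAML turns single line breaks into spaces, so the loaded string no longer contains the break and no dumper can restore it. A literal scalar (|) keeps the breaks. A minimal PyYAML check:

```python
import yaml

folded = "person: >\n  John|Doe|48,\n  Jack|Black|39\n"
literal = "person: |\n  John|Doe|48,\n  Jack|Black|39\n"

# '>' folds the inner line break into a space; one trailing newline is kept
folded_value = yaml.safe_load(folded)['person']
# '|' keeps every line break
literal_value = yaml.safe_load(literal)['person']

print(repr(folded_value))   # 'John|Doe|48, Jack|Black|39\n'
print(repr(literal_value))  # 'John|Doe|48,\nJack|Black|39\n'
```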
4) Also, my single quotes (') have disappeared from special_chars :(

5) Also note that the elements of skills are no longer indented :(
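For problem 5, PyYAML writes a block sequence nested under a mapping key without extra indentation. A commonly used workaround, sketched here and not an official PyYAML option, is a Dumper subclass (the name IndentDumper is made up) whose increase_indent never asks for an indentless sequence. Problem 4 has no such switch in PyYAML: the quotes are not part of the loaded string, as the last line of the sketch shows, so there is nothing left to preserve. Assumes PyYAML 5.1 or newer for sort_keys.

```python
import yaml  # PyYAML >= 5.1 assumed


class IndentDumper(yaml.Dumper):
    """Hypothetical helper: indent sequences that sit under a mapping key."""

    def increase_indent(self, flow=False, indentless=False):
        # Ignore the emitter's request for an indentless sequence
        return super().increase_indent(flow, False)


data = {'skills': [{'name': 'superhero', 'abilities': ['swim', 'run']}]}
out = yaml.dump(data, Dumper=IndentDumper, default_flow_style=False, sort_keys=False)
print(out)

# Quoted and plain spellings load to the same Python string,
# so the quoting style is lost as soon as the file is loaded
same_value = yaml.safe_load("'+ | \" \"'") == yaml.safe_load('+ | " "')
print(same_value)
```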
I've tried these solutions without success, and import ruamel.yaml as yaml hasn't helped either.
UPDATE
OK, the following great package solves problems 1) and 4), and I can replace > with | in multi-line values, so 3) is also solved. And maybe 5) is not a huge problem. But I still struggle with special characters like Û or Ǘ, so I'm still looking for a solution to problem 2)...
```python
from ruamel import yaml

my_yaml = yaml.round_trip_load(open('my_yaml.yml', encoding='utf8'), preserve_quotes=True)
with open('my_yaml_new.yml', 'w') as outfile:
    yaml.round_trip_dump(my_yaml, outfile, default_flow_style=False, allow_unicode=True)
```
I am not sure why you encounter problems with Unicode. If you have your my_yaml.yml and a program try.py:
```python
import ruamel.yaml

# Switch folded (>) scalars to literal (|) before parsing
with open('my_yaml.yml') as fp:
    yaml_str = fp.read().replace(': >\n', ': |\n')

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)  # match the source indentation
yaml.preserve_quotes = True
data = yaml.load(yaml_str)

new_file = 'my_yaml_new.yml'
with open(new_file, 'w') as ofp:
    yaml.dump(data, ofp)
```
then that produces:
```yaml
my_yaml:
  person: |
    John|Doe|48,
    Jack|Black|39
  skills:
    - name: superhero
      abilities:
        - swim
        - run
  special_chars:
    - '! | " "'
    - '+ | " "'
    - '\ | " "'
    - 'Á | "A"'
    - 'É | "E"'
    - 'Ű | "U"'
    - 'Û | "U"'
```
This was run in a virtualenv with ruamel.yaml 0.15.40, under both Python 2 and Python 3. I used:

```shell
for n in 2 3 ; do mktmpenv -p /opt/python/$n/bin/python -qq -i ruamel.yaml; python --version; python try.py; deactivate; done
```

which of course relies on (the latest) versions of Python 2 and 3 being installed under /opt/python/2 and /opt/python/3 respectively (which they are on my Linux development system).
Note that Unicode causes no problems, and that yaml.indent(mapping=2, sequence=4, offset=2) preserves your source indentation. You still need to change the folded multi-line scalar to literal style (which I do while reading into yaml_str), because ruamel.yaml doesn't support preserving folded style (primarily because there is no easy way to indicate the original folding points in a transparent way).