Tags: python, yaml, pyyaml, ruamel.yaml

Python YAML dump special character and multiple lines


I have a my_yaml.yml file with the following content:

my_yaml:
  person: >
    John|Doe|48,
    Jack|Black|39
  skills:
    - name: superhero
      abilities:
        - swim
        - run
  special_chars:
    - '! | " "'
    - '+ | " "'
    - '\ | " "'
    - 'Á | "A"'
    - 'É | "E"'
    - 'Ű | "U"'
    - 'Û | "U"'

I want to load it and then dump it into a my_yaml_new.yml file with exactly the same format and characters as the original input file. My code is:

import yaml
# without encoding='utf8' I get a "'charmap' codec can't decode byte..." error
with open('my_yaml.yml', encoding='utf8') as infile:
    my_yaml = yaml.safe_load(infile)

I can dump it to the console, but 1) the order of abilities & name has changed :(

yaml.dump(my_yaml, default_flow_style=False, allow_unicode=True)

Result is:

'my_yaml:\n  person: >\n    John|Doe|48, Jack|Black|39\n  skills:\n  - abilities:\n    - swim\n    - run\n    name: superhero\n  special_chars:\n  - \'! | " "\'\n  - + | " "\n  - \\ | " "\n  - Á | "A"\n  - É | "E"\n  - Ű | "U"\n  - Û | "U"\n'

And when I try to dump into a file:

with open('my_yaml_new.yml', 'w') as outfile:
    yaml.dump(my_yaml, outfile, default_flow_style=False, allow_unicode=True)

2) I get the following error due to character Û:

UnicodeEncodeError: 'charmap' codec can't encode character '\xdb' in position 0: character maps to <undefined>

If I delete this line from the input my_yaml.yml file, the dump above succeeds, but 3) the multiple lines of my person string are merged into one line :(

my_yaml:
  person: >
    John|Doe|48, Jack|Black|39
  skills:
  - abilities:
    - swim
    - run
    name: superhero
  special_chars:
  - '! | " "'
  - + | " "
  - \ | " "
  - Á | "A"
  - É | "E"
  - Ű | "U"

4) Also, my single quotes (') have disappeared from special_chars :(

5) Also note that the elements of skills have no indentation :(

I've tried these solutions with no success, and import ruamel.yaml as yaml has not helped either.


UPDATE

OK, the following great package solves problems 1) & 4), and I can replace > with | in multi-line values, so 3) is also solved. Maybe 5) is not a huge problem. But I still struggle with special characters like Û or Ǘ, so I'm still looking for a solution to problem 2)...

from ruamel import yaml

my_yaml = yaml.round_trip_load(open('my_yaml.yml', encoding='utf8'), preserve_quotes=True)
with open('my_yaml_new.yml', 'w') as outfile:
    yaml.round_trip_dump(my_yaml, outfile, default_flow_style=False, allow_unicode=True)
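
Why swapping > for | helps with problem 3): in a folded scalar (>) a single line break between two lines is read as a space, so the loaded string no longer records where the lines were split. A literal scalar (|) keeps every line break. The sketch below is a simplified model of that rule, not a YAML parser; it assumes no blank or more-indented lines (which real YAML treats specially), and fold and literal are hypothetical helper names.

```python
def fold(lines):
    """Simplified model of '>' folding: single line breaks become spaces.

    Assumption: no blank lines and no more-indented lines in the block.
    """
    return " ".join(lines) + "\n"


def literal(lines):
    """Model of '|' style: every line break is kept as written."""
    return "\n".join(lines) + "\n"


# The two lines under "person:" in my_yaml.yml
person_lines = ["John|Doe|48,", "Jack|Black|39"]

folded_value = fold(person_lines)
literal_value = literal(person_lines)

print(repr(folded_value))   # 'John|Doe|48, Jack|Black|39\n'
print(repr(literal_value))  # 'John|Doe|48,\nJack|Black|39\n'
```

Once the value has been folded, a dumper only sees the single-line string, which is consistent with the one-line output shown in the question.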

Solution

  • I am not sure why you encounter problems with Unicode. If you have your my_yaml.yml and a program try.py:

    import sys
    import ruamel.yaml
    
    with open('my_yaml.yml') as fp:
        yaml_str = fp.read().replace(': >\n', ': |\n')
    
    yaml = ruamel.yaml.YAML()
    yaml.indent(mapping=2, sequence=4, offset=2)
    yaml.preserve_quotes = True
    data = yaml.load(yaml_str)
    new_file = 'my_yaml_new.yml'
    with open(new_file, 'w') as ofp:
        yaml.dump(data, ofp)
    

    then that produces:

    my_yaml:
      person: |
        John|Doe|48,
        Jack|Black|39
      skills:
        - name: superhero
          abilities:
            - swim
            - run
      special_chars:
        - '! | " "'
        - '+ | " "'
        - '\ | " "'
        - 'Á | "A"'
        - 'É | "E"'
        - 'Ű | "U"'
        - 'Û | "U"'
    

    in a virtualenv for both Python2 and Python3 with ruamel.yaml 0.15.40.

    I used:

    for n in 2 3 ; do  mktmpenv -p /opt/python/$n/bin/python -qq -i ruamel.yaml; python --version; python try.py; deactivate; done
    

    which of course relies on (the latest) versions of Python 2 and 3 being installed under /opt/python/2 and /opt/python/3 respectively (which they are on my Linux development system).

    Note that the Unicode shows no problems and that yaml.indent(mapping=2, sequence=4, offset=2) preserves your source indentation. You still need to change the folded multi-line scalar to literal style (which I do while reading into yaml_str), because ruamel.yaml doesn't support preserving folded scalars (primarily because there is no easy way to indicate the original folding points in a transparent way).
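
One plausible cause of problem 2) in the question: the 'charmap' codec named in the error message points to a legacy code page being used as the default encoding by open(), which is common on Windows, whereas a Linux system typically defaults to UTF-8. The sketch below uses only the standard library to illustrate this. The cp1250 code page is an assumption (it contains Ű but not Û, which would match the reported behaviour), and can_encode is a hypothetical helper.

```python
import os
import tempfile

# Two of the special_chars entries from my_yaml.yml
text = "- 'Ű | \"U\"'\n- 'Û | \"U\"'\n"


def can_encode(s, encoding):
    """Return True if every character of s exists in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: the platform default is a code page such as cp1250.
print(can_encode("Ű", "cp1250"))  # True: this character is in cp1250
print(can_encode("Û", "cp1250"))  # False: this character is not
print(can_encode(text, "utf-8"))  # True: UTF-8 covers all of Unicode

# Writing with an explicit encoding does not depend on the platform default.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_yaml_new.yml")
    with open(path, "w", encoding="utf-8") as outfile:
        outfile.write(text)
    with open(path, encoding="utf-8") as infile:
        round_tripped = infile.read()

print(round_tripped == text)  # True
```

If that is the cause, passing encoding='utf-8' to the open() call that writes my_yaml_new.yml, as well as to the one that reads my_yaml.yml, should let the dump succeed under Python 3.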