python dictionary yaml pyyaml

Python dict to YAML text doesn't preserve characters


The question is simple but a bit tricky to phrase correctly.

Basically, I have a dictionary that has the following data:

x = { 'foo': [1, '\n', 'bar'] }

When I convert to yaml using pyyaml with yaml.safe_dump(x, default_flow_style=False) I expect the output to be:

foo:
  - 1
  - '\n'
  - bar

However, I'm getting something like:

foo:
  - 1
  - '

    '
  - bar

The newline character is written out as an actual line break instead of being emitted as the escaped '\n' string.
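
For reference, a minimal check (assuming PyYAML is installed) suggests that both spellings describe the same data: inside a single-quoted YAML scalar, a blank line is how a newline is written, so loading the dumped text should return the original one-character string. The names `dumped` and `reloaded` are only illustrative.

```python
import yaml

x = {"foo": [1, "\n", "bar"]}

# Dump in block style, exactly as in the question.
dumped = yaml.safe_dump(x, default_flow_style=False)
print(dumped)

# Load the text back and compare it with the original dict.
reloaded = yaml.safe_load(dumped)
print(reloaded == x)
```

If the comparison prints True, the difference is purely one of presentation: no data is lost, the newline is just not written with a backslash escape.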

I have been looking at the PyYAML documentation but haven't found the right options to make it dump this the way I want.

Has anyone dealt with this same issue before? How did you solve it?


To give more context about this:

I have a JSON file that I want to convert to YAML.

The file contains something like this:

{ 
  "content": {
    "Fn::Join": ["\n", [{ "Ref": "parentStackName" }, ""]]
  }
}

The end result should be this:

content:
  Fn::Join:
    - "\n"
    - - Ref: parentStackName
      - ''

Notice that the "\n" is written as an escaped string there, not as an actual line break.

The procedure I'm using is:

  1. Open file
  2. Parse json from text to dict
  3. Use dict to dump to yaml

When I create the dict, you can see the "\n" as part of the string. It's when pyyaml dumps that into yaml that things go awry.
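
As a small sanity check using only the standard library, the sketch below parses the same JSON and inspects the first element of the "Fn::Join" list. The variable names `json_text` and `separator` are made up for illustration. It shows that after parsing, the value is a single newline character; the backslash form is only how Python's repr (and JSON) display it.

```python
import json

# Raw string, so the backslash reaches the JSON parser untouched,
# just as it would when the text is read from a file.
json_text = r'''
{
  "content": {
    "Fn::Join": ["\n", [{ "Ref": "parentStackName" }, ""]]
  }
}
'''

data = json.loads(json_text)
separator = data["content"]["Fn::Join"][0]

print(len(separator))      # 1: one character, not a backslash plus an "n"
print(separator == "\n")   # True: it is a real newline
print(repr(separator))     # '\n': repr shows the escaped form
```

So the YAML dumper receives a real newline, and the only open question is which YAML quoting style it picks to write it.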


Solution

  • To get the output you want, you can use the round-trip capabilities of ruamel.yaml and switch the flow style, which the JSON input is loaded with, to block style:

    import sys
    import ruamel.yaml
    from ruamel.yaml.comments import CommentedMap, CommentedSeq
    
    # because this is a string and not read from file, you need to escape 
    # the backslash in \n
    json_str = """\
    {
      "content": {
        "Fn::Join": ["\\n", [{ "Ref": "parentStackName" }, ""]]
      }
    }
    """  
    
    
    def block_style(base):
        """set all mappings and sequences to block-style"""
        if isinstance(base, CommentedMap):
            for k in base:
                block_style(base[k])
            base.fa.set_block_style()
        elif isinstance(base, CommentedSeq):
            # only round-trip sequences carry the .fa format attribute
            for item in base:
                block_style(item)
            base.fa.set_block_style()
        return base
    
    
    data = ruamel.yaml.round_trip_load(json_str)
    block_style(data)
    ruamel.yaml.round_trip_dump(data, sys.stdout)
    

    gives:

    content:
      Fn::Join:
      - "\n"
      - - Ref: parentStackName
        - ''
    

    ruamel.yaml is an updated version of PyYAML (disclaimer: I am the author). It supports the YAML 1.2 spec (from 2009) which brings YAML more in line with being a full superset of JSON and allows you to read your JSON with the ruamel.yaml parser (PyYAML only supports most of the YAML 1.1 spec).

    In round-trip mode, the enhancements of ruamel.yaml include preserving the flow or block style of each composite node (mapping or sequence) individually, as well as quoting styles, comments and tag names. What block_style() does is recursively set the "flow attribute" .fa of every composite node to block style.
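
If staying on PyYAML is preferable, a different technique from the one described above is to register a custom string representer that forces double-quoted style for strings containing a newline. This is only a sketch: the class name `QuotedNewlineDumper` and the function `represent_str` are made up for illustration and are not part of PyYAML.

```python
import yaml


class QuotedNewlineDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass, so the global SafeDumper stays untouched."""


def represent_str(dumper, value):
    # Force double quotes (which allow the \n escape) only when needed;
    # style=None lets PyYAML pick the style for all other strings.
    style = '"' if "\n" in value else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", value, style=style)


QuotedNewlineDumper.add_representer(str, represent_str)

x = {"foo": [1, "\n", "bar"]}
text = yaml.dump(x, Dumper=QuotedNewlineDumper, default_flow_style=False)
print(text)
```

With this dumper the newline entry should be written as "\n" in double quotes, while the loaded data stays identical to the original dict.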