
Use literal style for just multiline strings in ruamel.yaml


I would like to have a custom ruamel.yaml dumper that uses Literal style for all multiline strings and the default style otherwise. For example:

import sys
import ruamel.yaml

data = {"a": "hello", "b": "hello\nthere\nworld"}

print("Default style")
yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

print()
print("style='|'")
yaml = ruamel.yaml.YAML()
yaml.default_style = "|"
yaml.dump(data, sys.stdout)

This produces:

Default style
a: hello
b: "hello\nthere\nworld"

style='|'
"a": |-
  hello
"b": |-
  hello
  there
  world

My desired output is:

a: hello
b: |-
  hello
  there
  world

Solution

  • There are multiple ways to achieve what you want. If you have control over building up the data structure, it is often easiest to wrap each multi-line string in a LiteralScalarString as you add it:

    import sys
    import ruamel.yaml
    
    def lim(s):  # literal if multi-line
        if '\n' in s:
            return ruamel.yaml.scalarstring.LiteralScalarString(s)
        return s
    
    data = {'a': lim('hello'), 'b': lim('hello\nthere\nworld')}
    
        
    yaml = ruamel.yaml.YAML()
    yaml.dump(data, sys.stdout)
    

    which gives:

    a: hello
    b: |-
      hello
      there
      world
    

    This gives you easy, fine-grained control over what gets dumped as literal style.

    If you don't add all the data individually (but e.g. read them from a JSON file), you can walk over your data structure after it is fully constructed and update it in place:

    import sys
    import ruamel.yaml
    
    def tmtl(d):
        """translate multi-line to literal,
           only acts on dict values and sequence items, not on keys
        """
        if isinstance(d, dict):
            for k, v in d.items():
                if isinstance(v, str) and '\n' in v:
                    d[k] = ruamel.yaml.scalarstring.LiteralScalarString(v)
                else:
                    tmtl(v)
        elif isinstance(d, list):
            for idx, item in enumerate(d):
                if isinstance(item, str) and '\n' in item:
                    d[idx] = ruamel.yaml.scalarstring.LiteralScalarString(item)
                else:
                    tmtl(item)  # recurse into nested dicts and lists
    
    data = {'a': 'hello', 'b': 'hello\nthere\nworld'}
    tmtl(data)
        
    yaml = ruamel.yaml.YAML()
    yaml.dump(data, sys.stdout)
    

    which gives:

    a: hello
    b: |-
      hello
      there
      world
    

    If you cannot update your data, you could rewrite tmtl in the program above so it builds a new data structure and returns that, but at that point it is IMO easier to change the representer:

    import sys
    import ruamel.yaml
    
    CKS = ruamel.yaml.comments.CommentedKeySeq  # so you can have sequences as keys in a mapping
    
    class MyRepresenter(ruamel.yaml.representer.RoundTripRepresenter):
        def represent_str(self, s):
            if '\n' in s:
                return self.represent_scalar('tag:yaml.org,2002:str', s, style='|')
            return self.represent_scalar('tag:yaml.org,2002:str', s)
    
    MyRepresenter.add_representer(str, MyRepresenter.represent_str)
    
    data = {'a': 'hello', 'b': 'hello\nthere\nworld', CKS((1, 2)): ['nested works\nas well\n\n']}
        
    yaml = ruamel.yaml.YAML()
    yaml.Representer = MyRepresenter
    yaml.dump(data, sys.stdout)
    

    which gives:

    a: hello
    b: |-
      hello
      there
      world
    [1, 2]:
    - |+
      nested works
      as well
    
    ...
    

    As you can see, the trailing newlines of the final literal style scalar automatically cause the chomping indicator to change from strip (-) to keep (+), and the explicit document end marker (...) to appear.