PyYaml dump non-nested collections based on type


I would like to dump a yaml file from python like this:

Strings:
  - "A very very long string"
  - "A very very long string2"
  - "A very very long string3"
  - "A very very long string4"
  - "A very very long string5"
  - "A very very long string8"
Numbers: [1,2,3,4,5,6,7,8,9]
StringsDict:
  - First: "A very very long string"
  - Second: "A very very long string8"
NumbersDict: {"First": 12, "Second": 156}

Lowest-level collections that contain numbers should be written in a single line such as [1,2,3,4,5,6,7,8,9] or {"First": 12, "Second": 156}, but for strings I want each string to get its own line. Higher-level (nested) collections should always use single lines.

How can I customise my dumper to create this kind of output?


Solution

  • You can achieve this with custom representers:

    import sys
    import yaml

    def represent_list(dumper, data):
        # Build the default sequence node, then adjust its style.
        ret = dumper.represent_list(data)
        if all(isinstance(item, str) for item in data):
            # All strings: block style, one double-quoted item per line.
            ret.flow_style = False
            for item in ret.value:
                item.style = '"'
        elif all(isinstance(item, int) for item in data):
            # All ints: flow style, e.g. [1, 2, 3].
            ret.flow_style = True
        return ret

    def represent_dict(dumper, data):
        # Build the default mapping node; ret.value holds (key, value) node pairs.
        ret = dumper.represent_dict(data)
        if all(isinstance(item, str) for item in data.values()):
            ret.flow_style = False
            for _key, value in ret.value:
                value.style = '"'
        elif all(isinstance(item, int) for item in data.values()):
            ret.flow_style = True
        return ret

    # Register both representers on the default Dumper used by yaml.dump.
    yaml.add_representer(list, represent_list)
    yaml.add_representer(dict, represent_dict)

    yaml.dump({
        "Strings": ["a", "b", "c"],
        "Numbers": [1, 2, 3],
        "StringsDict": {"a": "b", "c": "d"},
        "NumbersDict": {"a": 1, "b": 2}
    }, sys.stdout)
    

    Output:

    Numbers: [1, 2, 3]
    NumbersDict: {a: 1, b: 2}
    Strings:
    - "a"
    - "b"
    - "c"
    StringsDict:
      a: "b"
      c: "d"
    

    This should serve as a starting point; you will probably want to expand it (e.g., it currently only checks for int values, so a list of floats falls back to the default style).
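
One possible way to expand the approach is sketched below. It is not part of the original answer: the names `LeafStyleDumper`, `_is_number`, `_style_leaf` and `dump_leaf_styled` are hypothetical, and it assumes PyYAML 5.1 or later (for the `sort_keys` argument). It treats floats as numbers, excludes `bool` (which is a subclass of `int` in Python), leaves empty collections alone, and registers the representers on a `SafeDumper` subclass so the global default dumper is not modified.

```python
import numbers

import yaml


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


def _style_leaf(node, values, scalar_nodes):
    # Choose a style only when every element has the same leaf kind.
    values = list(values)
    if values and all(isinstance(v, str) for v in values):
        node.flow_style = False          # one item per line
        for scalar in scalar_nodes:
            scalar.style = '"'           # force double quotes
    elif values and all(_is_number(v) for v in values):
        node.flow_style = True           # single line, e.g. [1, 2.5, 3]
    return node


class LeafStyleDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass; keeps the global dumpers untouched."""


def represent_list(dumper, data):
    node = dumper.represent_list(data)
    return _style_leaf(node, data, node.value)


def represent_dict(dumper, data):
    node = dumper.represent_dict(data)
    # node.value is a list of (key_node, value_node) pairs.
    return _style_leaf(node, data.values(), [v for _, v in node.value])


LeafStyleDumper.add_representer(list, represent_list)
LeafStyleDumper.add_representer(dict, represent_dict)


def dump_leaf_styled(data):
    # sort_keys=False keeps insertion order (PyYAML >= 5.1).
    return yaml.dump(data, Dumper=LeafStyleDumper,
                     default_flow_style=False, sort_keys=False)
```

Calling `dump_leaf_styled({"Numbers": [1, 2.5, 3], "Strings": ["a", "b"]})` returns the YAML as a string. Mixed or nested collections match neither branch and keep the dumper's default block style.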