PyYAML: dump non-nested collections based on type

I would like to dump a YAML file from Python that looks like this:

Strings:
  - "A very very long string"
  - "A very very long string2"
  - "A very very long string3"
  - "A very very long string4"
  - "A very very long string5"
  - "A very very long string8"
Numbers: [1,2,3,4,5,6,7,8,9]
StringsDict:
  - First: "A very very long string"
  - Second: "A very very long string8"
NumbersDict: {"First": 12, "Second": 156}

Lowest-level collections that contain only numbers should be written on a single line, such as [1,2,3,4,5,6,7,8,9] or {"First": 12, "Second": 156}, but for strings I want each string to get its own line. Higher-level collections (those that contain other collections) should always be spread over multiple lines, as in the example above.

How can I customise my dumper to create this kind of output?

Solution

You can achieve this with custom representers:

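import sys

import yaml

def represent_list(dumper, data):
    # Build the default sequence node first, then adjust its style.
    ret = dumper.represent_list(data)
    if all(isinstance(item, str) for item in data):
        # Strings: block style (one item per line), double-quoted.
        ret.flow_style = False
        for item in ret.value:
            item.style = '"'
    elif all(isinstance(item, int) for item in data):
        # Integers: flow style, e.g. [1, 2, 3].
        ret.flow_style = True
    return ret

def represent_dict(dumper, data):
    # ret.value is a list of (key_node, value_node) pairs.
    ret = dumper.represent_dict(data)
    if all(isinstance(item, str) for item in data.values()):
        # String values: block style, with the values double-quoted.
        ret.flow_style = False
        for item in ret.value:
            item[1].style = '"'
    elif all(isinstance(item, int) for item in data.values()):
        # Integer values: flow style, e.g. {a: 1, b: 2}.
        ret.flow_style = True
    return ret

# Register the representers on the default Dumper used by yaml.dump().
yaml.add_representer(list, represent_list)
yaml.add_representer(dict, represent_dict)

yaml.dump({
    "Strings": ["a", "b", "c"],
    "Numbers": [1, 2, 3],
    "StringsDict": {"a": "b", "c": "d"},
    "NumbersDict": {"a": 1, "b": 2}
}, sys.stdout)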

Output (yaml.dump sorts mapping keys by default, hence the order):

Numbers: [1, 2, 3]
NumbersDict: {a: 1, b: 2}
Strings:
- "a"
- "b"
- "c"
StringsDict:
  a: "b"
  c: "d"

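This should serve as a starting point; you will probably want to expand it. For example, it currently treats only int values as numbers, so floats are not handled, and because bool is a subclass of int, collections of booleans are also written in flow style.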
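One possible way to expand it is sketched below. This is only an illustration under a few assumptions: floats should count as numbers, booleans should not, and the representers are registered on a separate dumper class instead of the global one. The names TypedDumper, dump_typed, _all_strings and _all_numbers are made up for this example, and the sort_keys argument assumes PyYAML 5.1 or newer.

```python
import numbers

import yaml


def _all_strings(values):
    """True if values is non-empty and contains only str."""
    values = list(values)
    return bool(values) and all(isinstance(v, str) for v in values)


def _all_numbers(values):
    """True if values is non-empty and contains only ints/floats (no bools)."""
    values = list(values)
    # bool is a subclass of int, so it has to be excluded explicitly.
    return bool(values) and all(
        isinstance(v, numbers.Real) and not isinstance(v, bool) for v in values
    )


class TypedDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass so the global Dumper stays untouched."""


def represent_list(dumper, data):
    node = dumper.represent_list(data)
    if _all_strings(data):
        node.flow_style = False
        for item in node.value:
            item.style = '"'
    elif _all_numbers(data):
        node.flow_style = True
    return node


def represent_dict(dumper, data):
    node = dumper.represent_dict(data)
    if _all_strings(data.values()):
        node.flow_style = False
        # node.value holds (key_node, value_node) pairs; quote the values only.
        for _key_node, value_node in node.value:
            value_node.style = '"'
    elif _all_numbers(data.values()):
        node.flow_style = True
    return node


TypedDumper.add_representer(list, represent_list)
TypedDumper.add_representer(dict, represent_dict)


def dump_typed(data):
    """Dump data to a YAML string, keeping the insertion order of keys."""
    return yaml.dump(
        data, Dumper=TypedDumper, default_flow_style=False, sort_keys=False
    )


if __name__ == "__main__":
    print(dump_typed({
        "Floats": [1.5, 2, 3.25],
        "Flags": [True, False],
        "Strings": ["a", "b"],
    }))
```

With this variant, a list mixing ints and floats should end up on one line, while a list of booleans falls through to the block style requested via default_flow_style=False. Passing sort_keys=False keeps the keys in the order they were inserted instead of sorting them.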