Tags: python-3.x, yaml, pyyaml

How to safe_dump the dictionary and list into YAML?


I want the output to look like the YAML below:

 - item: Food_eat
   Food:
     itemId: 42536216
     category: fruit
     moreInfo:
       - "organic"

I used the following code to print the keys in the same order as above, but the output is not what I expected.

Code:

import yaml

yaml_result = [{'item': 'Food_eat', 'Food': {'foodNo': 42536216, 'type': 'fruit', 'moreInfo': ['organic']}}]

print(yaml.safe_dump(yaml_result))

Output:

- Food:
    moreInfo:
    - organic
    category: fruit
    itemId: 42536216
  item: Food_eat

Not sure how to get the desired output.


Solution

  • ruamel.yaml (disclaimer: I am the author of that package) has this feature (preserving key order) built in, because it is needed to round-trip (load, modify, dump) YAML data without introducing spurious changes. Apart from that, it defaults to YAML 1.2, whereas PyYAML only supports YAML 1.1, which was superseded more than 10 years ago.

    import sys
    import ruamel.yaml
    
    data = [{'item': 'Food_eat', 'Food': {'foodNo': 42536216, 'type': 'fruit', 'moreInfo': ['organic']}}]
    
    yaml = ruamel.yaml.YAML()
    yaml.indent(sequence=4, offset=2)
    yaml.dump(data, sys.stdout)
    

    which gives:

      - item: Food_eat
        Food:
          foodNo: 42536216
          type: fruit
          moreInfo:
            - organic
    

    This relies on a modern Python's dict keeping its insertion order. For older versions, such as Python 2.7, you have to explicitly create a CommentedMap object (imported from ruamel.yaml.comments) and either give it a list of tuples (in the right order) or assign the key-value pairs in the order you want them dumped.
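
    A minimal sketch of the explicit-ordering approach, reusing the data from the example above. The variable names (food, entry, buf, result) are only illustrative, and writing to an in-memory buffer instead of sys.stdout is an assumption made so the result can be inspected:

    ```python
    import io

    import ruamel.yaml
    from ruamel.yaml.comments import CommentedMap

    # Inner mapping built from a list of (key, value) tuples, in dump order.
    food = CommentedMap([('foodNo', 42536216), ('type', 'fruit'), ('moreInfo', ['organic'])])

    # Outer mapping built by assigning the key-value pairs in the desired order.
    entry = CommentedMap()
    entry['item'] = 'Food_eat'
    entry['Food'] = food

    yaml = ruamel.yaml.YAML()
    yaml.indent(sequence=4, offset=2)

    # Dump to an in-memory text buffer so the output can be checked afterwards.
    buf = io.StringIO()
    yaml.dump([entry], buf)
    result = buf.getvalue()
    print(result, end='')
    ```

    Here the key order comes from the CommentedMap objects themselves rather than from the dict implementation of the running interpreter.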

    As you can see, the dash has an offset within the indentation of the sequence. This is something you cannot achieve using PyYAML without rewriting its emitter.


    With PyYAML you should not use print(yaml.safe_dump(data)), as that is inefficient with respect to both memory and time; always use yaml.safe_dump(data, sys.stdout) instead.
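
The streaming form of the PyYAML call can be sketched as follows, using the data from the question. The in-memory buffer and the names buffer, returned and text are only there for illustration; any object with a write() method, such as an open file, should work the same way:

```python
import io
import sys

import yaml

data = [{'item': 'Food_eat', 'Food': {'foodNo': 42536216, 'type': 'fruit', 'moreInfo': ['organic']}}]

# With a stream argument PyYAML writes to the stream directly
# instead of first building the whole document as one string.
yaml.safe_dump(data, sys.stdout)

# The same call against an in-memory buffer; when a stream is given,
# safe_dump returns None because the output went to the stream.
buffer = io.StringIO()
returned = yaml.safe_dump(data, buffer)
text = buffer.getvalue()
```

Only when no stream is passed does safe_dump collect the output and return it as a string, which is the extra work that print(yaml.safe_dump(data)) incurs.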