Tags: python, python-3.x, python-2.7, yaml, pyyaml

How do I write a yaml file from a dictionary with python?


I have a CSV file in which the header contains keys and the cells contain values. I would like to use Python to create a YAML file from the contents of the CSV file.

I created a dictionary of the K:V pairs; however, I am stuck trying to get the K:V pairs into the yaml file.

The structure of the yaml must be:

key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---

If I were to manually create these, I would have more than 1000 YAMLs so it's pretty time consuming and unrealistic.

I am looking for any ideas you much more experienced people might have.

I would really like the output to iterate through the dictionary to create a huge listing of YAMLs like below:

key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---
key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---
key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---
key1: value1
key2: value2
key3:
  -  key4: value4
     key5: {key6: [value6]
key7: value7
key8: value8
key9: value9
  -
  -
---

Sample Code:

import csv
import sys
import yaml

def csv_dict_list(variables_file):
    # Read every CSV row into a dict keyed by the header row.
    with open(variables_file, 'r') as csv_file:
        return list(csv.DictReader(csv_file))

yaml_values = csv_dict_list(sys.argv[1])

No matter what I try after this, I cannot get the desired output using yaml.load() or yaml.load_all().


Solution

  • First of all, you should use dump() or dump_all(), since you want to write YAML, instead of using load().
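    As a minimal sketch of the dump direction, assuming PyYAML is installed and using hypothetical rows shaped like the output of csv.DictReader, a list of dicts can be written as a multi-document stream and read back like this:

```python
import yaml  # PyYAML, assumed to be installed

# Hypothetical rows, shaped like what csv.DictReader would produce.
rows = [
    {"key1": "value_a1", "key2": "value_a2"},
    {"key1": "value_b1", "key2": "value_b2"},
]

# The dump_all() family serialises each item as its own YAML document.
text = yaml.safe_dump_all(rows, default_flow_style=False)
print(text)

# The load_all() family goes the other way and parses the documents back.
round_tripped = list(yaml.safe_load_all(text))
```

    The safe_ variants are used here only because the data consists of plain strings; dump_all() and load_all() follow the same pattern.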

    You should also be aware that the CSV reader returns something different on Python 2.7 than on, e.g., Python 3.6: on the former you get a list of dict back from csv_dict_list, and on the latter a list of OrderedDict. That in itself would not be a problem, but PyYAML dumps a dict with the keys sorted, and an OrderedDict with a tag.
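    The difference between dumping a dict and an OrderedDict can be seen in a small sketch like the following, which assumes PyYAML is installed and uses a made-up two-key row:

```python
from collections import OrderedDict

import yaml  # PyYAML, assumed to be installed

# Hypothetical row with the keys deliberately out of alphabetical order.
row = OrderedDict([("key2", "value2"), ("key1", "value1")])

# Dumping the OrderedDict directly emits a Python-specific tag.
tagged = yaml.dump(row, default_flow_style=False)

# Converting to a plain dict drops the tag; keys are sorted by default.
plain = yaml.dump(dict(row), default_flow_style=False)

print(tagged)
print(plain)
```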

    Your proposed YAML is also not valid, as the flow style mapping in the line:

     key5: {key6: [value6]
    

    is not terminated with a } before the end of the document. You also cannot have:

    key9: value9
      -
      - 
    

    either use:

    key9: value9
    key10:
      -
      -
    

    or

    key9: 
      - value9
      -
    

    or something similar (there is no equivalent Python data structure that has both a value and a list for one and the same key, so you cannot actually create something like that even in Python).
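    The unterminated flow mapping mentioned above can be checked by trying to load it. This is only a sketch, assuming PyYAML is installed; the two input strings are cut down from the YAML in the question:

```python
import yaml  # PyYAML, assumed to be installed

# The flow mapping opened with { is never closed.
broken = "key5: {key6: [value6]\nkey7: value7\n"

try:
    yaml.safe_load(broken)
    error = None
except yaml.YAMLError as exc:
    # The parser reports that it expected ',' or '}'.
    error = exc

# With the closing } added, the same text loads as nested data.
fixed = yaml.safe_load("key5: {key6: [value6]}\nkey7: value7\n")
print(error)
print(fixed)
```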

    PyYAML additionally lacks support for indenting block style sequences. If you do:

    import yaml
    print(yaml.dump(dict(x=[dict(a=1, b=2)]), indent=4))
    

    the output will still be flush left:

    x:
    - {a: 1, b: 2}
    

    To avoid the problems you will run into when using PyYAML, and to sidestep the differences between Python versions, I recommend you use ruamel.yaml (disclaimer: I am the author of that package) and the following code:

    import sys
    import csv
    import ruamel.yaml
    
    Dict = ruamel.yaml.comments.CommentedMap
    
    def csv_dict_list(variables_file):
        key_list = None
        dict_list = []
        with open(variables_file, 'r') as csv_file:
            for line in csv.reader(csv_file):
                if key_list is None:
                    # the first row is the header and holds the keys
                    key_list = line
                    continue
                d = Dict()
                for idx, v in enumerate(line):
                    k = key_list[idx]
                    # special handling of key3/key4/key5/key6
                    if k == key_list[2]:
                        # key3 becomes a sequence; its CSV value is not used
                        d[k] = []
                    elif k == key_list[3]:
                        d[key_list[2]].append(Dict([(k, v)]))
                    elif k == key_list[4]:
                        # key5 holds a flow style mapping
                        d[key_list[2]][0][k] = dt = Dict()
                        dt.fa.set_flow_style()
                    elif k == key_list[5]:
                        d[key_list[2]][0][key_list[4]][k] = [v]
                    else:
                        d[k] = v
                dict_list.append(d)
        return dict_list
    
    data = csv_dict_list('test.csv')
    
    
    yaml = ruamel.yaml.YAML()
    yaml.indent(sequence=4, offset=2)
    yaml.dump_all(data, sys.stdout)
    

    With test.csv:

    key1,key2,key3,key4,key5,key6,key7,key8,key9
    value_a1,value_a2,value_a3,value_a4,value_a5,value_a6,value_a7,value_a8,value_a9
    value_b1,value_b2,value_b3,value_b4,value_b5,value_b6,value_b7,value_b8,value_b9
    

    this gives:

    key1: value_a1
    key2: value_a2
    key3:
      - key4: value_a4
        key5: {key6: [value_a6]}
    key7: value_a7
    key8: value_a8
    key9: value_a9
    ---
    key1: value_b1
    key2: value_b2
    key3:
      - key4: value_b4
        key5: {key6: [value_b6]}
    key7: value_b7
    key8: value_b8
    key9: value_b9
    

    on both Python 2.7 and Python 3.6.