
Constructing a YAML object dynamically


I am using the PyYAML library to generate a YAML file dynamically.

For instance,

import yaml

with open(r'content.txt', 'r') as a_file:


  with open(r'file.yaml', 'a') as file:

    file.write("  root:")
    file.write("\n")
       
  for line in a_file:
    
    stripped_line = line.strip()
    txt = stripped_line
    x = txt.split("=")
    
    with open(r'file.yaml', 'a') as file:
       yaml.dump([{'child-1':x[0],'child-2':x[1]}])

The content.txt file may contain data in this form:

a=b
c=d

The desired final YAML file should look like this:

  root:
  - child-1: a
    child-2: b

  - child-1: c
    child-2: d

Please note the indentation of the root object; assume it is nested inside another root object.

But the above code produces this output:

  root:
-   child-1: a
    child-2: b

-   child-1: c
    child-2: d

This is not valid YAML. Mentioning the root object in the yaml.dump() call duplicates it:

# for each line in content.txt:
# yaml.dump({'root': [{'child-1': x[0], 'child-2': x[1]}]})

  root:
  - child-1: a
    child-2: b
  root:
  - child-1: c
    child-2: d

Since yaml.dump() requires the complete object, the root together with its child objects, it seems difficult to handle the two separately.

Is there a way to dump these objects separately and append or link the root and child objects later?


Solution

  • You are not actually using PyYAML to write the YAML document: you forgot to pass the stream argument to dump(), which unfortunately is optional in PyYAML. Without a stream, dump() returns the YAML as a string, and since you don't use that return value, the call could just as well be replaced by pass.

    Apart from that, you should write a YAML document to a stream in one go, and not try to append to it ('a'). So create the whole data structure that you want and dump it at once:

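    import yaml

    children = []
    data = dict(root=children)  # 'root' maps to the list that is filled below

    # Build the complete data structure first
    with open('content.txt', 'r') as a_file:
        for line in a_file:
            x = line.strip().split('=')
            children.append({'child-1': x[0], 'child-2': x[1]})

    # Dump everything in one go; 'w' overwrites instead of appending
    with open('file.yaml', 'w') as o_file:
        yaml.dump(data, o_file)

    # Read the file back to check the result
    with open('file.yaml', 'r') as i_file:
        print(i_file.read(), end='')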
    

    which gives:

    root:
    - child-1: a
      child-2: b
    - child-1: c
      child-2: d
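
To illustrate the point about the optional stream argument, here is a minimal sketch, assuming PyYAML is installed. The io.StringIO buffer stands in for an open file, and the sample data is made up for illustration; default_flow_style=False is passed explicitly so that older PyYAML versions also produce block style.

```python
import io

import yaml  # PyYAML

# Illustrative sample data, not taken from a real content.txt
data = {'root': [{'child-1': 'a', 'child-2': 'b'}]}

# Without a stream, dump() returns the YAML text as a str.
# If that return value is ignored, nothing is written anywhere.
text = yaml.dump(data, default_flow_style=False)

# With a stream, dump() writes to the stream and returns None.
buffer = io.StringIO()  # stands in for an open file object
result = yaml.dump(data, buffer, default_flow_style=False)

print(type(text).__name__)        # type of the returned value
print(result)                     # return value when a stream is given
print(buffer.getvalue() == text)  # both routes produce the same text
```

This is why the yaml.dump() call in the question has no visible effect: its return value is discarded and no stream is passed.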
    

    Also note that PyYAML only supports YAML 1.1, although the latest version is YAML 1.2 (a standard that is more than 10 years old), so you might want to switch to the ruamel.yaml library, which supports it. (Disclaimer: I am the developer of ruamel.yaml.) To do so, replace

    import yaml
    

    with

    import ruamel.yaml
    yaml = ruamel.yaml.YAML(typ='safe')
    

    This is also much faster for large files.
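
The question mentions that root is itself nested inside another object. Following the same "build everything, then dump once" approach, that indentation can come from the data structure rather than from hand-written spaces. The sketch below assumes PyYAML; the key name 'parent' is hypothetical, since the question does not name the enclosing object.

```python
import yaml  # PyYAML

children = [
    {'child-1': 'a', 'child-2': 'b'},
    {'child-1': 'c', 'child-2': 'd'},
]

# 'parent' is a hypothetical name for the enclosing object
data = {'parent': {'root': children}}

# Dump the complete structure once; the nesting produces the indentation
text = yaml.dump(data, default_flow_style=False)
print(text, end='')
```

Because root is now a value inside another mapping, the dumper indents it and its list items, so no manual file.write("  root:") is needed.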