
Sort a YAML block mapping sequence in Python


I am trying to get the keys of a YAML block mapping in a specific order. I would like to have something like:

depth: !!opencv-matrix
    rows: 480
    cols: 640
    dt: f
    data: 'x'

but every time I dump it, the nesting is lost and the keys are sorted alphabetically:

cols: 640
data: 'x'
depth: !!opencv-matrix
dt: f
rows: 480

I tried a simple approach I came across:

import yaml

ordering = ['ymlFile', 'depth', 'rows', 'cols', 'dt', 'data']
ordered_set = [{'depth': '!!opencv-matrix'}, {'rows': depthSizeImg[0]}, {'cols': depthSizeImg[1]}, {'dt': type(img_d[0][0])}, {'data': ymlList.tolist()}]

with open(out_d, 'a') as f:
    f.write('%YAML:1.0 \n')
    f.write(yaml.dump(ordered_set, default_flow_style=None, allow_unicode=False, indent=4))

But that made the YAML output flat instead of nested:

%YAML:1.0 
- {depth: '!!opencv-matrix'}
- {rows: 323}
- {cols: 110}
- {dt: !!python/name:numpy.float32 ''}
- {data: 'x'}

How can I get the correct output?


Solution

  • In your example

    ordered_set = [{'depth': '!!opencv-matrix'}, {'rows': depthSizeImg[0]}, {'cols': depthSizeImg[1]}, {'dt': type(img_d[0][0])}, {'data': ymlList.tolist()}]
    

    you are dumping a list of dicts, and that is what you get as YAML output. Calling a list ordered_set doesn't make it a set, and putting YAML tags (those !!object_name entries) into your data as strings doesn't turn them into tags either: they are dumped as quoted strings.
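    A minimal sketch of a direct fix, assuming PyYAML 5.1 or later (the release that added the sort_keys parameter to yaml.dump) and Python 3.7+ insertion-ordered dicts: dump one nested dict instead of a list of dicts, pass sort_keys=False so insertion order survives, and register a representer so the inner mapping is emitted with the !!opencv-matrix tag. OpenCVMatrix and represent_opencv_matrix are hypothetical names, and the values are placeholders standing in for depthSizeImg, img_d and ymlList. Writing dt as a plain string such as 'f' (as in the desired output) avoids the !!python/name tag that type(img_d[0][0]) produces.

```python
import yaml  # assumes PyYAML >= 5.1, where yaml.dump accepts sort_keys


class OpenCVMatrix(dict):
    """Hypothetical marker type: a dict to be emitted as !!opencv-matrix."""


def represent_opencv_matrix(dumper, data):
    # 'tag:yaml.org,2002:' is what the '!!' handle expands to, so PyYAML
    # writes this tag as '!!opencv-matrix'. represent_mapping only sorts
    # the keys when the dumper's sort_keys setting is true.
    return dumper.represent_mapping('tag:yaml.org,2002:opencv-matrix', data)


# Registers on the default yaml.Dumper used by yaml.dump().
yaml.add_representer(OpenCVMatrix, represent_opencv_matrix)

# Placeholder values; in the question they come from depthSizeImg, img_d, ymlList.
doc = {
    'depth': OpenCVMatrix(
        rows=480,
        cols=640,
        dt='f',                # plain string, not a Python type object
        data=[0.0, 1.5, 2.0],
    )
}

text = '%YAML:1.0\n' + yaml.dump(doc, default_flow_style=None, sort_keys=False, indent=4)
print(text)
```

    This sketch only addresses key order, nesting and the tag; it is not verified against OpenCV's own reader.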

    For ordered mappings the YAML specification uses !!omap (Example 2.26), which combines the ordered structure of a sequence with single-key mappings as its elements:

    depth: !!omap
      - rows: 480
      - cols: 640
      - dt: f
      - data: x
    

    If you load that with PyYAML, you get:

    {'depth': [('rows', 480), ('cols', 640), ('dt', 'f'), ('data', 'x')]}
    

    which means you cannot get the value of rows by a simple key lookup. If you dump the above back to YAML, you get the even uglier:

    depth:
    - !!python/tuple [rows, 480]
    - !!python/tuple [cols, 640]
    - !!python/tuple [dt, f]
    - !!python/tuple [data, x]
    

    and you cannot get around that with PyYAML without defining a mapping from !!omap to an ordered-dict implementation and vice versa.
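    Such a mapping can be sketched in PyYAML roughly as follows. This is an illustration, not something PyYAML ships: OmapLoader, OmapDumper, construct_omap and represent_omap are hypothetical names, collections.OrderedDict is the assumed ordered-dict implementation, and only the single-key-mapping form of !!omap shown above is handled.

```python
import collections

import yaml  # PyYAML


class OmapLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that loads !!omap as an OrderedDict."""


class OmapDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass that dumps OrderedDict as !!omap."""


def construct_omap(loader, node):
    # An !!omap node is a sequence of single-key mappings; keep their order.
    omap = collections.OrderedDict()
    for item in node.value:
        (key_node, value_node), = item.value  # exactly one key per element
        key = loader.construct_object(key_node, deep=True)
        omap[key] = loader.construct_object(value_node, deep=True)
    return omap


def represent_omap(dumper, data):
    # Write each key/value pair back out as a single-key mapping in a !!omap sequence.
    return dumper.represent_sequence(
        'tag:yaml.org,2002:omap', [{key: value} for key, value in data.items()]
    )


OmapLoader.add_constructor('tag:yaml.org,2002:omap', construct_omap)
OmapDumper.add_representer(collections.OrderedDict, represent_omap)

yaml_str = """\
depth: !!omap
  - rows: 480
  - cols: 640
  - dt: f
  - data: x
"""

data1 = yaml.load(yaml_str, Loader=OmapLoader)
print(data1['depth']['rows'])   # plain key lookup works again
data1['depth']['data2'] = 'y'   # new keys are appended at the end

text = yaml.dump(data1, Dumper=OmapDumper, default_flow_style=False)
print(text)
```

    Registering on subclasses rather than on yaml.SafeLoader and yaml.SafeDumper themselves keeps the change from affecting other code that uses PyYAML. Comments in the input are still lost, because PyYAML discards them while parsing.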

    What you need is a more intelligent "Dumper" for your YAML ¹:

    import ruamel.yaml as yaml
    
    yaml_str = """\
    depth: !!omap
      - rows: 480
      - cols: 640
      - dt: f
      - data: x
    """
    
    data1 = yaml.load(yaml_str)
    data1['depth']['data2'] = 'y'
    print(yaml.dump(data1, Dumper=yaml.RoundTripDumper))
    

    which gives:

    depth: !!omap
    - rows: 480
    - cols: 640
    - dt: f
    - data: x
    - data2: y
    

    Or combine that with a smart loader (which doesn't throw away the ordering information existing in the input), and you can leave out the !!omap:

    import ruamel.yaml as yaml
    
    yaml_str = """\
    depth:
      - rows: 480
      - cols: 640   # my number of columns
      - dt: f
      - data: x
    """
    
    data3 = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
    print(yaml.dump(data3, Dumper=yaml.RoundTripDumper))
    

    which gives:

    depth:
    - rows: 480
    - cols: 640     # my number of columns
    - dt: f
    - data: x
    

    (including the preserved comment).


    ¹ This was done using ruamel.yaml, of which I am the author. You should be able to reproduce the data1 example in PyYAML with some effort; the other example cannot be done without a major enhancement of PyYAML, which is exactly what ruamel.yaml is.