
PyYAML: Control ordering of items returned by yaml.load()


I have a YAML settings file that is used to create some records in a database:

setting1:
  name: [item,item]
  name1: text
anothersetting2:
  name: [item,item]
  sub_setting:
      name :[item,item]

When I update this file with setting3 and regenerate the records in the database with:

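import yaml

# Recent PyYAML versions require an explicit Loader for yaml.load()
with open('setting.txt', 'r') as fh:
    setting_list = yaml.load(fh, Loader=yaml.SafeLoader)

# add_to_db is my own function (not shown here)
for i in setting_list:
    add_to_db(i)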

it's vital that the order of the settings (and therefore their id numbers in the database) stays the same each time, because I'm adding them to the database in that order. A new setting3 should simply be appended to the end of what yaml.load() returns, so that its id doesn't clash with records that are already in the database. At the moment, each time I add another setting and call yaml.load(), the records are loaded in a different order, which results in different ids. I would welcome any ideas ;)

EDIT: I followed abarnert's tips and used this gist: https://gist.github.com/844388

Works as expected, thanks!


Solution

  • The YAML spec clearly says that the key order within a mapping is a "representation detail" that cannot be relied on. So your settings file is already invalid if it relies on the order of keys in a mapping, and you'd be much better off using valid YAML, if at all possible.
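
One way to stay within valid YAML is to make the top level a sequence of single-key mappings, since sequences are ordered. The following is a minimal sketch, assuming PyYAML and a restructured settings file; the layout and the `iter_settings` helper are hypothetical and not taken from your file:

```python
import yaml

# Hypothetical rewrite of the settings file: a top-level sequence, because
# YAML sequences (unlike mappings) have a defined order.
SETTINGS = """
- setting1:
    name: [item, item]
    name1: text
- anothersetting2:
    name: [item, item]
    sub_setting:
      name: [item, item]
"""


def iter_settings(text):
    """Yield (name, body) pairs in the order they appear in the file."""
    for entry in yaml.safe_load(text):
        # Each list entry is expected to be a mapping with exactly one key.
        (name, body), = entry.items()
        yield name, body


names = [name for name, _ in iter_settings(SETTINGS)]
```

With this layout, appending a `- setting3:` entry at the end of the file keeps the earlier entries in the same positions.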

    Of course YAML is extensible, and there's nothing stopping you from adding an "ordered mapping" type to your settings files. For example:

    !omap setting1:
      name: [item,item]
      name1: text
    !omap anothersetting2:
      name: [item,item]
      !omap sub_setting:
          name :[item,item]
    

    You didn't mention which yaml module you're using. There is no such module in the standard library, and there are at least two packages just on PyPI that provide modules with that name. However, I'm going to guess it's PyYAML, because as far as I know that's the most popular.

    The extension described above is easy to parse with PyYAML. See http://pyyaml.org/ticket/29:

    def omap_constructor(loader, node):
        return loader.construct_pairs(node)
    yaml.add_constructor(u'!omap', omap_constructor)
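
For reference, here is a self-contained sketch of that constructor in use, assuming PyYAML. Two things differ from the snippets above and are my assumptions: the tag is written on the mapping value (and on the document root) rather than before the key, because a tag placed before a key applies to the key scalar; and the constructor is registered on a hypothetical `OmapLoader` subclass so the stock loaders stay untouched. Note that `construct_pairs` returns a list of `(key, value)` tuples.

```python
import yaml


class OmapLoader(yaml.SafeLoader):
    """Hypothetical loader subclass that understands the local !omap tag."""


def omap_constructor(loader, node):
    # construct_pairs keeps the key/value pairs in document order;
    # deep=True builds nested values immediately.
    return loader.construct_pairs(node, deep=True)


OmapLoader.add_constructor('!omap', omap_constructor)

DOC = """
--- !omap
setting1: !omap
  name: [item, item]
  name1: text
anothersetting2: !omap
  name: [item, item]
"""

pairs = yaml.load(DOC, Loader=OmapLoader)
top_level_names = [key for key, _ in pairs]
```

Iterating over `pairs` visits the settings in file order, which is what the database ids depend on.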
    

    Now, instead of:

    {'anothersetting2': {'name': ['item', 'item'],
      'sub_setting': 'name :[item,item]'},
     'setting1': {'name': ['item', 'item'], 'name1': 'text'}}
    

    You'll get this:

    (('anothersetting2', (('name', ['item', 'item']),
      ('sub_setting', 'name :[item,item]'))),
     ('setting1', (('name', ['item', 'item']), ('name1', 'text'))))
    

    Of course this gives you a tuple of key-value tuples, but you can easily write a construct_ordereddict and get an OrderedDict instead. You can also write a representer that stores OrderedDict objects as !omaps, if you need to output as well as input.
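
A rough sketch of such a constructor and representer pair, assuming PyYAML. The `OrderedLoader` and `OrderedDumper` subclasses and the sample data are hypothetical names introduced for illustration:

```python
from collections import OrderedDict

import yaml


class OrderedLoader(yaml.SafeLoader):
    """Hypothetical loader that turns !omap mappings into OrderedDict."""


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical dumper that writes OrderedDict as !omap mappings."""


def construct_ordereddict(loader, node):
    # Build the OrderedDict from the pairs in document order.
    return OrderedDict(loader.construct_pairs(node, deep=True))


def represent_ordereddict(dumper, data):
    # Passing the items view (rather than the dict itself) keeps the
    # representer from re-sorting the keys.
    return dumper.represent_mapping('!omap', data.items())


OrderedLoader.add_constructor('!omap', construct_ordereddict)
OrderedDumper.add_representer(OrderedDict, represent_ordereddict)

settings = OrderedDict([
    ('setting1', OrderedDict([('name1', 'text')])),
    ('anothersetting2', OrderedDict([('name1', 'other')])),
])

text = yaml.dump(settings, Dumper=OrderedDumper)
roundtrip = yaml.load(text, Loader=OrderedLoader)
```

Dumping and then loading again should give back an equal `OrderedDict` with the keys in their original order.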

    If you really want to hook PyYAML to make it use an OrderedDict instead of a dict for default mappings, it's pretty easy to do if you're already working directly on parser objects, but more difficult if you want to stick with the high-level convenience methods. Fortunately, the above-linked ticket has an implementation you can use. Just remember that you're not using real YAML anymore, but a variant, so any other software that deals with your files can, and likely will, break.
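
As an illustration of the "hook the default mapping" route, here is a minimal sketch assuming PyYAML. It is not the implementation from the ticket; the `OrderedLoader` name is hypothetical, and it registers a constructor for the default mapping tag on a loader subclass:

```python
from collections import OrderedDict

import yaml


class OrderedLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that builds OrderedDict for mappings."""


def construct_ordered_mapping(loader, node):
    # Resolve '<<' merge keys first, as the stock mapping constructor does.
    loader.flatten_mapping(node)
    return OrderedDict(loader.construct_pairs(node, deep=True))


OrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    construct_ordered_mapping,
)

DOC = """
setting1:
  name: [item, item]
  name1: text
anothersetting2:
  name: [item, item]
"""

data = yaml.load(DOC, Loader=OrderedLoader)
```

Here the settings file needs no extra tags, at the cost of depending on an ordering that the YAML spec does not promise. As far as I know, on Python 3.7 and later plain dicts already keep insertion order, so this mostly matters on older interpreters or when other code expects an `OrderedDict`.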