How to prevent re-definition of keys in YAML?


Is there any way to cause yaml.load to raise an exception whenever a given key appears more than once in the same dictionary?

For example, parsing the following YAML would raise an exception, because some_key appears twice:

{
  some_key: 0,
  another_key: 1,
  some_key: 1
}

Actually, the behavior described above corresponds to the simplest policy regarding key redefinitions. A somewhat more elaborate policy could, for example, specify that only redefinitions that change the value assigned to the key result in an exception, or could allow setting the severity of a key redefinition to "warning" rather than "error". An ideal answer to this question would support such variants.


Solution

  • If you want the loader to throw an error, then you should just define your own loader, with a constructor that checks if the key is already in the mapping ¹:

    try:
        from collections.abc import Hashable  # Python 3.3+
    except ImportError:
        from collections import Hashable  # Python 2
    import ruamel.yaml as yaml
    
    from ruamel.yaml.reader import Reader
    from ruamel.yaml.scanner import Scanner
    from ruamel.yaml.parser_ import Parser
    from ruamel.yaml.composer import Composer
    from ruamel.yaml.constructor import Constructor, ConstructorError
    from ruamel.yaml.resolver import Resolver
    from ruamel.yaml.nodes import MappingNode
    from ruamel.yaml.compat import PY2, PY3
    
    
    class MyConstructor(Constructor):
        def construct_mapping(self, node, deep=False):
            if not isinstance(node, MappingNode):
                raise ConstructorError(
                    None, None,
                    "expected a mapping node, but found %s" % node.id,
                    node.start_mark)
            mapping = {}
            for key_node, value_node in node.value:
                # keys can be list -> deep
                key = self.construct_object(key_node, deep=True)
                # lists are not hashable, but tuples are
                if not isinstance(key, Hashable):
                    if isinstance(key, list):
                        key = tuple(key)
                if PY2:
                    try:
                        hash(key)
                    except TypeError as exc:
                        raise ConstructorError(
                            "while constructing a mapping", node.start_mark,
                            "found unacceptable key (%s)" %
                            exc, key_node.start_mark)
                else:
                    if not isinstance(key, Hashable):
                        raise ConstructorError(
                            "while constructing a mapping", node.start_mark,
                            "found unhashable key", key_node.start_mark)
    
                value = self.construct_object(value_node, deep=deep)
                # next two lines differ from original:
                # refuse a key that this mapping has already seen
                if key in mapping:
                    raise KeyError("duplicate key: %r" % (key,))
                mapping[key] = value
            return mapping
    
    
    class MyLoader(Reader, Scanner, Parser, Composer, MyConstructor, Resolver):
        def __init__(self, stream):
            Reader.__init__(self, stream)
            Scanner.__init__(self)
            Parser.__init__(self)
            Composer.__init__(self)
            MyConstructor.__init__(self)
            Resolver.__init__(self)
    
    
    
    yaml_str = """\
    some_key: 0
    another_key: 1
    some_key: 1
    """
    
    data = yaml.load(yaml_str, Loader=MyLoader)
    print(data)
    

    and that throws a KeyError.

    Please note that the curly braces you use in your example are unnecessary.

    I am not sure if this will work with merge keys.


    ¹ This was done using ruamel.yaml, of which I am the author. ruamel.yaml is an enhanced version of PyYAML, and the loader code for the latter should be similar.
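
For plain PyYAML, the same idea can be sketched on top of yaml.SafeLoader, including the policy variants the question asks about. This is a minimal sketch, not part of either library: the class names and the `severity` and `only_if_changed` attributes are hypothetical, and it assumes that the special `<<` and `=` keys should be skipped by the check and left to PyYAML's own handling. When `only_if_changed` is enabled, values are constructed eagerly so they can be compared, which may reject self-referencing anchors.

```python
import warnings

import yaml
from yaml.constructor import ConstructorError

# Keys with these tags ("<<" and "=") get special handling in PyYAML's
# SafeConstructor, so the duplicate check skips them.
SPECIAL_KEY_TAGS = {"tag:yaml.org,2002:merge", "tag:yaml.org,2002:value"}
_MISSING = object()


class DuplicateKeyLoader(yaml.SafeLoader):
    """SafeLoader that applies a duplicate-key policy to every mapping."""

    # Hypothetical policy knobs (not part of PyYAML); override in a subclass.
    severity = "error"       # "error" raises, "warning" only warns
    only_if_changed = False  # True: a repeat with an equal value is tolerated

    def construct_mapping(self, node, deep=False):
        seen = {}
        # A non-mapping node is passed straight to the parent, which rejects it.
        pairs = node.value if isinstance(node, yaml.MappingNode) else []
        for key_node, value_node in pairs:
            if key_node.tag in SPECIAL_KEY_TAGS:
                continue
            key = self.construct_object(key_node, deep=True)
            # Values are only needed when repeats are compared by value.
            value = None
            if self.only_if_changed:
                value = self.construct_object(value_node, deep=True)
            try:
                previous = seen.get(key, _MISSING)
            except TypeError:
                continue  # unhashable key: the parent raises ConstructorError
            if previous is not _MISSING:
                if not (self.only_if_changed and previous == value):
                    self.report_duplicate(key, node, key_node)
            seen[key] = value
        # The parent resolves merge keys and builds the dict (last value wins).
        return super().construct_mapping(node, deep=deep)

    def report_duplicate(self, key, node, key_node):
        problem = "found duplicate key %r" % (key,)
        if self.severity == "warning":
            warnings.warn("%s\n%s" % (problem, key_node.start_mark))
        else:
            raise ConstructorError(
                "while constructing a mapping", node.start_mark,
                problem, key_node.start_mark)


class WarnOnChangedValue(DuplicateKeyLoader):
    """Lenient variant: warn, and only when the value actually changes."""
    severity = "warning"
    only_if_changed = True


yaml_str = """\
some_key: 0
another_key: 1
some_key: 1
"""

try:
    yaml.load(yaml_str, Loader=DuplicateKeyLoader)
except ConstructorError as exc:
    print(exc)

# The lenient variant warns and keeps the last value for the repeated key.
print(yaml.load(yaml_str, Loader=WarnOnChangedValue))
```

Because the policy lives in class attributes, each variant is a small subclass passed as `Loader=...`, and the strict default raises PyYAML's own ConstructorError with the position of the offending key.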