Tags: python, regex, yaml, pyyaml

Parsing YAML in Python: detecting duplicated keys


The PyYAML library (the yaml module) in Python does not detect duplicated keys. When a mapping repeats a key, the later value silently overwrites the earlier one. This bug was reported years ago and has not been fixed yet.
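A minimal sketch of the overwrite behaviour, assuming PyYAML is installed: yaml.safe_load keeps only the last value of a repeated key. StrictLoader and MERGE_TAG are hypothetical names introduced for this example. The subclass overrides construct_mapping, which is a PyYAML constructor internal rather than a documented public API, so it may need adjusting between versions. It records each key of a mapping and raises yaml.constructor.ConstructorError on a repeat, skipping "<<" merge keys and unhashable keys so that the parent class handles those as usual.

```python
from collections.abc import Hashable

import yaml  # PyYAML, assumed to be installed (pip install pyyaml)

DOC = """\
mykey2:
    subkey5: value5
    subkey5: duplicated!
"""

# Default behaviour: the second "subkey5" silently replaces the first one.
data = yaml.safe_load(DOC)
print(data)  # {'mykey2': {'subkey5': 'duplicated!'}}

# Tag that PyYAML's resolver assigns to "<<" merge keys.
MERGE_TAG = "tag:yaml.org,2002:merge"


class StrictLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that rejects repeated mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            if key_node.tag == MERGE_TAG:
                continue  # merge keys are expanded by the parent class
            key = self.construct_object(key_node, deep=deep)
            if not isinstance(key, Hashable):
                continue  # the parent class reports unhashable keys itself
            if key in seen:
                raise yaml.constructor.ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r}", key_node.start_mark)
            seen.add(key)
        # No duplicates found: build the dict exactly as SafeLoader would.
        return super().construct_mapping(node, deep=deep)


try:
    yaml.load(DOC, Loader=StrictLoader)
except yaml.constructor.ConstructorError as exc:
    print(exc)
```

With this loader, yaml.load(text, Loader=StrictLoader) fails on the repeated subkey5 instead of silently dropping value5, while documents without repeated keys load as before.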

I would like to find a decent workaround for this problem. How feasible would it be to write a regex that returns all the keys? With such a list, detecting duplicates would be easy.

Could a regex expert suggest a pattern that extracts all the keys so that duplicates can be found?
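A regex on its own cannot tell which keys belong to the same mapping (subkey1 legitimately appears under both mykey1 and mykey2), so this sketch pairs a one-line key pattern with a stack of indentation levels. It is a heuristic that assumes block-style YAML with plain, unquoted, single-line keys indented with spaces. Flow mappings such as {a: 1}, quoted or complex keys, and block scalars (| or >) whose lines happen to look like keys are not handled. KEY_RE and find_duplicate_keys are hypothetical names.

```python
import re

# One block-mapping key per line: indentation, an optional "- " sequence
# marker, then a plain key followed by ":" and a space or end of line.
KEY_RE = re.compile(r"^( *)(- +)?([^\s#:'\"-][^:#]*?) *:(?: |$)")


def find_duplicate_keys(text):
    """Return (line, column, key) for each key repeated within one mapping.

    Line and column are 1-based. Heuristic only: see the assumptions above.
    """
    duplicates = []
    stack = []  # (indent of the mapping's keys, set of keys seen so far)
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip() in ("---", "..."):
            stack.clear()  # a document marker starts fresh mappings
            continue
        match = KEY_RE.match(line)
        if not match:
            continue  # blank line, comment, sequence scalar, etc.
        indent, dash, key = match.groups()
        column = len(indent) + len(dash or "")
        if dash:
            # "- key: ..." opens a new mapping inside a sequence item.
            while stack and stack[-1][0] >= column:
                stack.pop()
            stack.append((column, set()))
        else:
            # Leave any deeper mappings this line has dedented out of.
            while stack and stack[-1][0] > column:
                stack.pop()
            if not stack or stack[-1][0] < column:
                stack.append((column, set()))  # first key of a nested mapping
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((lineno, column + 1, key))
        seen.add(key)
    return duplicates


SAMPLE = """\
mykey1:
    subkey1: value1
    subkey2: value2
    subkey3:
      - value 3.1
      - value 3.2
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
    subkey6:
       subkey6.1: value6.1
       subkey6.2: value6.2
"""

for lineno, column, key in find_duplicate_keys(SAMPLE):
    print(f"{lineno}:{column} duplicate key {key!r}")
```

For the example file this prints 10:5 duplicate key 'subkey5', the same position yamllint reports for subkey5. Because it only approximates YAML's grammar, a real parser-based check is the safer choice for anything beyond simple configuration files.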

Example file:

mykey1:
    subkey1: value1
    subkey2: value2
    subkey3:
      - value 3.1
      - value 3.2
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
    subkey6:
       subkey6.1: value6.1
       subkey6.2: value6.2

Solution

  • The yamllint command-line tool does what you want:

    sudo pip install yamllint
    

    Specifically, it has a rule, key-duplicates, that detects repeated keys and keys overwriting one another:

    $ yamllint test.yaml
    test.yaml
      1:1       warning  missing document start "---"  (document-start)
      10:5      error    duplication of key "subkey5" in mapping  (key-duplicates)
    

    (It has many other rules that you can enable/disable or tweak.)
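If the check should run from Python rather than the shell, yamllint can also be imported as a library. This sketch assumes yamllint is installed and uses its linter.run function together with a YamlLintConfig built from an inline configuration string ("extends: default"). The results are filtered to the key-duplicates rule so that other reports, such as the document-start warning in the output above, are left out.

```python
from yamllint import linter
from yamllint.config import YamlLintConfig

SAMPLE = """\
mykey2:
    subkey5: value5
    subkey5: duplicated!
"""

# Inline configuration: start from yamllint's bundled "default" rule set.
conf = YamlLintConfig("extends: default")

# linter.run() yields LintProblem objects; keep only duplicate-key reports.
problems = [p for p in linter.run(SAMPLE, conf) if p.rule == "key-duplicates"]

for p in problems:
    # line and column are 1-based, as in the CLI output.
    print(f"{p.line}:{p.column} {p.level} {p.desc}")
```

A non-empty problems list can then be turned into an exception or a non-zero exit code, depending on where the check runs.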