Why is my loop adding an additional key value pair at the end of my YAML file?


I have two separate YAML files that I'm parsing through using ruamel.yaml.

File_one:

---
value: 600
another_value: 330
main_config::config::config:
  username:
    password: 'ENCRYPTED_PASSWORD'
    name: 'John Doe'
    role: 'admin'
    expiration: '2055-01-01'
    email: '[email protected]'

and File_two:

---
main_config::config::config:
  username:
    password: 'ENCRYPTED_PASSWORD'
    name: 'John Doe'
    role: 'admin'
    expiration: '2055-01-01'
    email: '[email protected]'
another_config::config:
  setting1:
    setting2:
      setting3:
        setting4: false

I'm using ruamel.yaml to parse the files and update the expiration dates while preserving the rest of each file. However, it works for File_one but NOT for File_two.

For File_two, the code not only updates the expiration date but also adds an additional expiration key at the end of the file, like so:

---
main_config::config::config:
  username:
    password: 'ENCRYPTED_PASSWORD'
    name: 'John Doe'
    role: 'admin'
    expiration: '2055-01-01'
    email: '[email protected]'
another_config::config:
  setting1:
    setting2:
      setting3:
        setting4: false
    expiration: '2055-01-01'

The snippet of code I am using is:

with open(f"./path/to/file.yaml", 'r') as f:
    file = yaml.load(f)

    for k, v in file.items():
        if isinstance(v, dict):
            for x, y in v.items():
                if isinstance(y, dict):
                    y['expiration'] = DATE
    # Write new expiration date to file in CM:
    with open(f"./path/to/file.yaml", "w") as edit_file:
        yaml.dump(file, edit_file)

Why is the output different for the two files when the code is the same?


Solution

  • Your File_one has only one key at the root level whose value is a dict, and the expiration date gets updated in the dict nested inside it. Your File_two has two keys at the root level whose values are dicts, and both get processed: main_config::config::config contains the dict username, and another_config::config contains the dict setting1, so both username and setting1 end up with an expiration key. Because the input is different and triggers the "update expiration" piece of code twice for the second YAML example, the output is different. That is true in general: if the behaviour of code depends on the data processed, the output can differ for different input, even though the code is the same.

    You can add an extra test to check that expiration is already a key in the dict before you update it (i.e. you'll never add it if it is not already there):

                if isinstance(y, dict) and 'expiration' in y:
    

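    Alternatively, if you always want to add the expiration key to the first dict found, you can leave the loop after the first match. Note that break only ends the inner loop, so this stops at the first nested dict under each root-level key; the outer loop still moves on to the next root-level key: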

                if isinstance(y, dict):
                    y['expiration'] = DATE
                    break
    

    Another problem is that your code writes to the file while it is still open for reading. You could move your last two lines outside of the initial with statement, but I recommend using pathlib.Path, so that the files get properly opened in binary mode (necessary to dump as UTF-8):

    from pathlib import Path
    import ruamel.yaml
    
    yaml = ruamel.yaml.YAML()
    yaml_file = Path('./path/to/file.yaml')
    data = yaml.load(yaml_file)
    
    for k, v in data.items():
        if isinstance(v, dict):
            for x, y in v.items():
                if isinstance(y, dict):
                    y['expiration'] = DATE
                    break  # only leaves the inner loop
    
    # Write new expiration date to file in CM:
    yaml.dump(data, yaml_file)
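
To see how the two variants above behave on File_two, here is a minimal sketch that runs them on plain dicts instead of a loaded YAML file. It assumes the mapping returned by ruamel.yaml can be traversed like a regular dict; the function names update_existing and update_first_per_root, the trimmed-down file_two data and the old date '2000-01-01' are made up for this illustration.

```python
import copy

DATE = '2055-01-01'  # new expiration date, same value as in the question

# File_two from the question as plain dicts (trimmed; the old date is made up).
file_two = {
    'main_config::config::config': {
        'username': {
            'name': 'John Doe',
            'role': 'admin',
            'expiration': '2000-01-01',
        },
    },
    'another_config::config': {
        'setting1': {'setting2': {'setting3': {'setting4': False}}},
    },
}


def update_existing(data, date):
    """Only overwrite 'expiration' where the key is already present."""
    for v in data.values():
        if isinstance(v, dict):
            for y in v.values():
                if isinstance(y, dict) and 'expiration' in y:
                    y['expiration'] = date


def update_first_per_root(data, date):
    """Set 'expiration' on the first nested dict under each root-level key."""
    for v in data.values():
        if isinstance(v, dict):
            for y in v.values():
                if isinstance(y, dict):
                    y['expiration'] = date
                    break  # leaves the inner loop only


guarded = copy.deepcopy(file_two)
update_existing(guarded, DATE)

per_root = copy.deepcopy(file_two)
update_first_per_root(per_root, DATE)

# With the membership test, setting1 is left alone.
print('expiration' in guarded['another_config::config']['setting1'])   # False
# With break only, setting1 still gets the extra key.
print('expiration' in per_root['another_config::config']['setting1'])  # True
```

So for a file shaped like File_two, the membership test is the variant that avoids the extra key. The break variant would need the outer loop to stop as well, for example by putting the loops in a function and returning after the first match.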