python, yaml, puppet, pyyaml, hiera

Parsing possibly invalid YAML with PyYAML


I would like to parse a Puppet-related YAML config with PyYAML. Unfortunately, PyYAML seems unable to parse some of the YAML files because of this part:

base::files:

  /var/log/fpm:
    ensure: 'directory'
    mode: '777'

  /etc/nginx/ssl/cert:
   ensure: 'directory'

  /apps:
   ensure: 'directory'
   owner: user
   group: user

  ['/apps/ecert-public', '/apps/ecert-public/config', '/apps/ecert-public/releases']:
    ensure: 'directory'
    owner: 'user'
    group: 'user'

  ['/apps/site-public', '/apps/site-public/config', '/apps/site-public/releases']:
    ensure: 'directory'
    owner: 'user'
    group: 'user'

The problem is the mappings whose keys consist of multiple values (the ones inside square brackets). I get the following error message when trying to parse this part with PyYAML:

while constructing a mapping
  in "/hieradata/node/wc-de.yaml", line 133, column 3
found unhashable key
  in "/hieradata/node/wc-de.yaml", line 212, column 3

Some YAML validators (such as http://www.yamllint.com/) say this is valid YAML, but most of the ones I have tried also have problems parsing this part. Does anyone have an idea how I can solve this with PyYAML? Unfortunately I cannot change the YAML itself, so I need a solution that parses it as it is.


Solution

  • This is perfectly valid YAML; the problem lies in PyYAML. Like every other YAML processor I know of that fails to load this input, PyYAML can parse the YAML without problems (and compose it, if the processor implements that step), but it fails during the construction step of loading. A sequence used as a mapping key is constructed as a Python list, and a list is unhashable, so it cannot serve as a dict key.

    If you use ruamel.yaml (disclaimer: I am the author of that package) and have your input in the file input.yaml, then:

    import sys
    from pathlib import Path
    import ruamel.yaml
    
    file_name = Path('input.yaml')
    
    yaml = ruamel.yaml.YAML()
    data = yaml.load(file_name)
    # the sequence key is looked up with a tuple, which is hashable
    print(data['base::files'][('/apps/ecert-public', '/apps/ecert-public/config', '/apps/ecert-public/releases')]['ensure'])
    print('\n-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-\n')
    yaml.dump(data, sys.stdout)
    

    gives:

    directory
    
    -o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-
    
    base::files:
    
      /var/log/fpm:
        ensure: directory
        mode: '777'
    
      /etc/nginx/ssl/cert:
        ensure: directory
    
      /apps:
        ensure: directory
        owner: user
        group: user
    
    
      [/apps/ecert-public, /apps/ecert-public/config, /apps/ecert-public/releases]:
        ensure: directory
        owner: user
        group: user
    
      [/apps/site-public, /apps/site-public/config, /apps/site-public/releases]:
        ensure: directory
        owner: user
        group: user
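
If switching libraries is not an option, PyYAML itself can probably be coaxed into loading such keys. The following is only a minimal sketch, not something taken from the PyYAML documentation: `TupleKeyLoader` and `SAMPLE` are hypothetical names introduced here, and the approach assumes it is acceptable for sequence keys to come back as tuples. It overrides `construct_mapping` on a `yaml.SafeLoader` subclass and converts any list key to a tuple before it is used as a dict key.

```python
import yaml  # PyYAML


class TupleKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that turns sequence keys into tuples."""

    def construct_mapping(self, node, deep=False):
        # Expand any '<<' merge keys first, as the stock SafeLoader does.
        self.flatten_mapping(node)
        mapping = {}
        for key_node, value_node in node.value:
            # deep=True makes PyYAML populate a sequence key completely
            # before we convert it; otherwise the list is filled in later.
            key = self.construct_object(key_node, deep=True)
            if isinstance(key, list):
                key = tuple(key)  # tuples are hashable, lists are not
            mapping[key] = self.construct_object(value_node, deep=deep)
        return mapping


# Shortened excerpt of the input from the question (assumed for illustration).
SAMPLE = """\
base::files:
  /apps:
    ensure: 'directory'
  ['/apps/ecert-public', '/apps/ecert-public/config']:
    ensure: 'directory'
    owner: 'user'
"""

data = yaml.load(SAMPLE, Loader=TupleKeyLoader)
key = ('/apps/ecert-public', '/apps/ecert-public/config')
print(data['base::files'][key]['owner'])
```

Keys are then looked up with tuples, much as in the ruamel.yaml example above. The sketch only handles flat sequence keys: a key that itself contains a nested sequence or a mapping would still be unhashable and would raise a `TypeError` on insertion.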