
Load YAML file into Python preserving aliases


I am making a program that needs to check that certain fields are defined with the correct alias string. For example:

networks:
  base: 
    name: build
    address: &dummyname
     url: 192.168.1.1
     port: 8080 
  first: 
    name: masterA
    address: *dummyname
  second: 
    name: masterB
    address: *dummyname

I need to check whether the address field in first and second is defined with the alias "*dummyname", no matter what content the alias refers to.

When loading with PyYAML, aliases are always resolved (expanded into the anchored content), so I am not able to check that:

data = yaml.safe_load(file_data)

data rendered as python dict:

networks:
  base: 
    name: build
    address: 
      url: 192.168.1.1
      port: 8080 
  first: 
    name: masterA
    address: 
      url: 192.168.1.1
      port: 8080 
  second: 
    name: masterB
    address: 
      url: 192.168.1.1
      port: 8080 

I have seen similar posts for the other way around, dumping python object to YAML without creating aliases/anchors, but I haven't found a solution for this.

How can I access the alias used in the YAML document?


Solution

  • As you indicated, PyYAML does not give you access to the anchor/alias name in the loaded data; it only uses the name internally to resolve the alias. When you dump the data again, you'll notice that you get a generated, generic anchor (e.g. &id001).

    If you use ruamel.yaml to round-trip your data (load it, then dump it again), you can see that your actual anchor/alias is preserved:

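    import sys
    from pathlib import Path

    import ruamel.yaml

    file_in = Path('input.yaml')  # file containing the YAML document from the question

    yaml = ruamel.yaml.YAML()  # the default is round-trip mode, which keeps anchors/aliases
    data = yaml.load(file_in)
    yaml.dump(data, sys.stdout)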
    

    as this gives:

    networks:
      base:
        name: build
        address: &dummyname
          url: 192.168.1.1
          port: 8080
      first:
        name: masterA
        address: *dummyname
      second:
        name: masterB
        address: *dummyname
    

    You can inspect the loaded data structure:

    # you get the same object, whether using `first`, `second` or `base`
    address = data['networks']['first']['address']
    print(address, type(address))
    print('\n'.join([k for k in dir(address) if k[0] != '_']))  # skip the built-in (underscore) attributes
    

    which gives:

    ordereddict([('url', '192.168.1.1'), ('port', 8080)]) <class 'ruamel.yaml.comments.CommentedMap'>
    add_referent
    add_yaml_merge
    anchor
    ca
    clear
    copy
    copy_attributes
    fa
    fromkeys
    get
    insert
    items
    keys
    lc
    merge
    mlget
    move_to_end
    non_merged_items
    pop
    popitem
    rya
    setdefault
    tag
    update
    update_key_value
    values
    yaml_add_eol_comment
    yaml_anchor
    yaml_end_comment_extend
    yaml_key_comment_extend
    yaml_set_anchor
    yaml_set_comment_before_after_key
    yaml_set_start_comment
    yaml_set_tag
    yaml_value_comment_extend
    

    The likely candidate is the attribute anchor. This is actually an Anchor instance, from which the original string can be retrieved via its value attribute:

    print(f'anchor: {address.anchor.value}')
    

    giving:

    anchor: dummyname
    

    Please note that this kind of internal detail might change, so pin the version of ruamel.yaml you are using and test before upgrading.