python, yaml, aws-cloudformation, pyyaml, ruamel.yaml

How to parse a CloudFormation YAML template to get all the !ImportValue references?


I am working on a project that parses AWS CloudFormation YAML files to extract every !ImportValue from each template.

I am trying to use ruamel.yaml for this (I am new to it). So far I can read the YAML file and get at the individual elements:

import ruamel.yaml

def general_constructor(loader, tag_suffix, node):
  return node.value

ruamel.yaml.SafeLoader.add_multi_constructor(u'!', general_constructor)

with open(cfFile, 'r') as service:
  stream = service.read()

yaml_data = ruamel.yaml.safe_load(stream)
print(yaml_data)

The code above reads the specified YAML file, and the output looks like the following.

{'Application': {'Properties': {'ApplicationName': [ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'-'),
    SequenceNode(tag=u'tag:yaml.org,2002:seq', value=[ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'***'), ScalarNode(tag=u'!ImportValue', value=u'jkl')])],
   *
   *
     ScalarNode(tag=u'!ImportValue', value=u'def'),
   *
   *
     ScalarNode(tag=u'!ImportValue', value=u'rst')])]},


So there are a bunch of !ImportValue entries listed as ScalarNode objects (e.g. ScalarNode(tag=u'!ImportValue', value=u'rst')), and those are what I actually want to extract. These import values are scattered across the template in various places. What would be the best way to extract their values?

In our CloudFormation setup we have a bunch of YAML files: some of them export certain resources and other YAML files import them. So I want to build a sort of dependency map (maybe a JSON file) that depicts the interdependence between the CloudFormation files.


Solution

  • If you use ruamel.yaml's round-trip loader, you don't have to do anything special to load the tag, and walking recursively over the resulting data structure is relatively easy. The key of the enclosing mapping needs to be passed down during the recursion, because at least the first !ImportValue is not the direct value of a key but sits inside a sequence under that key.

    Assuming an input.yaml consisting of:

    Application:
      Properties:
        ApplicationName: ["-", ["**", !ImportValue "jkl"]]
    
      AnotherKey:
      - 42
      - nested: !ImportValue xyz
    

    (which might not be exactly what you got as input, but will do for demonstration purposes), and using the new ruamel.yaml API (which defaults to round-trip loading/dumping):

    from pathlib import Path
    import ruamel.yaml
    
    # name of the attribute under which round-trip nodes store their tag
    ta = ruamel.yaml.comments.Tag.attrib
    
    yaml = ruamel.yaml.YAML()
    data = yaml.load(Path('input.yaml'))
    
    def process(d, key=None):
        """Yield (key, node) for every node tagged !ImportValue."""
        if isinstance(d, dict):
            for k, v in d.items():
                yield from process(v, k)  # recurse and pass on new key
        elif isinstance(d, list):
            for item in d:
                yield from process(item, key)  # list items keep the enclosing key
        else:
            try:
                if getattr(d, ta, None).value == '!ImportValue':
                    yield (key, d)
            except AttributeError:
                pass  # plain scalar without a tag
    
    for k, v in process(data):
        print(k, '->', v)
    

    which gives:

    ApplicationName -> jkl
    nested -> xyz
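
To get from these pairs to the dependency map mentioned in the question, one possible next step is to match each imported name against the export names declared by the other templates. The sketch below is not part of the original answer, and it works on plain dicts and lists so it does not depend on ruamel.yaml. The function names `exported_names` and `build_dependency_map`, the file names, and the `UNRESOLVED` marker are hypothetical. It assumes exports are declared under `Outputs` → `Export` → `Name`, and that the imported names for each file have already been collected (for example with `process()` above). Export names built with intrinsic functions such as `!Sub` are not resolved here.

```python
import json

def exported_names(template):
    """Return the export names declared under Outputs.<id>.Export.Name."""
    names = []
    for output in (template.get('Outputs') or {}).values():
        export = output.get('Export') if isinstance(output, dict) else None
        if isinstance(export, dict) and 'Name' in export:
            names.append(str(export['Name']))
    return names

def build_dependency_map(exports_by_file, imports_by_file):
    """Map each template to the templates whose exports it imports."""
    # reverse index: export name -> file that declares it
    owner = {name: fname
             for fname, names in exports_by_file.items()
             for name in names}
    dep_map = {}
    for fname, imported in imports_by_file.items():
        deps = {}
        for name in imported:
            # imports with no known exporter are grouped under a marker key
            deps.setdefault(owner.get(name, 'UNRESOLVED'), []).append(name)
        dep_map[fname] = deps
    return dep_map

# Hypothetical data: network.yaml exports 'jkl'; app.yaml imports 'jkl' and 'xyz'.
network_template = {
    'Outputs': {'Vpc': {'Value': 'vpc-1', 'Export': {'Name': 'jkl'}}},
}
exports_by_file = {
    'network.yaml': exported_names(network_template),
    'app.yaml': [],
}
imports_by_file = {'network.yaml': [], 'app.yaml': ['jkl', 'xyz']}

dependency_map = build_dependency_map(exports_by_file, imports_by_file)
print(json.dumps(dependency_map, indent=2, sort_keys=True))
```

Grouping unresolved imports under a marker key keeps references to stacks outside the scanned set of files visible instead of silently dropping them.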