python, yaml, pyyaml, ruamel.yaml

Python ruamel.yaml dumps tags with quotes


I'm trying to use ruamel.yaml to modify an AWS CloudFormation template on the fly using Python. I added the following code to make safe_load work with CloudFormation functions such as !Ref. However, when I dump the data, values containing !Ref (or any other function) are wrapped in quotes, and CloudFormation does not recognize them.

See example below:

import sys, json, io, boto3
import ruamel.yaml

def funcparse(loader, node):
  node.value = {
      ruamel.yaml.ScalarNode:   loader.construct_scalar,
      ruamel.yaml.SequenceNode: loader.construct_sequence,
      ruamel.yaml.MappingNode:  loader.construct_mapping,
  }[type(node)](node)
  node.tag = node.tag.replace(u'!Ref', 'Ref').replace(u'!', u'Fn::')
  return dict([ (node.tag, node.value) ])

funcnames = [ 'Ref', 'Base64', 'FindInMap', 'GetAtt', 'GetAZs', 'ImportValue',
              'Join', 'Select', 'Split', 'Split', 'Sub', 'And', 'Equals', 'If',
              'Not', 'Or' ]

for func in funcnames:
    ruamel.yaml.SafeLoader.add_constructor(u'!' + func, funcparse)

txt = open("/space/tmp/a.template","r")
base = ruamel.yaml.safe_load(txt)
base["foo"] = {
    "name": "abc",
    "Resources": {
        "RouteTableId" : "!Ref aaa",
        "VpcPeeringConnectionId" : "!Ref bbb",
        "yourname": "dfw"
    }
}

ruamel.yaml.safe_dump(
    base,
    sys.stdout,
    default_flow_style=False
)

The input file is like this:

foo:
  bar: !Ref barr
  aa: !Ref bb

The output is like this:

foo:
  Resources:
    RouteTableId: '!Ref aaa'
    VpcPeeringConnectionId: '!Ref bbb'
    yourname: dfw
  name: abc

Notice that '!Ref aaa' and '!Ref bbb' have been wrapped in single quotes. CloudFormation won't recognize them. Is there a way to configure the dumper so that the output looks like this:

foo:
  Resources:
    RouteTableId: !Ref aaa
    VpcPeeringConnectionId: !Ref bbb
    yourname: dfw
  name: abc

Other things I have tried:

  • Using the pyyaml library: same result
  • Using Ref:: instead of !Ref: same result

Solution

  • Essentially you tweak the loader to load tagged (scalar) objects as if they were mappings, with the tag as the key and the scalar as the value. But you don't do anything to distinguish the dict loaded from such a tagged node from dicts loaded from normal mappings, nor do you have any code to represent such a dict so that you "get the tag back".

    When you try to "create" a scalar with a tag, you just make a string starting with an exclamation mark, and that needs to get dumped quoted to distinguish it from real tagged nodes.

    What obscures all of this is that your example overwrites the loaded data by assigning to base["foo"], so the only thing you can conclude about safe_load, and all the code before it, is that it doesn't throw an exception. That is, if you leave out the lines starting with base["foo"] = {, your output will look like:

    foo:
      aa:
        Ref: bb
      bar:
        Ref: barr
    

    In that output, Ref: bb is indistinguishable from a normally dumped dict. If you want to explore this route, you should make a subclass TagDict(dict), have funcparse return that subclass, and add a representer for that subclass that re-creates the tag from the key and then dumps the value. Once that works (round-trip output equals input), you can do:

         "RouteTableId" : TagDict('Ref', 'aaa')
    

    If you do that, you should, apart from removing the unused imports, also change your code to close the file pointer txt, as leaving it open can lead to problems. You can do this elegantly by using the with statement:

    with open("/space/tmp/a.template","r") as txt:
        base = ruamel.yaml.safe_load(txt)
    

    (I also would leave out the "r" (or put a space before it); and replace txt with a more appropriate variable name that indicates this is an (input) file pointer).

    You also have the entry 'Split' twice in your funcnames, which is superfluous.


    A more generic solution can be achieved by using a multi-constructor that matches any tag and having three basic types to cover scalars, mappings and sequences.

    import sys
    import ruamel.yaml
    
    yaml_str = """\
    foo:
      scalar: !Ref barr
      mapping: !Select
        a: !Ref 1
        b: !Base64 A413
      sequence: !Split
      - !Ref baz
      - !Split Multi word scalar
    """
    
    class Generic:
        def __init__(self, tag, value, style=None):
            self._value = value
            self._tag = tag
            self._style = style
    
    
    class GenericScalar(Generic):
        @classmethod
        def to_yaml(cls, representer, node):
            return representer.represent_scalar(node._tag, node._value)
    
        @staticmethod
        def construct(constructor, node):
            return constructor.construct_scalar(node)
    
    
    class GenericMapping(Generic):
        @classmethod
        def to_yaml(cls, representer, node):
            return representer.represent_mapping(node._tag, node._value)
    
        @staticmethod
        def construct(constructor, node):
            return constructor.construct_mapping(node, deep=True)
    
    
    class GenericSequence(Generic):
        @classmethod
        def to_yaml(cls, representer, node):
            return representer.represent_sequence(node._tag, node._value)
    
        @staticmethod
        def construct(constructor, node):
            return constructor.construct_sequence(node, deep=True)
    
    
    def default_constructor(constructor, tag_suffix, node):
        generic = {
            ruamel.yaml.ScalarNode: GenericScalar,
            ruamel.yaml.MappingNode: GenericMapping,
            ruamel.yaml.SequenceNode: GenericSequence,
        }.get(type(node))
        if generic is None:
            raise NotImplementedError('Node: ' + str(type(node)))
        style = getattr(node, 'style', None)
        instance = generic.__new__(generic)
        yield instance
        state = generic.construct(constructor, node)
        instance.__init__(tag_suffix, state, style=style)
    
    
    ruamel.yaml.add_multi_constructor('', default_constructor, Loader=ruamel.yaml.SafeLoader)
    
    
    yaml = ruamel.yaml.YAML(typ='safe', pure=True)
    yaml.default_flow_style = False
    yaml.register_class(GenericScalar)
    yaml.register_class(GenericMapping)
    yaml.register_class(GenericSequence)
    
    base = yaml.load(yaml_str)
    base['bar'] = {
        'name': 'abc',
        'Resources': {
            'RouteTableId' : GenericScalar('!Ref', 'aaa'),
            'VpcPeeringConnectionId' : GenericScalar('!Ref', 'bbb'),
            'yourname': 'dfw',
            's' : GenericSequence('!Split', ['a', GenericScalar('!Not', 'b'), 'c']),
        }
    }
    yaml.dump(base, sys.stdout)
    

    which outputs:

    bar:
      Resources:
        RouteTableId: !Ref aaa
        VpcPeeringConnectionId: !Ref bbb
        s: !Split
        - a
        - !Not b
        - c
        yourname: dfw
      name: abc
    foo:
      mapping: !Select
        a: !Ref 1
        b: !Base64 A413
      scalar: !Ref barr
      sequence: !Split
      - !Ref baz
      - !Split Multi word scalar
    

    Please note that sequences and mappings are handled correctly and that they can be created as well. There is however no check that:

    • the tag you provide is actually valid
    • the value associated with the tag is of the proper type for that tag name (scalar, mapping, sequence)

    If you want GenericMapping to behave more like dict, you probably want to make it a subclass of dict (and not of Generic) and provide an appropriate __init__ (likewise for GenericSequence and list).

    When the assignment is changed to something closer to yours:

    base["foo"] = {
        "name": "abc",
        "Resources": {
            "RouteTableId" : GenericScalar('!Ref', 'aaa'),
            "VpcPeeringConnectionId" : GenericScalar('!Ref', 'bbb'),
            "yourname": "dfw"
        }
    }
    

    the output is:

    foo:
      Resources:
        RouteTableId: !Ref aaa
        VpcPeeringConnectionId: !Ref bbb
        yourname: dfw
      name: abc
    

    which is exactly the output you want.