Unable to get PyYAML objects to represent correctly


I am using PyYAML to regenerate a YAML file, but the dumped output has unwanted angle brackets around my custom tag:

Source YAML file:

Outputs:
  HarvestApi:
    Description: URL for application
    Value: !Ref LocationRef
    Export:
      Name: HarvestApi

The Python script should just parse the YAML and then dump it again:

#!/usr/bin/env python3.6

import yaml
import sys

class RefTag(yaml.YAMLObject):
  yaml_tag = u'Ref'
  def __init__(self, text):
    self.text = text
  def __repr__(self):
    return "%s( text=%r)" % ( self.__class__.__name__, self.text)
  @classmethod
  def from_yaml(cls, loader, node):
    return RefTag(node.value)
  @classmethod
  def to_yaml(cls, dumper, data):
    return dumper.represent_scalar(cls.yaml_tag, data.text)
yaml.SafeLoader.add_constructor('!Ref', RefTag.from_yaml)
yaml.SafeDumper.add_multi_representer(RefTag, RefTag.to_yaml)

yaml_list = None
with open("./yaml-test.yml", "r")  as file:  
  try:
    yaml_list = yaml.safe_load(file)
  except yaml.YAMLError as exc:
    print ("--", exc)
    sys.exit(1)

print (yaml.dump(yaml_list, default_flow_style=False))

But instead it outputs this:

Outputs:
  HarvestApi:
    Description: URL for application
    Export:
      Name: HarvestApi
    Value: !<Ref> 'LocationRef'

Those extra angle brackets around the Ref object are what I need to remove.


Solution

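  • The main problem is that your tag doesn't start with an exclamation mark. PyYAML can only write a tag in the short !Ref form when the tag begins with the ! handle; any other tag is emitted in verbatim form, !<Ref>, which is where the angle brackets come from. Just adding the exclamation mark will give you the expected output. For reference, see the Monster class example in the PyYAML documentation.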

    There are some additional problems:

    • The FAQ on yaml.org has stated since September 2006 that the recommended file extension for YAML files is .yaml.

    • PyYAML dumping (and loading) has a streaming interface, but it also has a much-abused convenience option: if you leave out the stream, the output is written to a memory buffer and returned as a string. Building that string only to print it with:

      print(yaml.dump(yaml_list, ...))

      is slower and uses more memory than writing directly to the stream.

    • You register your constructor and representer for RefTag on SafeLoader and SafeDumper, which is good, as there is no need to go unsafe with the default PyYAML Loader and Dumper. But then you call yaml.dump() instead of yaml.safe_dump(). The former works, but the latter is better, as it will complain about non-registered objects in your data structure (there are none in the input you are using now, of course).
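The last two points can be illustrated with a minimal sketch. The Unregistered class and the sample data are hypothetical and only serve the demonstration; an in-memory io.StringIO stands in for sys.stdout so the result can be inspected.

```python
import io
import yaml


class Unregistered:
    """Hypothetical class with no representer registered for it."""

    def __init__(self):
        self.value = 1


data = {"item": Unregistered()}

# yaml.dump() uses the full Dumper, which silently falls back to a
# Python-specific python/object tag for unknown objects.
unsafe_text = yaml.dump(data, default_flow_style=False)

# yaml.safe_dump() uses SafeDumper, which refuses unknown objects.
try:
    yaml.safe_dump(data, default_flow_style=False)
    safe_error = None
except yaml.YAMLError as exc:
    safe_error = exc

# Passing a stream as the second argument writes to it directly
# instead of returning one large string.
buffer = io.StringIO()
yaml.safe_dump({"a": 1}, buffer, default_flow_style=False)
streamed = buffer.getvalue()
```

In a script you would pass sys.stdout (or an open file) in place of the buffer, as the corrected program does.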

    So change things to:

    #!/usr/bin/env python3.6
    
    import yaml
    import sys
    
    class RefTag(yaml.YAMLObject):
      yaml_tag = u'!Ref'
      def __init__(self, text):
        self.text = text
      def __repr__(self):
        return "%s( text=%r)" % ( self.__class__.__name__, self.text)
      @classmethod
      def from_yaml(cls, loader, node):
        return RefTag(node.value)
      @classmethod
      def to_yaml(cls, dumper, data):
        return dumper.represent_scalar(cls.yaml_tag, data.text)
    
    yaml.SafeLoader.add_constructor('!Ref', RefTag.from_yaml)
    yaml.SafeDumper.add_multi_representer(RefTag, RefTag.to_yaml)
    
    yaml_list = None
    with open("./yaml-test.yaml", "r") as file:
      try:
        yaml_list = yaml.safe_load(file)
      except yaml.YAMLError as exc:
        print ("--", exc)
        sys.exit(1)
    
    yaml.safe_dump(yaml_list, sys.stdout, default_flow_style=False)
    

    which gives:

    Outputs:
      HarvestApi:
        Description: URL for application
        Export:
          Name: HarvestApi
        Value: !Ref 'LocationRef'
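To see the effect of the exclamation mark in isolation, here is a small sketch that dumps the same value twice, once with each tag. The Ref class and the dump_with_tag helper are hypothetical names introduced for this illustration; each call uses its own SafeDumper subclass so the two registrations don't interfere.

```python
import yaml


class Ref:
    """Hypothetical stand-in for the RefTag class above."""

    def __init__(self, text):
        self.text = text


def dump_with_tag(tag):
    """Dump a Ref using `tag` on a private SafeDumper subclass."""

    class LocalDumper(yaml.SafeDumper):
        pass

    def represent(dumper, data):
        return dumper.represent_scalar(tag, data.text)

    # add_representer copies the registry for the subclass, so the
    # global SafeDumper is left untouched.
    LocalDumper.add_representer(Ref, represent)
    return yaml.dump(
        {"Value": Ref("LocationRef")},
        Dumper=LocalDumper,
        default_flow_style=False,
    )


without_bang = dump_with_tag("Ref")   # Value: !<Ref> 'LocationRef'
with_bang = dump_with_tag("!Ref")     # Value: !Ref 'LocationRef'
```

Only the tag string differs between the two calls, which matches the two outputs shown in the question and in this answer.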