
A single string in single quotes with PyYAML


When I edit a YAML file in Python with PyYAML, all of my string values are saved back to the original file without quotes.

one: valueOne
two: valueTwo
three: valueThree

I wanted one of those strings to be surrounded with single quotes:

one: valueOne
two: valueTwo
three: 'valueThree'

Changing the default_style parameter of yaml.dump affects the whole file, which is not what I want. I thought about adding single quotes to the beginning and end of the string that I want quoted:

valueThreeVariable = "'" + valueThreeVariable + "'"

However, this ends up with a dumped YAML looking like this:

one: valueOne
two: valueTwo
three: '''valueThree'''

I have tried escaping the single quote in various ways, using unicode or raw strings, all to no avail. How can I make only one of my YAML values a single-quoted string?


Solution

  • You can graft such functionality onto PyYAML, but it is not easy. The value in the mapping for three has to be an instance of a class different from a normal string; otherwise the YAML dumper doesn't know that it has to do something special. That instance then has to be dumped as a string with quotes. On loading, scalars with single quotes need to be created as instances of this class. Apart from that, you probably don't want the keys of your dict/mapping reordered, as PyYAML does by default.

    I do something similar to the above in my PyYAML derivative ruamel.yaml for block style scalars:

    import ruamel.yaml
    
    yaml_str = """\
    one: valueOne
    two: valueTwo
    three: |-
      valueThree
    """
    
    data = ruamel.yaml.round_trip_load(yaml_str)
    assert ruamel.yaml.round_trip_dump(data) == yaml_str
    

    This doesn't raise an assertion error.


    To start with the dumper, you can "convert" the valueThree string:

    import ruamel.yaml
    from ruamel.yaml.scalarstring import ScalarString
    
    yaml_str = """\
    one: valueOne
    two: valueTwo
    three: 'valueThree'
    """
    
    class SingleQuotedScalarString(ScalarString):
        def __new__(cls, value):
            return ScalarString.__new__(cls, value)
    
    data = ruamel.yaml.round_trip_load(yaml_str)
    data['three'] = SingleQuotedScalarString(data['three'])
    

    but this cannot be dumped yet, as the dumper doesn't know about SingleQuotedScalarString. You can solve that in different ways; the following registers a representer on ruamel.yaml's RoundTripRepresenter class:

    from ruamel.yaml.representer import RoundTripRepresenter
    import sys
    
    def _represent_single_quoted_scalarstring(self, data):
        # Emit instances of SingleQuotedScalarString in single-quoted style.
        style = "'"
        if sys.version_info < (3,) and not isinstance(data, unicode):
            data = unicode(data, 'ascii')
        tag = u'tag:yaml.org,2002:str'
        return self.represent_scalar(tag, data, style=style)
    
    RoundTripRepresenter.add_representer(
        SingleQuotedScalarString,
        _represent_single_quoted_scalarstring)
    
    assert ruamel.yaml.round_trip_dump(data) == yaml_str
    

    Once again, this doesn't raise an error. In principle the above can be done in PyYAML with safe_load/safe_dump, but you would need to write code to preserve the key ordering, as well as some of the base functionality. (Apart from that, PyYAML only supports the older YAML 1.1 standard, not the YAML 1.2 standard from 2009.)
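
    As a rough illustration of the dump side only in plain PyYAML, the sketch below marks one value with a str subclass and registers a representer for it. It assumes PyYAML 5.1 or later for the sort_keys argument; the names SingleQuoted and represent_single_quoted are made up for this example, and nothing here makes loading preserve the quotes.

```python
import yaml


class SingleQuoted(str):
    """Hypothetical marker type: strings that should be dumped in single quotes."""


def represent_single_quoted(dumper, data):
    # Emit the value as a normal YAML string, but force the single-quoted style.
    return dumper.represent_scalar('tag:yaml.org,2002:str', str(data), style="'")


# Register the representer on PyYAML's default Dumper (the one yaml.dump uses).
yaml.add_representer(SingleQuoted, represent_single_quoted)

data = {
    'one': 'valueOne',
    'two': 'valueTwo',
    'three': SingleQuoted('valueThree'),
}

# sort_keys=False (assumed available, PyYAML >= 5.1) keeps insertion order.
out = yaml.dump(data, default_flow_style=False, sort_keys=False)
print(out, end='')
```

    Only the value wrapped in SingleQuoted is quoted; the other strings are still dumped in plain style. After a yaml.safe_load() round trip the marker type is lost, which is the part that needs the extra constructor work described next.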

    To get the loading to work without using the explicit data['three'] = SingleQuotedScalarString(data['three']) conversion, you can add the following before the call to ruamel.yaml.round_trip_load():

    from ruamel.yaml.constructor import RoundTripConstructor, ConstructorError
    from ruamel.yaml.nodes import ScalarNode
    from ruamel.yaml.scalarstring import PreservedScalarString
    from ruamel.yaml.compat import text_type
    
    def _construct_scalar(self, node):
        if not isinstance(node, ScalarNode):
            raise ConstructorError(
                None, None,
                "expected a scalar node, but found %s" % node.id,
                node.start_mark)
    
        if node.style == '|' and isinstance(node.value, text_type):
            return PreservedScalarString(node.value)
        elif node.style == "'" and isinstance(node.value, text_type):
            return SingleQuotedScalarString(node.value)
        return node.value
    
    RoundTripConstructor.construct_scalar = _construct_scalar
    

    There are different ways to do the above, including subclassing the RoundTripConstructor class, but the actual method to change is small and can easily be patched.


    Combining all of the above and cleaning up a bit, you get:

    import ruamel.yaml
    from ruamel.yaml.scalarstring import ScalarString, PreservedScalarString
    from ruamel.yaml.representer import RoundTripRepresenter
    from ruamel.yaml.constructor import RoundTripConstructor, ConstructorError
    from ruamel.yaml.nodes import ScalarNode
    from ruamel.yaml.compat import text_type, PY2
    
    
    class SingleQuotedScalarString(ScalarString):
        def __new__(cls, value):
            return ScalarString.__new__(cls, value)
    
    
    def _construct_scalar(self, node):
        if not isinstance(node, ScalarNode):
            raise ConstructorError(
                None, None,
                "expected a scalar node, but found %s" % node.id,
                node.start_mark)
    
        if node.style == '|' and isinstance(node.value, text_type):
            return PreservedScalarString(node.value)
        elif node.style == "'" and isinstance(node.value, text_type):
            return SingleQuotedScalarString(node.value)
        return node.value
    
    RoundTripConstructor.construct_scalar = _construct_scalar
    
    
    def _represent_single_quoted_scalarstring(self, data):
        # Emit instances of SingleQuotedScalarString in single-quoted style.
        style = "'"
        if PY2 and not isinstance(data, unicode):
            data = unicode(data, 'ascii')
        tag = u'tag:yaml.org,2002:str'
        return self.represent_scalar(tag, data, style=style)
    
    RoundTripRepresenter.add_representer(
        SingleQuotedScalarString,
        _represent_single_quoted_scalarstring)
    
    
    yaml_str = """\
    one: valueOne
    two: valueTwo
    three: 'valueThree'
    """
    
    data = ruamel.yaml.round_trip_load(yaml_str)
    assert ruamel.yaml.round_trip_dump(data) == yaml_str
    

    This still runs without an assertion error, i.e. the dump output equals the input. As indicated, you can do this in PyYAML, but it requires considerably more coding.


    With a more modern version (ruamel.yaml>0.14) you can do:

    import sys
    import ruamel.yaml

    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True  # keep the quoting style found in the input

    data = yaml.load(yaml_str)
    yaml.dump(data, sys.stdout)
    

    and preserve the single quotes.