
Python YAML dumper single quote and double quote issue


I am reading rows from an Excel file and dumping them to a YAML file. After dumping, I noticed that some rows are written in single quotes, some in double quotes, and some as plain text.

Data without any special characters is created as plain text.
Data with a \n character and parentheses is created as 'Data here'
Data with special characters is created as "Data here"

I am using the ruamel.yaml dumper to create the YAML file:

with open(myprops['output'], "w") as f:
    ruamel.yaml.dump(doc, f, Dumper=ruamel.yaml.RoundTripDumper, default_flow_style=False)

How can I make all data be written in single quotes, i.e. 'Data here'?


Solution

  • You can force the dumper to use single quotes, whenever the scalar can be represented as a single-quoted string, by providing the default_style="'" parameter.

    This is not guaranteed to get you single quotes, though. Single-quoted scalars cannot use the escape sequences that double-quoted scalars have (i.e. they do not behave like Python strings), so some values might still get double quotes.

    Using ruamel.yaml's new API (where round-trip-dumping is the default):

    import sys
    import ruamel.yaml
    
    data = [
       "25",
       "with an\n embedded newline",
       "entry with single quote: (')",
       42
    ]
    
    yaml = ruamel.yaml.YAML()
    yaml.default_style = "'"
    yaml.dump(data, sys.stdout)
    

    which gives:

    - '25'
    - "with an\n embedded newline"
    - 'entry with single quote: ('')'
    - !!int '42'
    

    Please note that, because of the quotes, the scalar 42 needs to be tagged in order to be recognised as an integer. The same holds for the other special types YAML can represent (floats, booleans, etc.). If you don't want that, make sure all the values you dump are strings.

    You can also see the one escape mechanism that single-quoted scalars in YAML have: a single quote in the scalar is doubled. (If it had been at the end of the Python string, you would have three single quotes in a row at the end of the scalar.)

    If you want consistency in your quoting, you should use double quotes, as those can represent all valid characters. Single-quoted scalars in YAML can span multiple lines, so in principle it is possible to embed a newline, but there are restrictions on whitespace around the newline.

    If you have a mix of string and non-string values in your input data, and you don't want the non-strings quoted, you have to recurse over the data structure and replace each string x with ruamel.yaml.scalarstring.SingleQuotedScalarString(x). That is the internal representation that ruamel.yaml uses, if you specify yaml.preserve_quotes = True, to distinguish single-quoted input from plain/double/literal/folded scalars.