Tags: python, regex, yaml, pyyaml

How to get r'\\\|' from a YAML file


I am using a YAML file to store config data, including many regex strings that I don't want to keep in code. Everything works except when I try to search for incorrectly escaped pipe characters with r'\\\|'. I tried quoted, unquoted, and literal strings in YAML, but nothing works. The YAML and Python string escaping rules together seem to conspire to keep the number of backslashes in the string even. I open and load the file with

import yaml

with open(file_path, 'r', encoding='utf_8') as f:
    python_dict = yaml.safe_load(f)

I'd like to reproduce

re.compile(r'\\\|')

using something like

re.compile(python_dict['escaped-pipes'])

Python 3.4 with PyYAML.


Solution

  • In YAML, \ is a special character only in double-quoted strings. The Python string r'\\\|' is a raw string, so it consists of three backslashes and a pipe. You have the following options to encode this in a YAML document:

    plain:  \\\|      # plain scalar. YAML does not process the backslashes.
    single: '\\\|'    # single quoted scalar. YAML also does not process the backslashes.
    double: "\\\\\\|" # double quoted scalar. YAML does process escape sequences here, so you
                      # need to double the backslashes
    

    Single-quoted scalars are roughly the YAML equivalent of Python's raw strings.
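To see what the raw string from the question actually contains, here is a small sketch using only the standard library. The variable names and the sample input are made up for illustration.

```python
import re

raw = r'\\\|'        # raw literal: backslashes are kept exactly as written
escaped = '\\\\\\|'  # same text as a normal literal, each backslash doubled

assert raw == escaped
assert len(raw) == 4  # three backslashes and one pipe

# As a regex, \\ matches one literal backslash and \| one literal pipe.
pattern = re.compile(raw)
match = pattern.search('a\\|b')  # the input text is: a \ | b
print(match.group())             # prints \|
```

This four-character string is the value each YAML spelling has to produce after loading.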

    Your regexes may contain other characters that interfere with YAML's syntax. In that case you may want to use block scalars instead, which treat all characters as content and end where the indentation ends:

    block: |-
      \\\|
    next block: |-
      \\\|
    

    | starts a literal block scalar, and - removes the final line break (before the next item) from the scalar, which is what you want here.
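As a quick check, the following sketch loads the scalar styles discussed above with PyYAML and confirms they all yield the same four-character string. It assumes PyYAML is installed. The inline document and key names are illustrative only, and `yaml.safe_load` is used here as the safe choice for plain config data.

```python
import re
import yaml  # PyYAML

# A raw triple-quoted string, so Python leaves the backslashes alone
# and only YAML's own quoting rules apply to the document text.
doc = r'''
plain:  \\\|
single: '\\\|'
double: "\\\\\\|"
block: |-
  \\\|
'''

config = yaml.safe_load(doc)
expected = r'\\\|'

# Every style should load as three backslashes followed by a pipe.
for key, value in config.items():
    assert value == expected, key

pattern = re.compile(config['block'])
print(pattern.search(r'a\|b').group())  # prints \|
```

If any style loads as a different string, the assertion reports the offending key.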