python rawstring

Create and parse a Python Raw string literal R""


Edit
I'm not sure if this question is being read correctly.
I already know what string formats are in Python.
Every single little detail, I already know.
Please stop directing me to questions about string types in Python.

This is a specific question about the string delimiter appearing
in the body of a raw string literal.

I want to know why I can't use the raw syntax r"" or r'' form on this
raw string "word's" and have it exist in a variable just like this.

It doesn't matter why I want to do this, but I've explained below.

Thanks.


I'm just going over some syntax rules to parse and create
strings using the Raw String Syntax rules for r' ' and r" ".

For the record, I have read the docs and rules on raw strings.
The question is specific to escaping the delimiter within the raw string.

I have a utility that parses/makes other string types and is used
in production code.

I'm perplexed that Python does not remove the escape of the escaped delimiter when the string is in a variable.

Is this by design, i.e. NOT removing the escape on the delimiter, or, as I am
hoping, just a missed part of the parse process?
Basically, a bug?

The string is not really a raw image of the original if after parsing, it does
not look like the original.
After parsing, in a variable, it now becomes useless.

Is this an oversight and possibly something that will be corrected in the future?

As it is now, in my utility, I can only create a raw syntax form, but due to
this bug, I cannot parse it unless I take off the escape from the delimiter.

I mean, I guess I could do this as it is a direct inverse of making the string,
but it's disturbing that the lexical parser leaves this artificial escape in the variable after
the parsing process.

Here is some code I used to verify the problem:

Code

# Python 2.7.12

print "Raw target string test = \"word's\""

v1 = r' "word\'s" '     # => "word\'s"
v2 = r" \"word's\" "    # => \"word's\"

print "using r' ' syntax, variable contains  " + v1
print "using r\" \" syntax, variable contains  " + v2

if len(v1) == len(v2):
    print "lengths are equal"
else:
    print "lengths are NOT equal"

Output

Raw target string test = "word's"
using r' ' syntax, variable contains   "word\'s" 
using r" " syntax, variable contains   \"word's\" 
lengths are NOT equal



Solution

  • To quote the Python FAQ, raw string literals in Python were "designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing". Since the regex engine will strip the backslash in front of the quote character, Python doesn't need to strip it. This behavior will most likely never be changed since it would severely break backwards compatibility.

    So yes, it is by design -- although it is quite confusing.
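
To make the "by design" point concrete, here is a minimal sketch written for Python 3 (the question uses 2.7, but the raw literal rule is the same there). The variable names are taken from the question. The backslash only stops the quote from ending the literal; it is still stored in the string:

```python
# Sketch (Python 3): the backslash before the delimiter stays in the value.
v1 = r' "word\'s" '    # single-quoted raw literal, escaped apostrophe
v2 = r" \"word's\" "   # double-quoted raw literal, escaped double quotes

print(v1)
print(v2)
print(len(v1), len(v2))  # 11 12

# The backslash is a real character of the string, not just syntax.
assert "\\" in v1 and "\\" in v2
```

v1 holds one backslash and v2 holds two, which is why the lengths in the question differ.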

    You wrote: I want to know why I can't use the raw syntax r"" or r'' form on this raw string "word's" and have it exist in a variable just like this.

    Python's raw string literals were not designed to be able to represent every possible string. In particular, the string "' cannot be represented within r"" or r''. When you use raw string literals for regex patterns, this is not a problem, since the patterns \"', "\', "', and \"\' are equivalent (that is, they all match the single string "').
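
A small check of that equivalence, as a sketch in Python 3. The names BS, DQ, SQ, target and patterns are only illustrative, and the patterns are built by concatenation so that every backslash is explicit:

```python
import re

BS, DQ, SQ = "\\", '"', "'"   # backslash, double quote, apostrophe

target = DQ + SQ              # the two-character string "'

# The four patterns from the paragraph above:  \"'   "\'   "'   \"\'
patterns = [BS + DQ + SQ, DQ + BS + SQ, DQ + SQ, BS + DQ + BS + SQ]

# Each pattern matches exactly the two-character target.
results = [re.fullmatch(p, target) is not None for p in patterns]
print(results)  # [True, True, True, True]

# A raw literal cannot produce the target itself: the backslash stays.
assert r'"\'' == DQ + BS + SQ
```

The regex engine treats an escaped quote as a plain quote, so the leftover backslash is harmless there. It only matters when the string is used as ordinary text.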

    However, note that you can write the string "word's" using the triple-quoted raw string literal r'''"word's"'''.
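
A quick sketch (Python 3) confirming that the triple-quoted form yields the target text with no backslashes. The names target and raw_triple are only illustrative:

```python
target = "\"word's\""          # ordinary literal, for comparison
raw_triple = r'''"word's"'''   # triple-quoted raw literal from the text

assert raw_triple == target
assert "\\" not in raw_triple

print(raw_triple, len(raw_triple))  # "word's" 8
```

This works because neither quote character in the body collides with the ''' delimiter, so nothing needs escaping.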