Tags: python, regex, python-3.x, string-literals, rawstring

Why does regular expression understand \n without r prefix?


I have read a lot of questions looking for the answer; sorry if I missed it.

Let's say I have a text containing only a newline character.
text = '\n'

Because regular expressions, like Python string literals, use the backslash character ('\') to escape characters with special meaning, we would match the newline character using raw string notation, just as this answer suggested. (Please correct me if I am wrong.)

So we would do regex = re.compile(r'\n'), and the regex parser would read a backslash followed by the character 'n' and interpret the pair as a newline character.

My question is: why does regex = re.compile('\n') work too?

I tried regex.match(text) and the result is <_sre.SRE_Match object; span=(0, 1), match='\n'>, which is the same as with raw string notation.


Is it because of the documentation here, which says:

Most of the standard escapes supported by Python string literals are also accepted by the regular expression parser: \a \b \f \n \r \t \v \x \\

Could someone explain in detail?


Solution

  • The r prefix in r'\n' suppresses escape processing in the string literal, so the string contains two characters, '\' and 'n'. The regular expression engine then interprets those two characters as the newline escape. In the second case, Python converts '\n' to a single newline character (LF, U+000A) when it parses the string literal, before the pattern ever reaches the re module. This is one character on every platform; translation to CR LF on Windows happens only in text-mode file I/O, not in string literals. The regular expression compiler then treats that character as a literal to match (no backslash, no special interpretation).
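
The two paths can be checked directly. The following is a minimal sketch, not taken from the original answer; the variable names are illustrative only. It shows that the two pattern strings differ in length yet match the same text, and uses \b (one of the escapes listed in the quoted documentation) as a case where the two levels of interpretation disagree.

```python
import re

text = '\n'

raw_pattern = r'\n'    # two characters: a backslash and the letter n
plain_pattern = '\n'   # one character: the line feed, on every platform

print(len(raw_pattern), len(plain_pattern))

# The regex parser turns the two-character escape into a newline;
# the one-character pattern is already a literal newline.
raw_match = re.compile(raw_pattern).match(text)
plain_match = re.compile(plain_pattern).match(text)
print(raw_match.span(), plain_match.span())

# The two forms diverge when an escape means different things at each level.
# In a string literal '\b' is a backspace character; to the regex parser
# the two characters \b mean a word boundary.
boundary_plain = re.search('\b', 'a b')   # looks for a literal backspace
boundary_raw = re.search(r'\b', 'a b')    # looks for a word boundary
print(boundary_plain, boundary_raw)
```

So '\n' works without the r prefix only because both interpretations happen to arrive at the same character. Using raw strings for patterns avoids having to remember which escapes behave that way.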