I am trying to understand how the regular expression is interpreted in the following case:
import re
pattern = 'word\\\n'  # Python string: "word" + backslash + newline
sentence = 'This is a word\n.'
match = re.search(pattern, sentence)
match.group()  # 'word\n'
It successfully matches 'word\n'.
From my understanding, according to this post, pattern is first interpreted by Python as 'word' + backslash + newline, and then re.search searches the sentence for that pattern. However, the 'word\n' in the sentence shouldn't match that pattern, because it is interpreted as 'word' + newline, without any backslash.
Can anyone please help me understand the nuance?
I have tried searching online but couldn't find anything specific to this problem.
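To make the first of the two interpretation stages concrete, here is a minimal sketch that inspects what the Python string literal actually contains before re ever sees it. It only uses the pattern from the question; nothing here is specific to the linked post.

```python
# Stage 1: Python parses the string literal before `re` ever sees it.
# '\\' becomes a single backslash and '\n' a single newline character.
pattern = 'word\\\n'

print(len(pattern))   # 6
print(list(pattern))  # ['w', 'o', 'r', 'd', '\\', '\n']

# The literal is exactly "word" + one backslash + one newline.
assert pattern == 'word' + '\\' + '\n'
```

So the string handed to the regex engine really does contain a backslash; the question is what the engine does with it.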
Your regex pattern is word\(line break), i.e. "word" followed by a backslash followed by a line break. In a regex, the backslash is used to escape metacharacters or to start special sequences. A literal line break is not a metacharacter, and the sequence backslash + line break has no special meaning, so the backslash in front of it has no effect. More generally, you can put a backslash before any character that is not an ASCII letter or digit: if the character isn't a metacharacter and the combination doesn't form a special sequence, the backslash is effectively disregarded and the character simply matches itself. Letters and digits are different: a backslash before them either forms a special sequence such as \n or \d, a backreference such as \1, or (since Python 3.7) raises re.error for an unknown escape such as \q.
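The escape rule can be checked directly. The sketch below assumes Python 3.7 or later (the version where unknown letter escapes became errors); the helpers fully_matches and compiles are hypothetical names introduced only for this illustration.

```python
import re


# Hypothetical helpers, for illustration only.
def fully_matches(pattern: str, text: str) -> bool:
    """True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


def compiles(pattern: str) -> bool:
    """True if `re` accepts `pattern` without raising re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Backslash + a character that is not an ASCII letter or digit:
# the character matches itself.
print(fully_matches('\\\n', '\n'))  # True: backslash + newline matches a newline
print(fully_matches(r'\,', ','))    # True: ',' is not special, the escape is harmless
print(fully_matches(r'\.', '.'))    # True: escaped metacharacter, a literal dot
print(fully_matches(r'\.', 'x'))    # False: the escaped dot no longer matches any char

# Backslash + an ASCII letter with no defined meaning is rejected (Python 3.7+).
print(compiles(r'\q'))              # False
```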
Thus, your regex is equivalent to word(line break), which matches the string 'word\n' because that string is "word" followed by a line break.
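As a rough illustration of that equivalence, here is a sketch comparing three spellings of the pattern against the sentence from the question. The labels are arbitrary; the third entry uses a raw string so that the regex engine, not Python, turns \n into a newline.

```python
import re

sentence = 'This is a word\n.'

# Three spellings that all reach the regex engine as "word followed by a newline".
patterns = {
    'backslash + literal newline': 'word\\\n',  # engine sees: w o r d \ <LF>
    'literal newline': 'word\n',                # engine sees: w o r d <LF>
    'raw string, regex \\n': r'word\n',         # engine sees: w o r d \ n
}

for label, pat in patterns.items():
    m = re.search(pat, sentence)
    print(f'{label:28} -> {m.group()!r}')
```

All three searches find 'word\n': the first because the escaped newline is just a newline, the second because the newline is already literal, and the third because \n is a regex special sequence for a newline.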