python regex escaping

Confusion about backslash in regular expression


I am trying to understand how a regular expression is interpreted in the following case:

import re
pattern = 'word\\\n'   # the string holds: w, o, r, d, one backslash, one newline
sentence = 'This is a word\n.'
match = re.search(pattern, sentence)
print(repr(match.group()))

It successfully matches 'word\n'.

As I understand it from this post, the pattern is first interpreted by Python as 'word' + backslash + newline, and re.search then looks for that pattern in the sentence. However, 'word\n' in the sentence shouldn't match it, because 'word\n' is interpreted as 'word' + newline, with no backslash.

Can anyone please help me understand the nuance?

I have searched online but couldn't find anything specific to this problem.


Solution

  • Your regex pattern is word\(line break), i.e. "word" followed by a backslash followed by a literal line break. In a regex, the backslash either escapes a metacharacter or starts a special sequence such as \n or \d. A literal line break is not a metacharacter, and backslash + line break is not a special sequence, so the backslash has no effect and the line break is matched as itself. More generally, a backslash before a character that is neither a metacharacter nor part of a special sequence is simply disregarded. In Python's re module this applies to punctuation and whitespace only: since Python 3.6, an unknown escape made of a backslash and an ASCII letter (e.g. \q) raises an error instead.

    Thus, your regex is equivalent to word(line break), which matches the string 'word\n', because it's "word" followed by a line break.
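
    To check this yourself, here is a small sketch (not part of the original question; the name `literal` and the \q example are just for illustration). It shows what the string actually contains, that the pattern behaves like word(line break), and what you would have to write to really require a backslash before the line break:

```python
import re

sentence = 'This is a word\n.'

# Stage 1: Python's string literal rules.
# '\\' becomes one backslash and '\n' becomes one newline, so the
# string holds six characters: w, o, r, d, backslash, newline.
pattern = 'word\\\n'
print(len(pattern))        # 6
print(list(pattern[4:]))   # ['\\', '\n']

# Stage 2: the regex engine.
# Backslash + literal newline is not a special sequence, so it just
# matches a newline.
print(repr(re.search(pattern, sentence).group()))   # 'word\n'

# These patterns are therefore equivalent for this sentence:
# a literal newline, and the regex escape \n (raw string).
for p in ('word\n', r'word\n'):
    assert re.search(p, sentence).group() == 'word\n'

# To require a real backslash before the newline, the regex itself
# needs an escaped backslash: \\ followed by \n.
literal = re.compile(r'word\\\n')
print(literal.search(sentence))                 # None
print(literal.search('word\\\n') is not None)   # True

# Backslash + unknown ASCII letter is rejected (Python 3.6+).
try:
    re.compile(r'\q')
except re.error as exc:
    print('error:', exc)
```

    Writing patterns as raw strings (r'...') removes the first stage, so what you type is exactly what the regex engine sees.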