I have lots of text files full of newlines which I am parsing in Python 3.4. I am looking for the newlines because they separate my text into different parts. Here is an example of such a text:
text = 'avocat ;\n\n m. x'
I naïvely started looking for newlines with '\n' in my regular expression (RE) without thinking that the backslash '\' was an escape character. However, this turned out to work fine:
>>> import re
>>> pattern1 = '\n\n'
>>> re.findall(pattern1, text)
['\n\n']
Then I understood that I should be using a double backslash in order to look for one backslash. This also worked fine:
>>> pattern2 = '\\n\\n'
>>> re.findall(pattern2, text)
['\n\n']
On another thread, however, I was told to use raw strings instead of regular strings, and this format fails to find the newlines I am looking for:
>>> pattern3 = r'\\n\\n'
>>> pattern3
'\\\\n\\\\n'
>>> re.findall(pattern3, text)
[]
Could you please help me out here? I am getting a little confused about what kind of RE I should be using in order to correctly match the newlines.
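Don't double the backslash when using a raw string. In a raw string, Python does not treat the backslash as an escape character, so r'\n' is already the two characters backslash and n, which the regex engine then interprets as a newline: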
>>> pattern3 = r'\n\n'
>>> pattern3
'\\n\\n'
>>> re.findall(pattern3, text)
['\n\n']
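To see why the first two patterns from the question work and the doubled raw string does not, it can help to compare what each string actually contains. The following is only a small sketch; the variable names are made up for illustration and are not from the question.

```python
import re

text = 'avocat ;\n\n m. x'

# Hypothetical names, chosen only for this illustration.
literal_newlines = '\n\n'    # 2 characters: two real newline characters
escaped_regular = '\\n\\n'   # 4 characters: backslash, n, backslash, n
raw_single = r'\n\n'         # 4 characters: the same string as escaped_regular
raw_doubled = r'\\n\\n'      # 6 characters: two backslashes, n, two backslashes, n

# The regular string with doubled backslashes and the raw string with
# single backslashes are the same pattern.
same_pattern = (escaped_regular == raw_single)

# The regex engine accepts both a real newline character and the
# two-character escape \n as "match a newline".
found_literal = re.findall(literal_newlines, text)
found_raw = re.findall(raw_single, text)

# The doubled raw string asks for a literal backslash followed by the
# letter n, twice, which does not occur in the text.
found_doubled = re.findall(raw_doubled, text)

# It would match a text that really contains backslash characters.
text_with_backslashes = r'avocat ;\n\n m. x'
found_in_backslash_text = re.findall(raw_doubled, text_with_backslashes)
```

Here `found_literal` and `found_raw` both contain the two-newline match, while `found_doubled` is empty. In practice, writing patterns as raw strings with single backslashes (`r'\n\n'`) is the usual convention, because the pattern in the source code then reads exactly as the regex engine sees it.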