Something strange happens when I use a regex to search the text returned by pyperclip.paste() and the pattern involves the \n newline character.
Excuse my English.
When I run the search on a triple-quoted string assigned to a text variable:
import re
text = '''
This as the line 1
This as the line 2
'''
pattern = re.compile(r'\d\n\w+')
result = pattern.findall(text)
print(result)
The match includes the newline character \n, which is what I want (or at least what I expect):
»»» ['1\nThis']
But the problem starts when the string to search comes from text copied to the clipboard:
This as the line 1
This as the line 2
Say I select that text, copy it to the clipboard, and want the regex to extract the same output as before. This time I need the pyperclip module, so I drop the previous code and write this instead:
import re, pyperclip
text = pyperclip.paste()
pattern = re.compile(r'\d\n\w+')
result = pattern.findall(text)
print(result)
This is the result:
»»» []
Nothing but an empty list. I discovered (in my inexperience) that the \n character is what causes this. It has nothing to do with a conflict with Python's own \n escape, because the r prefix (raw string) avoids that.
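The failure can be reproduced without the clipboard at all. This sketch assumes that pyperclip.paste() returns Windows-style text with \r\n line endings and hard-codes such a string; repr() makes the hidden \r visible.

```python
import re

# Assumption: this string mimics text copied to the clipboard on Windows,
# where each line break is the two-character sequence '\r\n'.
text = 'This as the line 1\r\nThis as the line 2\r\n'

# repr() shows the escape sequences instead of actual line breaks
print(repr(text))

pattern = re.compile(r'\d\n\w+')
result = pattern.findall(text)
# The digit '1' is followed by '\r', not '\n', so nothing matches
print(result)  # []
```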
I already found a workaround, although I don't fully understand it (I only know the basics of Python so far):
import re, pyperclip
text = pyperclip.paste()
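# Split on '\n'; on Windows each piece still ends with '\r'
lines = text.split('\n')
spam = ''
for i in lines:
    spam = spam + i  # glue the pieces back together without the '\n'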
pattern = re.compile(r'\d\r\w+')
result = pattern.findall(spam)
print(result)
Note that instead of \n to detect line breaks in the last regex, I used \r (\n would cause the same problem and print only empty brackets). \r is interchangeable with \s here; either way the output works, but:
»»» ['1\rThis']
The match contains \r instead of \n. Still, it was a small victory for me.
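Here is what the workaround does to the text, step by step. This is a sketch under the same assumption as before (a hard-coded string with \r\n line endings standing in for the clipboard); ''.join(lines) produces the same result as the for loop with spam = spam + i.

```python
import re

# Assumption: Windows-style clipboard text with '\r\n' line endings.
text = 'This as the line 1\r\nThis as the line 2\r\n'

# split('\n') removes only the '\n'; each piece keeps its trailing '\r'
lines = text.split('\n')
print(lines)  # ['This as the line 1\r', 'This as the line 2\r', '']

# Equivalent to the for loop: concatenate the pieces
spam = ''.join(lines)
print(repr(spam))  # 'This as the line 1\rThis as the line 2\r'

# Only '\r' separates the lines now, so '\d\r\w+' matches
print(re.findall(r'\d\r\w+', spam))  # ['1\rThis']
```

So the \r in the output was never removed by the workaround; only the \n was.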
It would help me a lot if you could explain a better solution, or at least why this happens. You could also recommend some concepts to look into so I can fully understand this.
You get the \r when pasting because you are on a Windows machine. On Windows, a line break is represented by the two characters \r\n. Note that \s is different from \r: \s matches any whitespace character, while \r matches only the carriage return character.
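A small sketch of the difference between \s and \r, using a made-up sample string. The last line adds one extra observation: \s+ consumes the whole \r\n pair, whereas a single \s would match only its first character.

```python
import re

sample = 'a\rb\nc d\te'

# \s matches any whitespace: here '\r', '\n', ' ' and '\t'
print(re.findall(r'\s', sample))  # ['\r', '\n', ' ', '\t']

# \r matches only the carriage return
print(re.findall(r'\r', sample))  # ['\r']

# \s+ consumes the whole '\r\n' pair on Windows-style text
print(re.findall(r'\d\s+\w+', 'line 1\r\nThis'))  # ['1\r\nThis']
```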
The text:
This as the line 1
This as the line 2
actually looks like:
This as the line 1\r\n
This as the line 2\r\n
on a Windows machine.
In your regex, \d\r matches the end of the first line (1\r), but then \w+ doesn't match the \n that follows. You need to change your first regex to:
pattern = re.compile(r'\d\r\n\w+')
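The following sketch checks this fix against hard-coded sample strings (stand-ins for clipboard contents, not real pyperclip output). It also shows two alternatives that are not part of this answer: making the \r optional with \r?\n so one pattern handles both Windows and Unix line endings, and normalizing line endings with str.replace() before searching.

```python
import re

# Assumption: sample inputs standing in for clipboard text.
windows_text = 'This as the line 1\r\nThis as the line 2\r\n'
unix_text = 'This as the line 1\nThis as the line 2\n'

# The fix above: match the full Windows line break
crlf = re.compile(r'\d\r\n\w+')
print(crlf.findall(windows_text))  # ['1\r\nThis']
print(crlf.findall(unix_text))     # [] (there is no '\r' to match)

# Variant: '\r?' makes the carriage return optional
either = re.compile(r'\d\r?\n\w+')
print(either.findall(windows_text))  # ['1\r\nThis']
print(either.findall(unix_text))     # ['1\nThis']

# Alternative: normalize line endings first, keep the original pattern
normalized = windows_text.replace('\r\n', '\n')
print(re.findall(r'\d\n\w+', normalized))  # ['1\nThis']
```

The \r?\n variant is a reasonable choice when the same script may run on different operating systems.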