I like to use a website to test my regular expression syntax. I seem to be hitting an issue which I do not know how to solve.
Website I use for regex testing: https://regex101.com/
Here is my sample code:
import re
text = "[\xa0]\xa0National Notification Authority, [X]\xa0National Enquiry Point. Address, fax number and e-mail address (if available) of other body: \nMinistry of Agriculture, Livestock and Food Supply\nSecretariat of Trade and International Relations\nE-mail: sps@agricultura.gov.br"
results = re.findall(r"(?<=: ).*", text)  # raw string; the colon needs no escaping
print(results)
# results == ['', 'sps@agricultura.gov.br']
However, if I use the website with the same regex, it returns what I actually want: the address and any contact details.
#\nMinistry of Agriculture, Livestock and Food Supply\nSecretariat of Trade and International Relations\nE-mail: sps@agricultura.gov.br
I am not sure what is going on. Is there a way to capture both the address and contact details?
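By default, the dot (.) matches any character except a line break, and there is a line break in your text right after the colon: other body: \nMinistry. So in Python the match that follows "other body: " is empty, and the only other match is the e-mail address at the end of the last line. regex101 most likely behaves differently because the \n sequences you pasted into its test string are the two literal characters \ and n rather than real newlines, so .* runs straight through to the end. If you want the dot to match any character whatsoever, including newlines, you must instruct findall accordingly: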
re.findall(r"(?<=: ).*", text, flags=re.DOTALL)  # Note the flags!
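A minimal sketch comparing the default behaviour with re.DOTALL on the question's text. It also shows the equivalent inline (?s) flag and one way to turn the captured block into separate address lines. The names literal_text, contact_lines and so on are illustrative only, and literal_text assumes what a regex tester receives when you type \n into it (a backslash followed by n, not a newline).

```python
import re

# Same sample text as in the question; "\n" here is a real newline character.
text = ("[\xa0]\xa0National Notification Authority, [X]\xa0National Enquiry Point. "
        "Address, fax number and e-mail address (if available) of other body: "
        "\nMinistry of Agriculture, Livestock and Food Supply"
        "\nSecretariat of Trade and International Relations"
        "\nE-mail: sps@agricultura.gov.br")

pattern = r"(?<=: ).*"  # raw string avoids invalid-escape warnings

# Default: '.' stops at each newline, so each match ends at the end of its line.
default_matches = re.findall(pattern, text)
print(default_matches)  # ['', 'sps@agricultura.gov.br']

# re.DOTALL (alias re.S): '.' also matches '\n', so one match runs to the end.
dotall_matches = re.findall(pattern, text, flags=re.DOTALL)

# The inline flag (?s) at the start of the pattern has the same effect.
inline_matches = re.findall(r"(?s)(?<=: ).*", text)

# The captured block begins with the newline after "other body: ";
# strip it and split into one entry per line.
contact_lines = dotall_matches[0].strip().splitlines()
print(contact_lines)

# Hypothetical: a string with literal backslash-n pairs (as pasted into a
# regex tester) contains no line breaks, so the default pattern matches it all.
literal_text = r"other body: \nMinistry of Agriculture\nE-mail: sps@agricultura.gov.br"
literal_matches = re.findall(pattern, literal_text)
```

Splitting with splitlines() after the fact keeps the regex itself simple. The last two lines suggest why regex101 seemed to "work" with the same pattern.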