Regular expressions, different results from code and test website


I like to use a website to test my regular expression syntax. I seem to be hitting an issue which I do not know how to solve.

Website I use for regex testing: https://regex101.com/

Here is my sample code:

import re

text = "[\xa0]\xa0National Notification Authority, [X]\xa0National Enquiry Point. Address, fax number and e-mail address (if available) of other body: \nMinistry of Agriculture, Livestock and Food Supply\nSecretariat of Trade and International Relations\nE-mail: sps@agricultura.gov.br"

results = re.findall(r"(?<=: ).*", text)
print(results)

# results = ['', 'sps@agricultura.gov.br']

However, if I use the same regular expression on the website, it returns what I actually want: the address and any contact details.

#\nMinistry of Agriculture, Livestock and Food Supply\nSecretariat of Trade and International Relations\nE-mail: sps@agricultura.gov.br

I am not sure what is going on. Is there a way to capture both the address and contact details?


Solution

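  • By default, the dot . matches any character except a line break. Your text has a line break right after the colon (other body: \nMinistry), so the first match stops there and comes back empty, and only the text on the E-mail line is captured. The website most likely behaved differently because its single-line (s) flag was enabled, or because the test string was pasted with a literal backslash followed by n instead of real line breaks. If you want the dot to match any character whatsoever, including line breaks, pass the re.DOTALL flag (also spelled re.S) to re.findall. The colon does not need escaping, and a raw string avoids Python's invalid-escape warning for \: in a normal string: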

    re.findall(r"(?<=: ).*", text, flags=re.DOTALL)  # re.DOTALL lets . match line breaks too
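
To see the difference side by side, here is a minimal sketch. It uses a shortened stand-in for the text in the question (the sample string and the variable names are made up for illustration) and runs the same pattern with and without re.DOTALL, plus the equivalent inline (?s) form:

```python
import re

# Shortened, hypothetical stand-in for the text in the question.
text = "Address of other body: \nMinistry of Agriculture\nE-mail: sps@agricultura.gov.br"

pattern = r"(?<=: ).*"

# Default behaviour: "." stops at "\n", so every match ends at the end of its line.
default_matches = re.findall(pattern, text)
print(default_matches)  # ['', 'sps@agricultura.gov.br']

# With re.DOTALL, "." also matches "\n", so the first match runs to the end of the text.
dotall_matches = re.findall(pattern, text, flags=re.DOTALL)
print(dotall_matches)  # ['\nMinistry of Agriculture\nE-mail: sps@agricultura.gov.br']

# The inline flag (?s) at the start of the pattern has the same effect as flags=re.DOTALL.
inline_matches = re.findall(r"(?s)(?<=: ).*", text)
print(inline_matches == dotall_matches)  # True
```

Note that with re.DOTALL the result is a single string that starts with the line break; calling .strip() on it removes that leading newline if it is not wanted.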