
Python regex to extract the sentence that contains an email


I am trying to use Python to extract the sentence that contains an email address.

sample_str = "This is a random sentence. This one is also random sentence but it contains an email address user@example.com"

All the examples I have seen extract only the email address itself, for example:

import re
lst = re.findall(r'\S+@\S+', sample_str)

But is there any way to extract the whole sentence that contains the email address? In this case, the expected output is:

op = "This one is also random sentence but it contains an email address user@example.com"

Solution

  • You can anchor where a sentence starts and then, up to the email address, avoid matching the end of a sentence.

    But this can be tricky, and it is definitely not a general solution: a sentence does not have to start with an uppercase character [A-Z], and it might end with a character other than . ! or ?

    As an idea for the given example, you might use:

    (?<!\S)[A-Z](?:(?![!?.](?!\S)).)*[^\s@]+@[^\s@]+
    

    Explanation

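    • (?<!\S) Assert a whitespace boundary (or the start of the string) to the left
    • [A-Z] Match an uppercase character A-Z
    • (?:(?![!?.](?!\S)).)* Match any character except a newline, zero or more times, as long as it is not a ! ? or . directly followed by a whitespace boundary
    • [^\s@]+@[^\s@]+ Match an email-like format: one or more characters that are neither whitespace nor @, then @, then more of the same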
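The explanation above maps piece by piece onto the pattern. As a sketch, the same pattern can be written with re.VERBOSE so that each piece carries its explanation as a comment. Verbose mode only strips whitespace and comments outside character classes, so the behaviour should match the one-line pattern. The name SENTENCE_WITH_EMAIL and the placeholder address user@example.com are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical name; same pattern as the one-liner, split up with re.VERBOSE.
# In verbose mode, unescaped whitespace is ignored and '#' starts a comment.
SENTENCE_WITH_EMAIL = re.compile(
    r"""
    (?<!\S)              # whitespace boundary (or start of string) to the left
    [A-Z]                # the sentence starts with an uppercase letter A-Z
    (?:                  # then any number of characters ...
        (?![!?.](?!\S))  # ... as long as we are not at ! ? or . followed by a whitespace boundary
        .
    )*
    [^\s@]+@[^\s@]+      # email-like token: non-space, non-@ characters around an @
    """,
    re.VERBOSE,
)

# user@example.com is a placeholder address
sample_str = ("This is a random sentence. This one is also random sentence "
              "but it contains an email address user@example.com")
print(SENTENCE_WITH_EMAIL.findall(sample_str))
# ['This one is also random sentence but it contains an email address user@example.com']
```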


    Example

    import re

    sample_str = "This is a random sentence. This one is also random sentence but it contains an email address user@example.com"
    # Use a raw string so that \S and \s reach the regex engine unchanged
    lst = re.findall(r'(?<!\S)[A-Z](?:(?![!?.](?!\S)).)*[^\s@]+@[^\s@]+', sample_str)

    print(lst)


    Output

    ['This one is also random sentence but it contains an email address user@example.com']
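The single pattern only finds sentences that start with [A-Z], and its match ends at the email-like token rather than at the end of the sentence. A different technique, not the one used in this answer, is to split the text into sentences first and then keep the ones that contain an email-like token. The sketch below assumes, like the pattern above, that a sentence ends at . ! or ? followed by whitespace; the helper name sentences_with_email and the sample addresses are hypothetical.

```python
import re

# Assumption: a sentence ends at . ! or ? followed by whitespace,
# the same boundary rule the single-regex answer uses.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
EMAIL_LIKE = re.compile(r"[^\s@]+@[^\s@]+")

def sentences_with_email(text):
    """Hypothetical helper: return every sentence in `text` containing an email-like token."""
    sentences = SENTENCE_END.split(text)
    return [s for s in sentences if EMAIL_LIKE.search(s)]

# Sample text with placeholder addresses
text = ("Contact us anytime! write to support@example.com for help. "
        "Is sales@example.org still active? Thanks.")
print(sentences_with_email(text))
# ['write to support@example.com for help.', 'Is sales@example.org still active?']
```

Here the lowercase "write to ..." sentence is kept and the whole question "Is sales@example.org still active?" is returned, whereas the single pattern would skip the first and stop at the end of the address in the second. The split rule is still a heuristic: it also breaks after abbreviations such as "Mr. " or "e.g. ".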