I am trying to use Python to extract the sentence that contains the email.
sample_str = "This is a random sentence. This one is also random sentence but it contains an email address [email protected]"
All the examples I see extract only the email, for example:
import re
lst = re.findall(r'\S+@\S+', sample_str)
But is there any way to extract the sentence that contains the email? In this case:
op = "This one is also random sentence but it contains an email address [email protected]"
You can indicate where a sentence starts and, in between, not match the end of a sentence.
But this can be tricky, and it is definitely not a general solution, as a sentence does not have to start with a char [A-Z] and might end with a char other than . ! or ?
As an idea for the given example, you might use:
(?<!\S)[A-Z](?:(?![!?.](?!\S)).)*[^\s@]+@[^\s@]+
Explanation
(?<!\S)
Assert a whitespace boundary to the left
[A-Z]
Match a char A-Z
(?:(?![!?.](?!\S)).)*
Match any char, except for a ! ? or . that is directly followed by a whitespace boundary
[^\s@]+@[^\s@]+
Match an email-like format
Example
import re
sample_str = "This is a random sentence. This one is also random sentence but it contains an email address [email protected]"
lst = re.findall(r'(?<!\S)[A-Z](?:(?![!?.](?!\S)).)*[^\s@]+@[^\s@]+', sample_str)
print(lst)
Output
['This one is also random sentence but it contains an email address [email protected]']
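If the [A-Z] requirement is too strict, another idea (a different technique from the single pattern above, and not a general solution either) is to split the text into rough sentences first and then keep the ones that contain an email-like token. The sketch below assumes a sentence ends with . ! or ? followed by whitespace; the sample text and the address user@example.com are placeholders made up for illustration.

```python
import re

# Hypothetical sample text; user@example.com is a placeholder address.
# The second sentence starts in lowercase, so an [A-Z] anchor would skip it.
text = ("This is a random sentence. this one starts in lowercase "
        "and mentions user@example.com! Nothing to see here.")

# Split after . ! or ? when followed by whitespace. The dots inside the
# address are not followed by whitespace, so the address stays in one piece.
sentences = re.split(r'(?<=[.!?])\s+', text)

# Keep only the sentences that contain an email-like token.
email_like = re.compile(r'[^\s@]+@[^\s@]+')
with_email = [s for s in sentences if email_like.search(s)]

print(with_email)
# ['this one starts in lowercase and mentions user@example.com!']
```

Note that this will still split in the wrong place on abbreviations such as "e.g. " or "Dr. ", so for real-world text a dedicated sentence tokenizer is probably a safer choice.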