I would like to match and block addresses like foo.bar@gmail.com. But it isn't that easy, since any of the following:
foobar@gmail.com
fo.o....b..a..r@gmail.com
foo.bar+goo@gmail.com
fo.ob.ar+something@gmail.com
is an alias for the same email account. Is it possible to create a regexp that matches all possible aliases? Or do I have to normalize all Gmail addresses (remove the dots and the text after +) before applying filters/blacklists?
I could go with: f[.]*o[.]*o[.]*b[.]*a[.]*r[.]*(\+.*)?
but it looks ridiculous for longer addresses and probably performs badly.
One possibility would be a regex such as
f\.*o\.*o\.*b\.*a\.*r(?=.*\@gmail\.com)
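This pattern basically says that between the letters of foobar there may be any number of dots. The lookahead (?=.*\@gmail\.com) then requires that @gmail.com appears somewhere later in the string, which also lets a +tag suffix through. You can build on this and expand the expression to something like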
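f[._-]*o[._-]*o[._-]*b[._-]*a[._-]*r(?=.*\@gmail\.com)
Here we also accept any number of hyphens and underscores. The hyphen is placed last in the character class so that it is taken literally; written as [\.-_] it would define a range from . to _ and also match digits and uppercase letters.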
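Writing the pattern by hand gets unwieldy for longer addresses, so it can be generated instead. The following is only a sketch: the helper name gmail_alias_pattern is made up for illustration, and it assumes that only dots and a +tag suffix need to be tolerated and that matching should be case-insensitive.

```python
import re


def gmail_alias_pattern(local_part: str) -> re.Pattern:
    """Build a regex that matches dot and +tag aliases of a Gmail local part.

    Hypothetical helper for illustration, not part of any library.
    """
    # Dots carry no meaning here, so drop them before rebuilding the pattern.
    letters = local_part.replace(".", "")
    # Allow any number of dots between consecutive characters.
    body = r"\.*".join(re.escape(ch) for ch in letters)
    # Optional trailing dots, an optional "+tag", then the fixed domain.
    return re.compile(rf"{body}\.*(?:\+[^@]*)?@gmail\.com", re.IGNORECASE)


pattern = gmail_alias_pattern("foo.bar")

for address in (
    "foobar@gmail.com",
    "fo.o....b..a..r@gmail.com",
    "foo.bar+goo@gmail.com",
    "foobaz@gmail.com",
):
    # fullmatch requires the whole string to be an alias, not just a part of it.
    print(address, bool(pattern.fullmatch(address)))
```

Using fullmatch rather than search keeps an address such as xfoobar@gmail.com from being blocked by accident.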
Example
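Here is an example in Python:
# the standard-library re module is enough for this pattern
import re

string = 'fo.o....b..a..r@gmail.com'
pattern = r'f\.*o\.*o\.*b\.*a\.*r(?=.*\@gmail\.com)'
test = re.search(pattern, string)
# group(0) is the matched text itself, dots included
print(test.group(0))
# fo.o....b..a..r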
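The other route raised in the question, normalizing every address before checking it against the blacklist, avoids building one regex per blocked account. Below is a minimal sketch of that idea. The names normalize_gmail, BLOCKED and is_blocked are made up for illustration, and it assumes that only the gmail.com domain needs special handling and that lowercasing all addresses is acceptable.

```python
def normalize_gmail(address: str) -> str:
    """Reduce a Gmail alias to a canonical form (hypothetical helper)."""
    address = address.strip().lower()
    if "@" not in address:
        # Not an email address; return it unchanged apart from case.
        return address
    local, _, domain = address.rpartition("@")
    if domain == "gmail.com":
        # Drop the "+tag" suffix, then remove every dot from the local part.
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


# The blacklist stores normalized addresses only.
BLOCKED = {normalize_gmail("foo.bar@gmail.com")}


def is_blocked(address: str) -> bool:
    """Check an incoming address against the normalized blacklist."""
    return normalize_gmail(address) in BLOCKED


for address in ("fo.o....b..a..r@gmail.com", "foo.bar+goo@gmail.com", "foo.bar@example.com"):
    print(address, is_blocked(address))
```

With this approach each lookup is one string cleanup plus one set membership test, so the cost does not grow with the number of blocked accounts.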