regex pcre

How to match and blacklist Gmail addresses with a regexp


I would like to match and block an address like foo.bar@gmail.com. But it isn't that easy, since all of the following:

foobar@gmail.com
fo.o....b..a..r@gmail.com
foo.bar+goo@gmail.com
fo.ob.ar+something@gmail.com

are aliases for the same email account. Is it possible to create a regexp that matches all possible aliases? Or do I have to normalize all Gmail addresses (remove the dots and the text after +) before applying filters/blacklist?

I could go with f[.]*o[.]*o[.]*b[.]*a[.]*r[.]*(\+.*)? but it looks ridiculous for longer addresses and probably performs badly.


Solution

  • One possibility would be a regex such as

    f\.*o\.*o\.*b\.*a\.*r(?=.*\@gmail\.com) 
    

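    This pattern says that after each letter of foobar there may be any number of dots, and the lookahead (?=.*\@gmail\.com) requires @gmail.com somewhere later in the string. You can extend it from here, for example to something like this:

    f[._-]*o[._-]*o[._-]*b[._-]*a[._-]*r(?=.*\@gmail\.com)
    

    Here we also accept any number of hyphens and underscores. The hyphen is placed last in the character class so it is taken literally; written as [\.-_] it would be read as the range from . to _, which also covers digits, capital letters and @.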
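Writing the pattern by hand gets unwieldy for longer names. A minimal sketch, assuming Python's standard re module and a hypothetical helper build_alias_pattern (not part of the original answer), that generates the dot-tolerant pattern from any local part. Unlike the lookahead version, it is checked with fullmatch and only allows an optional +tag before @gmail.com, so a rule for foobar does not also catch foobarbaz@gmail.com or xfoobar@gmail.com.

```python
import re

def build_alias_pattern(local):
    """Hypothetical helper: compile a regex that matches the dot and +tag
    aliases of local@gmail.com, ignoring case."""
    # Allow any number of dots between consecutive letters, e.g. fo..o.bar
    body = r"\.*".join(re.escape(ch) for ch in local)
    # Optional "+tag" (no '@' inside it), then the literal domain
    return re.compile(body + r"(?:\+[^@]*)?@gmail\.com", re.IGNORECASE)

pattern = build_alias_pattern("foobar")
for address in ["foobar@gmail.com", "fo.o....b..a..r@gmail.com",
                "foo.bar+goo@gmail.com", "fo.ob.ar+something@gmail.com",
                "foobarbaz@gmail.com", "xfoobar@gmail.com"]:
    # fullmatch anchors both ends, so longer or prefixed names are rejected
    print(address, bool(pattern.fullmatch(address)))
```

The first four addresses print True and the last two print False; the lookahead pattern above, used with search, would have matched inside both of the last two.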

    Example

    Here is an example in Python:

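    import re  # standard library; the third-party regex module also works

    string = 'fo.o....b..a..r@gmail.com'
    pattern = r'f\.*o\.*o\.*b\.*a\.*r(?=.*\@gmail\.com)'
    test = re.search(pattern, string)
    # The match keeps the dots; the lookahead does not consume @gmail.com
    print(test.group(0))
    # fo.o....b..a..r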
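The question's other option, normalizing before comparing, avoids building one regex per blocked address. A minimal sketch, assuming blocked addresses are stored already normalized in a Python set, that Gmail ignores letter case, and that only gmail.com addresses get the dot and +tag treatment; normalize_gmail, BLACKLIST and is_blocked are hypothetical names.

```python
def normalize_gmail(address):
    """Hypothetical helper: reduce a gmail.com address to a canonical form."""
    cleaned = address.strip().lower()  # assumes case does not matter
    local, sep, domain = cleaned.rpartition("@")
    if not sep or domain != "gmail.com":
        return cleaned  # other providers are left as they are
    local = local.split("+", 1)[0]  # drop everything after the first +
    local = local.replace(".", "")  # dots in the local part are ignored
    return local + "@" + domain

# Hypothetical blacklist, stored already normalized
BLACKLIST = {normalize_gmail("foo.bar@gmail.com")}

def is_blocked(address):
    return normalize_gmail(address) in BLACKLIST

for address in ["fo.o....b..a..r@gmail.com", "fo.ob.ar+something@gmail.com",
                "foobarbaz@gmail.com"]:
    print(address, is_blocked(address))
```

With this design each check is a single set lookup on the normalized string, and adding a new entry to the blacklist needs no new pattern.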