
Easiest way to clean email addresses in Python


I have email addresses that are not valid as they stand, but with a small correction they can be converted to valid email addresses.

For example:

%20adi@gmail.com, --- Not valid
'sam@tell.net,  --- Not valid
(hi@telligen.com),  --- Not valid
(gii@weerte.com),  --- Not valid
:qwert34@embright.com,  --- Not valid
//24adifrmaes@microsot.com  --- Not valid
tellei@apple.com    ---  valid
...

I could write if/else checks, but whenever a new address arrives with a new kind of problem, I would have to add another branch and update the code every time.

What is the best way to clean up all these small issues: a Python package, or a regex? Please suggest.
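The "valid / Not valid" labels above can be produced by a pattern check instead of by hand. A minimal sketch, assuming a deliberately simple pattern (letters, digits and . _ + - before the "@", then a dotted domain); this is not full RFC 5322 validation, and `is_valid_email` is a hypothetical helper name:

```python
import re

# Deliberately simple pattern (an assumption, not RFC 5322):
# local part of letters, digits and . _ + -, then '@', then a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_valid_email(address):
    # fullmatch: the whole string must be an address, with no leading/trailing junk
    return EMAIL_RE.fullmatch(address) is not None


samples = [
    "%20adi@gmail.com",
    "'sam@tell.net",
    "(hi@telligen.com)",
    "(gii@weerte.com)",
    ":qwert34@embright.com",
    "//24adifrmaes@microsot.com",
    "tellei@apple.com",
]

for s in samples:
    print(s, "valid" if is_valid_email(s) else "Not valid")
```

With this pattern only tellei@apple.com is reported as valid, matching the labels in the list above.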


Solution

  • You can do this: keep only the characters that are letters, digits, a period or '@', and drop everything else:

    emails = [
        'sam@tell.net',
        '(hi@telligen.com)',
        '(gii@weerte.com)',
        ':qwert34@embright.com',
        '//24adifrmaes@microsot.com',
        'tellei@apple.com',
    ]

    def correct_email_format(email):
        # Keep letters, digits, '.' and '@'; drop every other character
        return ''.join(c for c in email if c.isalnum() or c in '.@')

    for email in emails:
        corrected_email = correct_email_format(email)
        print(corrected_email)
    

    output:

    sam@tell.net
    hi@telligen.com
    gii@weerte.com
    qwert34@embright.com
    24adifrmaes@microsot.com
    tellei@apple.com
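The character filter above has two side effects: "%20adi@gmail.com" (the first example in the question, where %20 is a URL-encoded space) becomes 20adi@gmail.com, and characters that are legal in real addresses, such as '_' and '+', are removed. A possible alternative, sketched here under the assumption that each raw string contains at most one address, is to URL-decode with `urllib.parse.unquote` and then extract the first email-shaped substring with a regex. The pattern is a simplified assumption, not RFC 5322, and `clean_email` is a hypothetical helper name:

```python
import re
from urllib.parse import unquote

# Simplified address pattern (an assumption, not RFC 5322):
# letters, digits and . _ + - before '@', then a dotted domain name.
EMAIL_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def clean_email(raw):
    """Return the first email-shaped substring of raw, or None if there is none."""
    # Decode URL escapes first, so '%20adi@...' becomes ' adi@...' rather than '20adi@...'
    decoded = unquote(raw)
    match = EMAIL_RE.search(decoded)
    return match.group(0) if match else None


raw_emails = [
    "%20adi@gmail.com",
    "'sam@tell.net,",
    "(hi@telligen.com),",
    "(gii@weerte.com),",
    ":qwert34@embright.com,",
    "//24adifrmaes@microsot.com",
    "tellei@apple.com",
    "first_last+tag@example.co.uk",  # hypothetical extra case with '_' and '+'
]

for raw in raw_emails:
    print(clean_email(raw))
```

Here "%20adi@gmail.com" comes out as adi@gmail.com, and first_last+tag@example.co.uk is kept intact, whereas the character filter would turn it into firstlasttag@example.co.uk. Because the regex describes what an address looks like rather than listing each kind of junk, new kinds of leading or trailing debris need no new if/else branch.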