Search code examples
excelvbaexcel-formulaexport-to-excel

Extract email id of specific domain extensions


I need to extract email id from each row of specific domain extensions like .com .net .org everything else should be ignored. Below is the sample data of two rows.

    [email protected],[email protected],.@.,[email protected],.@.,[email protected],[email protected],[email protected],null.val@.@.,.@@,[email protected],[email protected],[email protected]
[email protected],[email protected],.@.,[email protected],.@.,[email protected],[email protected],[email protected],null.val@.@.,.@@,[email protected],[email protected],[email protected]

Whatever the first valid extension email matches is enough even though there are multiple id's only one email id is enough per row. Below is the sample desired result.

Sample data image

I believe this can be done with custom formula with regex but I can't wrap my head around it. I am using Desktop MS Excel latest version.


Solution

  • If your email addresses are relatively simple, you can use this regex:

    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
    

    In VBA:

    Option Explicit
    Function extrEmail(S As String) As String
        Dim RE As Object, MC As Object
        Const sPat As String = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b"
        
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Pattern = sPat
        .ignorecase = True
        .Global = False
        .MultiLine = True
        If .test(S) = True Then
            Set MC = .Execute(S)
            extrEmail = MC(0)
        End If
    End With
    End Function
    
    

    Matching an email address can become very complicated, and a regex that follows all the rules is extraordinarily complex and long. But this one is relatively simple, and might work for your needs.

    Explanation of Regex

    Emailaddress1

    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
    

    Options: Case insensitive; ^$ match at line breaks

    Created with RegexBuddy

    enter image description here

    EDIT: To match only specific domains, merely replace the part of the regex that matches domains with a group of pipe-separated domain names.

    eg

    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:com|net|org)\b