
PHP: split a string of email addresses when some names contain commas


I have some legacy data that contains email addresses in strings like this:

$str = 'Joe Bloggs <[email protected]>, Person, Test [[email protected]], [email protected]';

I'd like to split this string into the three email addresses it contains. However, as you can see, some names contain the comma delimiter, and some addresses have no RFC-style display name in front of them. Ideally, the string above would be split into the following array:

Array (
    [0] => Array(
        'name' => 'Joe Bloggs',
        'email' => '[email protected]'
    ),
    [1] => Array(
        'name' => 'Person, Test',
        'email' => '[email protected]'
    ),
    [2] => Array(
        'name' => '',
        'email' => '[email protected]'
    )
)

I'm guessing a regex would work here? I've come up with the following, but it only handles a single email address, not a comma-separated list (with commas in the names, too!):

preg_match_all('!(.*?)\s?[<|\[]\s*(.*?)\s*[>|\]]!', $str, $matches);

Thank you!


Solution

  • You may use

    (?:,\s*)?(.*?)\s*(?|<([^>]*)>|\[([^][]*)]|(\S+@\S+))
    


    Details

    • (?:,\s*)? - an optional comma followed by zero or more whitespace characters
    • (.*?) - Group 1 (the name): zero or more characters other than line breaks, as few as possible (lazy)
    • \s* - zero or more whitespace characters
    • (?|<([^>]*)>|\[([^][]*)]|(\S+@\S+)) - a branch reset group: every alternative numbers its capturing group from the same index, so the address always ends up in Group 2. It matches one of:
      • <([^>]*)>| - <, then zero or more chars other than > are captured in Group 2, and the closing > is just matched
      • \[([^][]*)]| - [, then zero or more chars other than [ and ] are captured in Group 2, and the closing ] is just matched
      • (\S+@\S+) - one or more non-whitespace chars, @, and again one or more non-whitespace chars, all captured in Group 2
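
    To see what the branch reset group buys you, here is a small sketch (not part of the original answer) that runs the same pattern with preg_match on one placeholder string per supported form; the example.com addresses are made up. For comparison, it also runs a hypothetical variant that uses an ordinary (?:...) group instead, where a bare address lands in Group 4 rather than Group 2.

    ```php
    <?php
    // Pattern from the answer: Group 1 = name; the branch reset group (?|...)
    // gives the capture in every alternative the number 2.
    $withReset = '/(?:,\s*)?(.*?)\s*(?|<([^>]*)>|\[([^][]*)]|(\S+@\S+))/';

    // Same pattern with an ordinary non-capturing group, for comparison:
    // here the three alternatives capture into Groups 2, 3 and 4.
    $withoutReset = '/(?:,\s*)?(.*?)\s*(?:<([^>]*)>|\[([^][]*)]|(\S+@\S+))/';

    // Placeholder inputs (hypothetical addresses), one per supported form.
    $samples = [
        'Joe Bloggs <joe@example.com>',
        'Person, Test [test@example.com]',
        'plain@example.com',
    ];

    $results = [];
    foreach ($samples as $s) {
        preg_match($withReset, $s, $m);
        // With branch reset, the address is always in $m[2].
        $results[] = ['name' => $m[1], 'address' => $m[2]];
    }
    print_r($results);

    // Without branch reset, a bare address is captured by Group 4 instead,
    // and Groups 2 and 3 are left as empty strings.
    preg_match($withoutReset, 'plain@example.com', $m);
    var_dump($m[4]);
    ```

    Because every alternative shares Group 2, the loop never has to check which of $m[2], $m[3] or $m[4] is non-empty.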

    Then use the following PHP code to build the result array:

    $re = '/(?:,\s*)?(.*?)\s*(?|<([^>]*)>|\[([^][]*)]|(\S+@\S+))/';
    $str = 'Joe Bloggs <[email protected]>, Person, Test [[email protected]], [email protected]';
    // PREG_SET_ORDER: one array per match, where [1] = name and [2] = address
    preg_match_all($re, $str, $m, PREG_SET_ORDER, 0);
    $res = array();
    foreach ($m as $e)
    {
        // Map the two capturing groups to named keys
        $res[] = array('name' => $e[1], 'address' => $e[2]);
    }
    print_r($res);
    

    Output:

    Array
    (
        [0] => Array
            (
                [name] => Joe Bloggs
                [address] => [email protected]
            )
    
        [1] => Array
            (
                [name] => Person, Test
                [address] => [email protected]
            )
    
        [2] => Array
            (
                [name] => 
                [address] => [email protected]
            )
    
    )
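
    One caveat: in the bare-address branch, \S+ also matches commas, so the regex above only works as-is when a bare address is the last item in the list. If a bare address comes first, it picks up the trailing comma, and the next name starts with a space. Below is a sketch of one possible adjustment (not from the original answer), wrapped in a hypothetical helper named parse_address_list, that uses [^\s,]+ on both sides of the @ instead. It assumes addresses never contain commas, and the example.com addresses are placeholders.

    ```php
    <?php
    /**
     * Hypothetical helper: split a legacy address list into name/address pairs.
     * Same regex as the answer, except the bare-address branch uses [^\s,]+
     * instead of \S+, so a comma after a bare address is not swallowed.
     */
    function parse_address_list(string $str): array
    {
        $re = '/(?:,\s*)?(.*?)\s*(?|<([^>]*)>|\[([^][]*)]|([^\s,]+@[^\s,]+))/';
        $res = [];
        if (preg_match_all($re, $str, $m, PREG_SET_ORDER)) {
            foreach ($m as $e) {
                // Group 1 = name (may be empty), Group 2 = address
                $res[] = ['name' => $e[1], 'address' => $e[2]];
            }
        }
        return $res;
    }

    // Placeholder addresses; this time the bare address comes first.
    $list = parse_address_list(
        'plain@example.com, Joe Bloggs <joe@example.com>, Person, Test [test@example.com]'
    );
    print_r($list);
    ```

    With the original (\S+@\S+) branch, the same input would give 'plain@example.com,' as the first address and ' Joe Bloggs' as the second name.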