php regex string parsing text-extraction

Extract the email addresses from a block of text containing strictly formatted, delimited values


I have an ICS file that is uploaded to my server when a meeting is created in Google Calendar, Yahoo Calendar, etc. I have parsed the date, organizer, and so on from the ICS file, but I am unable to get the list of attendees. Below is the relevant part of the ICS file.

BEGIN:VEVENT

ATTENDEE;RSVP=TRUE:mailto:xxxxxxx

  [email protected]
ATTENDEE;RSVP=TRUE:mailto:[email protected]

ATTENDEE;RSVP=TRUE:mailto:[email protected]

ATTENDEE;RSVP=TRUE:mailto:[email protected]

CLASS:PUBLIC

From the above content, I need the email addresses that follow each mailto: prefix. This is what I have tried so far:

<?php
$cal = file_get_contents("ics_files/outlook.ics");
$cal = str_replace("\n", "", $cal);
preg_match_all('/mailto:(.*?)ATTENDEE/', $cal, $attendees);
?>

Solution

  • If you remove the pre-processing line that strips newlines (\n) from the ICS data (the str_replace() call), a straightforward regex can be used:

    /mailto:(.*?)(?:ATTENDEE;|CLASS:)/s
    

    The /s modifier tells the regex engine to let the dot (.) match newline characters as well. If you wanted to drop the /s, you could instead use:

    /mailto:((?:\r\n|\n|.)*?)(?:ATTENDEE;|CLASS:)/
    
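
    To check that the two patterns behave the same way, here is a minimal sketch. The sample data and the example.com addresses are made up for illustration and are not taken from the actual ICS file:

    ```php
    <?php
    // Hypothetical sample data standing in for the uploaded .ics file.
    $cal = "BEGIN:VEVENT\r\n"
         . "ATTENDEE;RSVP=TRUE:mailto:alice@example.com\r\n"
         . "ATTENDEE;RSVP=TRUE:mailto:bob@example.com\r\n"
         . "CLASS:PUBLIC\r\n"
         . "END:VEVENT\r\n";

    // Variant 1: the /s modifier lets "." match newlines.
    preg_match_all('/mailto:(.*?)(?:ATTENDEE;|CLASS:)/s', $cal, $withS);

    // Variant 2: no modifier; newlines are matched explicitly.
    preg_match_all('/mailto:((?:\r\n|\n|.)*?)(?:ATTENDEE;|CLASS:)/', $cal, $withoutS);

    // The captures still carry the trailing line break, so trim before comparing.
    $a = array_map('trim', $withS[1]);
    $b = array_map('trim', $withoutS[1]);

    var_dump($a === $b);
    print_r($a);
    ```

    Both variants capture everything between mailto: and the next ATTENDEE; or CLASS: marker, so the last attendee is picked up too.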

    Using PHP's preg_match_all():

    preg_match_all('/mailto:(.*?)(?:ATTENDEE;|CLASS:)/s', $cal, $attendees);
    

    The output:

    print_r($attendees[1]);
    
    Array (
        [0] => xxxxxxx
    
      [email protected]
        [1] => [email protected]
        [2] => [email protected]
        [3] => [email protected]
    )
    

    You can then iterate over the $attendees[1] array and apply any email-address logic / formatting you wish.

    Example:

    foreach ($attendees[1] as $attendee) {
        // remove any extra spaces/newlines from the address
        $attendee = trim(preg_replace('/\s\s+/', ' ', str_replace("\n", ' ', $attendee)));
    
        // split the address into any available name/email-address combination
        $address = explode(' ', $attendee);
    
        echo $address[0];
        if (!empty($address[1])) {
            // there is a name/email-address combination available
            echo ' <' . $address[1] . '>';
        }
        echo "\n";
    }
    

    Output:

    xxxxxxx <[email protected]>
    [email protected]
    [email protected]
    [email protected]
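
    An alternative worth considering: the iCalendar format (RFC 5545) folds long lines by inserting a line break followed by a single space or tab. If the indented second line of the first ATTENDEE entry is such a folded continuation rather than a name/address pair (an assumption, since the sample is obscured), you could unfold the data first and then match one property per line. A rough sketch with made-up sample data:

    ```php
    <?php
    // Hypothetical sample: the first ATTENDEE line is folded, i.e. continued
    // on the next line, which starts with a single space.
    $cal = "BEGIN:VEVENT\r\n"
         . "ATTENDEE;RSVP=TRUE:mailto:a.very.long.local.part\r\n"
         . " @example.com\r\n"
         . "ATTENDEE;RSVP=TRUE:mailto:bob@example.com\r\n"
         . "CLASS:PUBLIC\r\n"
         . "END:VEVENT\r\n";

    // Unfold: drop each line break that is followed by one space or tab.
    $unfolded = preg_replace('/\r?\n[ \t]/', '', $cal);

    // Each property now sits on a single line, so capture up to the line end.
    preg_match_all('/^ATTENDEE[^\r\n]*?:mailto:([^\r\n]+)/mi', $unfolded, $m);

    $emails = array_map('trim', $m[1]);
    print_r($emails);
    ```

    With this approach there is no need to list the properties that may follow an attendee (ATTENDEE;, CLASS:, and so on). Each entry in $emails could then be checked with filter_var($email, FILTER_VALIDATE_EMAIL) if you want to reject malformed addresses.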