
Problem with negative lookbehind regex capturing


I am trying to match email addresses, but only when they are not preceded by "mailto:". I tried this regular expression:

"/(?<!mailto:)[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/"

against this string: '<a href="mailto:[email protected]">EMAIL</a> ... [email protected] '

I would expect to match only '[email protected]', but I also get '[email protected]' (note the missing 's'). What is wrong here? Can't a normal regex follow the lookbehind assertion?

My whole example in PHP looks like this:

$testString = '<a href="mailto:[email protected]">EMAIL</a>  ...   [email protected] ';
$pattern = "/(?<!mailto:)[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/";
preg_match_all($pattern, $testString, $matches);
echo('<pre>');print_r($matches);echo('</pre>');

Thank you!


Solution

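  • The lookbehind only checks the text immediately before the position where the match starts. The regex engine tries every position in turn, and at the position just after the s the preceding text ends in s, which is hardly mailto:, so the assertion passes and the rest of the address, [email protected], matches your regex. Adding a word boundary after the lookbehind forces the match to start at the beginning of a word, which will work for most cases: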

    Change:

    (?<!mailto:)
    

    To:

    (?<!mailto:)\b
    

    On a side note: use example.com for examples; domain.com is owned by an actual company.
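
The effect of the word boundary can be checked with a short script. This is only a sketch: the addresses linked@example.com, plain@example.com and john.doe@example.com are hypothetical stand-ins on example.com, not the addresses from the question, and the variable names are made up for illustration. The last case shows why the fix covers "most cases" rather than all of them: a dot or hyphen in the local part creates a second word boundary that is no longer directly preceded by mailto:.

```php
<?php
// Hypothetical test data: both addresses are placeholders on example.com.
$testString = '<a href="mailto:linked@example.com">EMAIL</a> ... plain@example.com ';

// The email part of the pattern from the question, unchanged.
$emailBody = '[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})';

$original     = '/(?<!mailto:)' . $emailBody . '/';
$withBoundary = '/(?<!mailto:)\b' . $emailBody . '/';

preg_match_all($original, $testString, $before);
preg_match_all($withBoundary, $testString, $after);

// Without \b the engine restarts one character into the linked address.
print_r($before[0]); // inked@example.com, plain@example.com

// With \b a match may only start at a word boundary.
print_r($after[0]);  // plain@example.com

// Caveat: a dot in the local part gives another boundary to start from.
preg_match_all($withBoundary, 'mailto:john.doe@example.com', $caveat);
print_r($caveat[0]); // doe@example.com
```

If addresses with dots or hyphens in the local part can appear after mailto:, the pattern would need a stricter guard than \b alone.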