
Regex positive lookahead not matching as expected


I have to use a regular expression to match several strings and capture parts of each string.

Example strings could look like:


  • robert eric palmer sent for the boat
  • robert eric william palmer sent for the boat

The goal is to lazily match and capture the middle name(s) of robert palmer up to the point where the surname (palmer) appears in the string, AND to ensure that the rest of the string matches the static text (robert ___ palmer sent for the boat).

I have used a positive lookahead to find the middle name and stop matching once palmer is found:

/robert (.+?)(?=\spalmer) palmer/

which correctly matches:

robert eric palmer

robert eric william palmer

and correctly doesn't match past the first palmer in:

robert eric william palmer palmer
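As a sketch, assuming Python's `re` module (the question does not name a regex flavor, but lazy quantifiers and lookaheads behave the same way here in the common Perl-style engines), the lookahead-only pattern can be checked with `re.search`. Because `search` is unanchored, it still returns a match on the `palmer palmer` string; that match simply ends at the first palmer.

```python
import re

# Lookahead-only pattern from the question (no static tail yet).
LOOKAHEAD_ONLY = re.compile(r"robert (.+?)(?=\spalmer) palmer")

samples = [
    "robert eric palmer",
    "robert eric william palmer",
    "robert eric william palmer palmer",
]

results = []
for text in samples:
    m = LOOKAHEAD_ONLY.search(text)
    # search() is unanchored, so the match may cover only part of the text
    results.append((m.group(0), m.group(1)))
    print(repr(text), "->", repr(m.group(0)), "| group 1:", repr(m.group(1)))
```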


The problem:

When I add the rest of the static text to the regex:

/robert (.+?)(?=\spalmer) palmer sent for the boat/

it incorrectly matches:

robert eric william palmer palmer sent for the boat
robert eric palmer palmer sent for the boat
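Why this happens: `+?` only changes the order in which the engine tries lengths (shortest first); it does not cap the length. The lookahead succeeds at the first palmer, but the literal ` palmer sent for the boat` then fails because the text continues with ` palmer palmer`, so the engine backtracks and lets `.+?` consume more characters until `(?=\spalmer)` holds again at the second palmer. The sketch below, which assumes Python's `re` module, prints the captured group for both strings.

```python
import re

# Lookahead pattern plus the static tail, as in the question.
WITH_TAIL = re.compile(r"robert (.+?)(?=\spalmer) palmer sent for the boat")

captured = []
for text in [
    "robert eric william palmer palmer sent for the boat",
    "robert eric palmer palmer sent for the boat",
]:
    m = WITH_TAIL.search(text)
    # After backtracking, the lazy group has grown past the first "palmer"
    captured.append(m.group(1))
    print(repr(text), "-> group 1:", repr(m.group(1)))
```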

How can I lazily match the middle name up to palmer and still assert that the rest of the static text matches?

I hope this makes sense!


Solution

  • You may use

    robert ((?:(?!palmer).)+?) palmer sent for the boat
    

    See the regex demo.

    Details

    • robert - a literal substring
    • ((?:(?!palmer).)+?) - capturing group #1, a tempered greedy token that matches any char (.) other than a line break that does not start the palmer char sequence, one or more times but as few times as possible
    • palmer sent for the boat - a literal substring.

    To unroll the pattern for better performance, use:

    robert ([^p]*(?:p(?!almer)[^p]*)*) palmer sent for the boat
    

    See this regex demo.
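The following sketch, assuming Python's `re` module (the constructs used are also available in PCRE, JavaScript and .NET), runs both the tempered-token pattern and its unrolled form with `fullmatch`, so the entire string must fit. The `philip` string is a hypothetical extra case with a `p` inside the middle name, which exercises the `p(?!almer)` branch of the unrolled pattern. The last two lines show that the two patterns are not strictly equivalent: `+?` requires at least one character, while the unrolled group may be empty. `[^p]` also matches line breaks, whereas `.` does not without the DOTALL flag.

```python
import re

# Tempered greedy token: every char consumed by group 1 must not start "palmer".
TEMPERED = re.compile(r"robert ((?:(?!palmer).)+?) palmer sent for the boat")
# Unrolled form: runs of non-"p" chars, plus any "p" not followed by "almer".
UNROLLED = re.compile(r"robert ([^p]*(?:p(?!almer)[^p]*)*) palmer sent for the boat")

# Expected group 1 for each string (None = no full match).
cases = {
    "robert eric palmer sent for the boat": "eric",
    "robert eric william palmer sent for the boat": "eric william",
    "robert eric william palmer palmer sent for the boat": None,
    "robert eric palmer palmer sent for the boat": None,
    "robert philip palmer sent for the boat": "philip",  # hypothetical extra case
}


def group1(pattern, text):
    """Return group 1 if the whole text matches, else None."""
    m = pattern.fullmatch(text)
    return m.group(1) if m else None


for text, expected in cases.items():
    t, u = group1(TEMPERED, text), group1(UNROLLED, text)
    print(repr(text), "->", repr(t), repr(u))
    assert t == expected and u == expected

# The two forms differ here: only the unrolled one allows an empty middle name.
print(repr(group1(TEMPERED, "robert  palmer sent for the boat")))  # None
print(repr(group1(UNROLLED, "robert  palmer sent for the boat")))  # ''
```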