java regex regex-lookarounds capturing-group

REGEXP: capture group NOT followed by


I need to match the following statements:

Hi there John
Hi there John Doe (jdo)

Without matching these:

Hi there John Doe is here
Hi there John is here

So I figured that this regexp would work:

^Hi there (.*)(?! is here)$

But it does not, and I am not sure why. I believe this may be caused by the capturing group (.*), so I thought that making the * operator lazy would solve the problem, but no. This regexp doesn't work either:

^Hi there (.*?)(?! is here)$

Can anyone point me in the direction of a solution?

Solution

To capture the text of a sentence that does not end with is here (such as Hi there John Doe (the second)), use this pattern (credit: @Thorbear):

^Hi there (.*$)(?<! is here)

For a sentence that has the data in the middle (such as Hi there John Doe (the second) is here, where John Doe (the second) is the desired data), simple grouping suffices:

^Hi there (.*?) is here$
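
As a rough check of the two patterns above, here is a minimal Java sketch using java.util.regex. The class name HiThereCapture and the helper capture are made up for illustration and are not part of the original answers; the sample strings come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HiThereCapture {
    // Matches only when the line does NOT end with " is here";
    // group 1 is everything after "Hi there ".
    static final Pattern WITHOUT_SUFFIX =
            Pattern.compile("^Hi there (.*$)(?<! is here)");

    // Matches only when the line DOES end with " is here";
    // group 1 is the text between "Hi there " and " is here".
    static final Pattern WITH_SUFFIX =
            Pattern.compile("^Hi there (.*?) is here$");

    // Hypothetical helper: returns group 1 when the pattern is found, otherwise null.
    static String capture(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture(WITHOUT_SUFFIX, "Hi there John"));           // John
        System.out.println(capture(WITHOUT_SUFFIX, "Hi there John Doe (jdo)")); // John Doe (jdo)
        System.out.println(capture(WITHOUT_SUFFIX, "Hi there John is here"));   // null
        System.out.println(capture(WITH_SUFFIX, "Hi there John Doe (the second) is here")); // John Doe (the second)
    }
}
```

The sketch uses find() rather than matches() so that the ^ and $ anchors in the patterns do the work, as they would in most regex testers.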


           ╔══════════════════════════════════════════╗
           ║▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒║
           ║▒▒▒Everyone, thank you for your replies▒▒▒║
           ║▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒║
           ╚══════════════════════════════════════════╝

Solution

  • The .* will find a match whether or not it is greedy, because once it has reached the end of the line nothing follows, so the negative lookahead for is here always succeeds there.

    A solution is to use a negative lookbehind instead, which checks from the end of the line whether the preceding characters are is here.

    ^Hi there (.*)(?<! is here)$

    Edit

    As suggested by Alan Moore, changing the pattern further to ^Hi there (.*$)(?<! is here) improves its performance: the capturing group must consume the rest of the string before the lookbehind is attempted, so the lookbehind runs only once, at the end of the string, instead of at every position the engine backtracks to.
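
The behaviour described in this answer can be checked with a short Java sketch. It compares the question's two lookahead attempts with the two lookbehind variants; the class name LookaroundDemo and the helper found are illustrative names, not anything from the original posts.

```java
import java.util.regex.Pattern;

class LookaroundDemo {
    // The question's attempts: right before "$" nothing follows,
    // so the negative lookahead cannot fail there.
    static final Pattern LOOKAHEAD =
            Pattern.compile("^Hi there (.*)(?! is here)$");
    static final Pattern LAZY_LOOKAHEAD =
            Pattern.compile("^Hi there (.*?)(?! is here)$");

    // The answer's fix: look back from the end of the line instead.
    static final Pattern LOOKBEHIND =
            Pattern.compile("^Hi there (.*)(?<! is here)$");

    // The variant suggested in the edit: "$" moved inside the group.
    static final Pattern LOOKBEHIND_ANCHORED =
            Pattern.compile("^Hi there (.*$)(?<! is here)");

    // Hypothetical helper: true when the pattern is found in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Hi there John",
            "Hi there John Doe (jdo)",
            "Hi there John Doe is here",
            "Hi there John is here"
        };
        for (String input : inputs) {
            System.out.printf("%-26s lookahead=%b lazy=%b lookbehind=%b anchored=%b%n",
                    input,
                    found(LOOKAHEAD, input),
                    found(LAZY_LOOKAHEAD, input),
                    found(LOOKBEHIND, input),
                    found(LOOKBEHIND_ANCHORED, input));
        }
    }
}
```

Both lookahead patterns accept all four inputs, while both lookbehind patterns reject the two lines ending in is here. The sketch only checks that the two lookbehind variants agree on these inputs; it does not measure the performance difference.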