Tags: java, regex, kotlin, regex-lookarounds, negative-lookbehind

Java regex issue (negative lookahead & lookbehind)


I need your help, guys!! This is a tricky Java regex issue; I've been searching for a solution for a couple of hours... Here it is:

In the following text, I want to match the word "boat"...

  1. and include "bunch of " if placed just before it.
  2. and include " propeller" if placed just after it.
  3. or don't match if preceded by "for a ", even with "bunch of " in between.
  4. or don't match if followed by " trailer", even with "propeller " in between.

I have a boat to sell. It comes with extra boat propellers but does not come with a boat trailer (the boat is pretty big so you might need a boat propeller trailer too). I used to have a bunch of boats but my passion for a boat faded with time. I did not think people would have interest for a bunch of boats but this is my last one, so Yeéé! :)

The following parts should match:

  • boat ("boat")
  • bunch of boats ("boat" preceded by "bunch of ")
  • boat propeller ("boat" followed by " propeller")

The following parts should NOT match (not even partially):

  • for a boat ("boat" preceded by "for a ")
  • boat trailer ("boat" followed by " trailer")
  • for a bunch of boats ("boat" preceded by "bunch of ", which is preceded by "for a ")
  • boat propeller trailer ("boat" followed by " propeller", which is followed by " trailer")

I set up this example on regex101 ( https://regex101.com/r/o6S4SP/22 ), but it's not working properly :-(

PS: I'm using regex101 for the example, but "(*SKIP)(*FAIL)" is not supported in Java's regex syntax.

Hope someone can help :-)
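Java's java.util.regex has no backtracking control verbs, so (*SKIP)(*FAIL) cannot be ported directly. A commonly used substitute (a different technique from the lookaround regex in the solution) is to let the unwanted contexts match without capturing anything, capture the wanted phrase in group 1, and keep only the matches where group 1 participated. The class SkipFailWorkaround and the method wanted are hypothetical names; this is a sketch tuned to the phrases in this question, not a general treatment of every edge case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipFailWorkaround {
    // Alternatives 1 and 2 consume the unwanted contexts without capturing anything;
    // alternative 3 captures the wanted phrase in group 1.
    static final Pattern P = Pattern.compile(
        // unwanted: "for a (bunch of) boat(s) (propeller(s))"
        "\\bfor\\s+a\\s+(?:bunch\\s+of\\s+)?boats?\\b(?:\\s+propellers?)?"
        // unwanted: "(bunch of) boat(s) (propeller(s)) trailer(s)"
        + "|(?:\\bbunch\\s+of\\s+)?\\bboats?\\b(?:\\s+propellers?)?\\s+trailers?\\b"
        // wanted: captured in group 1
        + "|((?:\\bbunch\\s+of\\s+)?\\bboats?\\b(?:\\s+propellers?)?)");

    // Returns group 1 of every match where it participated; other matches are the skipped contexts.
    static List<String> wanted(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [boat propeller, bunch of boats]
        System.out.println(wanted("for a boat, a boat trailer, a boat propeller and a bunch of boats"));
    }
}
```

Because the alternatives are tried in order at each position, "for a ..." and "... trailer" phrases are consumed by the first two branches before the capturing branch ever gets a chance to match inside them.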


Solution

  • You may use the following regex in Java, which supports constrained-width lookbehind patterns (lookbehinds that may contain limiting quantifiers such as {0,1}, as long as their maximum length is bounded):

    (?<!\bfor\sa\s(?:bunch\sof\s){0,1})(?:\bbunch\s+of\s+)?\bboats?\b(?:\s+propellers?)?+(?!\s+trailers?\b)
    

    See the Java regex demo online (proof).

    In Java,

    s = s.replaceAll("(?<!\\bfor\\sa\\s(?:bunch\\sof\\s){0,1})(?:\\bbunch\\s+of\\s+)?\\bboats?\\b(?:\\s+propellers?)?+(?!\\s+trailers?\\b)", "<b>$0</b>");
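To check the regex outside regex101, here is a self-contained sketch that runs the answer's pattern over the question's sample text with Matcher.find() and then applies the same replaceAll call. The class name BoatRegexDemo, the constant SAMPLE and the helper findAll are hypothetical, added only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoatRegexDemo {
    // The answer's regex as a Java string literal (every backslash doubled).
    static final Pattern BOAT = Pattern.compile(
        "(?<!\\bfor\\sa\\s(?:bunch\\sof\\s){0,1})(?:\\bbunch\\s+of\\s+)?\\bboats?\\b(?:\\s+propellers?)?+(?!\\s+trailers?\\b)");

    // The sample text from the question; the two accented letters are written as Unicode escapes.
    static final String SAMPLE =
        "I have a boat to sell. It comes with extra boat propellers but does not come with a boat trailer "
        + "(the boat is pretty big so you might need a boat propeller trailer too). I used to have a bunch of boats "
        + "but my passion for a boat faded with time. I did not think people would have interest for a bunch of boats "
        + "but this is my last one, so Ye\u00e9\u00e9! :)";

    // Collects every substring matched by the pattern, left to right.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BOAT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(SAMPLE));
        // Same call as in the answer: wrap each match ($0 = whole match) in <b>...</b>
        System.out.println(SAMPLE.replaceAll(BOAT.pattern(), "<b>$0</b>"));
    }
}
```

On the sample text, findAll should return [boat, boat propellers, boat, bunch of boats]; the "boat trailer", "boat propeller trailer", "for a boat" and "for a bunch of boats" phrases produce no match at all, not even a partial one.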
    

    Regex details

    • (?<!\bfor\sa\s(?:bunch\sof\s){0,1}) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is
      • \bfor\sa\s - for, whitespace, a, whitespace
      • (?:bunch\sof\s){0,1} - 0 or 1 occurrences (i.e. an optional occurrence) of bunch, whitespace, of, whitespace
    • (?:\bbunch\s+of\s+)? - an optional occurrence of bunch, 1+ whitespaces, of, 1+ whitespaces
    • \bboats?\b - a whole word boat or boats
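    • (?:\s+propellers?)?+ - an optional occurrence of 1+ whitespaces followed by propeller or propellers. NOTE: the ?+ possessive quantifier is key here: once the group has matched " propeller", the engine is not allowed to backtrack into it, so the next lookahead is only checked after the whole group. With a plain ?, the engine could drop " propeller" in boat propeller trailer, the lookahead would then see " propeller" instead of " trailer", and a partial boat match would be returned.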
    • (?!\s+trailers?\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there is 1+ whitespaces, and then trailer or trailers as a whole word.
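The effect of the possessive ?+ can be seen by comparing it with a plain ? on the phrase "boat propeller trailer". In this sketch (PossessiveDemo, GREEDY, POSSESSIVE and firstMatch are hypothetical names) the answer's lookbehind is deliberately left out to keep the comparison small; only the quantifier on the propeller group differs between the two patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Identical patterns except for the quantifier on the propeller group: ? (greedy) vs ?+ (possessive).
    // The answer's lookbehind is omitted here to keep the comparison small.
    static final String GREEDY =
        "(?:\\bbunch\\s+of\\s+)?\\bboats?\\b(?:\\s+propellers?)?(?!\\s+trailers?\\b)";
    static final String POSSESSIVE =
        "(?:\\bbunch\\s+of\\s+)?\\bboats?\\b(?:\\s+propellers?)?+(?!\\s+trailers?\\b)";

    // Returns the first match of regex in text, or null when there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "a boat propeller trailer";
        System.out.println(firstMatch(GREEDY, text));     // prints boat (partial match)
        System.out.println(firstMatch(POSSESSIVE, text)); // prints null (no match)
    }
}
```

With the greedy version, the lookahead fails after " propeller", so the engine backtracks, lets the optional group match nothing, and the lookahead now sees " propeller" rather than " trailer" and succeeds. The possessive version never gives the group back, so the whole attempt at "boat" fails, which is exactly the "not even partially" requirement from the question.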