Java regex lookbehind issue with quantifiers


I'm using a Java regex pattern in an application that only allows access to the whole match value (that is, I cannot use capturing groups).

I am trying to extract values from my sample text:

C02 SURVEY  : 2010 F10446P BONAPARTE 2D 

In the example above, I need to find the keyword SURVEY and extract the value that follows the colon after it. The expected output is:

2010 F10446P BONAPARTE 2D

I used the pattern (?<=(?i)survey\s{2}[:])(?:(?![\n]).)*

In this pattern, I have hardcoded the number of whitespace characters as 2 (\s{2}), but the actual number may vary.

So I need to use a variable-length quantifier inside the lookbehind.

If there is another way to do this, please let me know.


Solution

  • You can rely on a feature of the Java regex engine that is sometimes called "constrained width lookbehind":

    Java accepts quantifiers within lookbehind, as long as the length of the matching strings falls within a pre-determined range. For instance, (?<=cats?) is valid because it can only match strings of three or four characters. Likewise, (?<=A{1,10}) is valid.

    That means you can replace the fixed {2} quantifier with a range quantifier that has both a minimum and a maximum, e.g. {0,100} to allow zero to a hundred whitespace characters. Adjust the bounds as you see fit.

    Besides, you don't need the tempered greedy token (?:(?![\n]).)*, because by default (that is, without the DOTALL flag) the dot in a Java regex does not match line terminators such as \n. Just replace it with .* to match zero or more characters up to the end of the line. So your pattern can be as simple as (?i)(?<=survey\s{0,100}:).*.
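
As a minimal sketch of how the suggested pattern behaves, the snippet below runs it against the sample line from the question and against a variant with no spaces before the colon. The class name SurveyLookbehindDemo and the helper extract are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is illustrative only.
class SurveyLookbehindDemo {
    // Pattern from the answer: the bounded {0,100} quantifier keeps the
    // lookbehind within a known maximum width, which Java requires.
    static final Pattern SURVEY_VALUE =
            Pattern.compile("(?i)(?<=survey\\s{0,100}:).*");

    // Returns the whole match (no capturing groups are used),
    // or null when the keyword followed by a colon is not found.
    static String extract(String text) {
        Matcher m = SURVEY_VALUE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample line from the question (two spaces before the colon).
        System.out.println("[" + extract("C02 SURVEY  : 2010 F10446P BONAPARTE 2D") + "]");
        // Same keyword with no whitespace before the colon.
        System.out.println("[" + extract("survey:2011 X") + "]");
    }
}
```

Note that the match starts immediately after the colon, so for the sample line it still begins with the space that follows the colon. If that leading whitespace is unwanted and the application cannot trim the result, the pattern would need a further adjustment.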