Lookbehind alternation in Java seems to return the longest result


I'm looking to extract (using Java's built-in regex support at the moment) the text that follows one of several prefixes. I'm using a lookbehind, but the result always seems to be the longest possible match rather than the match for the first alternative that matches the prefix text.

That is,

(?<=Book name|Book).*

Given the text "Book name Story"

The match is always "name Story", regardless of the order of the alternatives. What is the best way to get just the "Story" text without matching any of the other text? In practice I'm hoping to limit the right-hand side too with a lookahead (just in case that's pertinent).
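
The behaviour is easy to reproduce. The sketch below (the class and helper names are illustrative, not part of the original question) runs both orderings of the alternation against the sample text. Because `find()` scans left to right and accepts the first position at which either alternative of the lookbehind succeeds, both orderings start matching right after "Book", leading space included:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindRepro {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Book name Story";
        // Brackets make the leading space in the match visible.
        System.out.println("[" + firstMatch("(?<=Book name|Book).*", input) + "]"); // [ name Story]
        System.out.println("[" + firstMatch("(?<=Book|Book name).*", input) + "]"); // [ name Story]
    }
}
```

The order of the alternatives makes no difference here, since the shorter prefix is satisfied at an earlier position in the input than the longer one.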


Solution

  • You could use a lookahead here.

    (?<=Book name |Book )\S+(?=$)
    

    OR

    (?<=Book name )\S+|(?<=Book )(?!name)\S+
    

    As Java string literals (with the backslashes escaped), these would be:

    "(?<=Book name |Book )\\S+(?=$)"
    

    OR

    "(?<=Book name )\\S+|(?<=Book )(?!name)\\S+"
    

    Code:

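    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Book name Story";
    // The lookbehind accepts either prefix; (?=$) requires the match to end at the end of the input.
    Pattern regex = Pattern.compile("(?<=Book name |Book )\\S+(?=$)");
    Matcher regexMatcher = regex.matcher(s);
    if (regexMatcher.find()) {
        String resultString = regexMatcher.group();
        System.out.println(resultString); // => Story
    }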
    

    Explanation:

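    • (?<=Book name |Book ) Lookbehind asserts that the current position is immediately preceded by "Book name " or "Book " (trailing space included).
    • \S+ Matches one or more non-whitespace characters.
    • (?=$) Lookahead asserts that the match is followed by the end of the input (or the end of a line, if MULTILINE mode is enabled).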
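
The code above only exercises the first pattern, which depends on the value being the last word of the input. As a hedged sketch of the second pattern, which does not need the end anchor, the helper below (`extractFirst` and the class name are hypothetical, used only for illustration) returns the first match or `null`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Second pattern from the answer: the longer prefix gets its own lookbehind,
    // and the shorter one is guarded so it cannot match the word "name" itself.
    static final Pattern AFTER_PREFIX =
            Pattern.compile("(?<=Book name )\\S+|(?<=Book )(?!name)\\S+");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String extractFirst(String input) {
        Matcher m = AFTER_PREFIX.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractFirst("Book name Story"));          // Story
        System.out.println(extractFirst("Book Story"));               // Story
        System.out.println(extractFirst("Book name Story and more")); // Story
    }
}
```

Note that `(?!name)` also rejects any word that merely starts with "name" (for example "nameless") directly after "Book "; whether that matters depends on the data.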