java regex regex-lookarounds lookbehind

How Can I Use Look-Ahead and Look-Behind to Create a Custom Boundary Matcher?


I want to split a String at the word boundaries using Scanner. Normally, this would be done like this:

Scanner scanner = new Scanner(...).useDelimiter("\\b");

The problem is that my definition of "word" character is a tiny bit different from the standard [a-zA-Z_0-9] as I want to include some more characters and exclude the _: [a-zA-Z0-9#/]. Therefore, I can't use the \b pattern.

So I tried to do the same thing using look-ahead and look-behind, but what I came up with didn't work:

(<?=[A-Za-z0-9#/])(?![A-Za-z0-9#/])|(<?![A-Za-z0-9#/])(?=[A-Za-z0-9#/])

The scanner doesn't split anywhere using this.

Is it possible to do this using look-ahead and look-behind, and if so, how?


Solution

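  • There's an error in your syntax. The ? comes first, directly after the opening parenthesis: look-behind is written (?<=...) and (?<!...), not (<?=...) and (<?!...). As written, (<?=[A-Za-z0-9#/]) is an ordinary capturing group that matches an optional <, a literal = and one word character (and (<?![A-Za-z0-9#/]) likewise needs a literal !), so the delimiter only matches input that contains that literal text and the scanner never splits on ordinary words. The corrected pattern: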

    (?<=[A-Za-z0-9#/])(?![A-Za-z0-9#/])|(?<![A-Za-z0-9#/])(?=[A-Za-z0-9#/])
     ^^                                  ^^
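
To check the corrected pattern against the Scanner use case from the question, here is a minimal sketch. The class name CustomBoundaryDemo, the helper tokens, and the sample input are made up for illustration; only the character class [A-Za-z0-9#/] and the look-around structure come from the question and the fix above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CustomBoundaryDemo {
    // Custom "word" character class from the question: no underscore, plus # and /.
    static final String WORD = "[A-Za-z0-9#/]";

    // Zero-width boundary: either a word character behind and none ahead,
    // or no word character behind and one ahead.
    static final String BOUNDARY =
            "(?<=" + WORD + ")(?!" + WORD + ")"
          + "|(?<!" + WORD + ")(?=" + WORD + ")";

    // Hypothetical helper: collects every token the Scanner yields for the input.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(BOUNDARY)) {
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        for (String token : tokens("foo_bar c#/d")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Because the delimiter is zero-width, nothing is consumed: runs of non-word characters (such as the underscore and the space here) come back as tokens of their own, just as they would with \b, while c#/d stays in one piece because # and / now count as word characters.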