Regex that matches before and after certain characters


I am trying to craft a delimiter regex (for use with java.util.Scanner) that splits a string on whitespace while also keeping colons, opening parentheses, and closing parentheses as separate tokens. That is, foo(a:b) should split into the tokens foo, (, a, :, b and ).

My current best effort is the pattern "\\s+|(?=[(:])|(?<=[:)])" which for some reason I can't understand fails to match after the opening parenthesis and before the closing parenthesis, but matches fine on both sides of the colon.


Solution

  • If you want all of those parts as separate tokens, extend both character classes: the lookahead should assert one of the characters [(:)] to the right and, if foo(a:b) is the whole string, the lookbehind should assert one of the characters [(:] to the left.

    If you also want to match the position after the last closing parenthesis, both character classes can be the same: [(:)].

    \s+|(?=[(:)])|(?<=[(:])
    

    Example code

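    // Requires: import java.util.Scanner;
    String s = "foo(a:b)";
    // Delimiter: whitespace, the position before ( : or ), and the position after ( or :
    Scanner scanner = new Scanner(s).useDelimiter("\\s+|(?=[(:)])|(?<=[(:])");
    while (scanner.hasNext()) {
        System.out.println(scanner.next());
    }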
    

    Output

    foo
    (
    a
    :
    b
    )
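
To make the difference between the patterns easier to see, here is a minimal sketch that runs the same input through the pattern from the question, the pattern from the answer, and the variant where both character classes are [(:)]. The class name DelimiterComparison, the tokens helper, and the constant names are hypothetical and exist only for this illustration; the input foo(a:b)bar is an assumed example of text following the closing parenthesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterComparison {
    // Pattern from the question: the lookahead lacks ')' and the lookbehind lacks '('.
    static final String QUESTION = "\\s+|(?=[(:])|(?<=[:)])";
    // Pattern from the answer: lookahead [(:)], lookbehind [(:].
    static final String ANSWER = "\\s+|(?=[(:)])|(?<=[(:])";
    // Variant from the answer where both classes are [(:)].
    static final String SYMMETRIC = "\\s+|(?=[(:)])|(?<=[(:)])";

    // Hypothetical helper: collects every token Scanner yields for the given delimiter.
    static List<String> tokens(String input, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiter)) {
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // No split after '(' or before ')', so "(a" and "b)" stay glued together.
        System.out.println(tokens("foo(a:b)", QUESTION));
        // Splits on both sides of '(' and ':' and before ')'.
        System.out.println(tokens("foo(a:b)", ANSWER));
        // With text directly after ')', only the symmetric variant separates it.
        System.out.println(tokens("foo(a:b)bar", ANSWER));
        System.out.println(tokens("foo(a:b)bar", SYMMETRIC));
    }
}
```

The question's pattern fails because a lookahead only inspects the character to the right of the current position and a lookbehind only the character to the left: without ')' in the lookahead there is no split before the closing parenthesis, and without '(' in the lookbehind there is no split after the opening one.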