How to split a List of Strings in Java with a regex that uses the last occurrence of given Pattern?


I'm pretty new to regex. Given a list of Strings as input, I would like to split each of them using a regex that matches punctuation: "[!.?\n]".

The catch is that when several punctuation characters appear together, I want to split only on the last one and keep the preceding ones in the output, like this:

input: "I want it now!!!"

output: "I want it now!!"

input: "Am I ok? Yeah, I'm fine!!!"

output: ["Am I ok", "Yeah, I'm fine!!"]


Solution

  • You can use

    [!.?\n](?![!.?\n])
    

    Here, a !, ., ? or newline is matched only if it is not followed by another of these chars, so in a run of punctuation only the last char acts as the delimiter.

    Or, if only a run of the same repeated char should be kept together:

    ([!.?\n])(?!\1)
    

    Here, a !, ., ? or newline is matched only if it is not followed by exactly the same char (\1 is a backreference to the char captured in Group 1).

    See the regex demo #1 and the regex demo #2.

    See a Java demo:

    import java.util.Arrays;

    public class Main {
        public static void main(String[] args) {
            String p = "[!.?\n](?![!.?\n])";  // last char of any punctuation run
            String p2 = "([!.?\n])(?!\\1)";   // last char of a run of the same char
            String s = "I want it now!!!";
            System.out.println(Arrays.toString(s.split(p)));  // => [I want it now!!]
            System.out.println(Arrays.toString(s.split(p2))); // => [I want it now!!]
            s = "Am I ok? Yeah, I'm fine!!!";
            System.out.println(Arrays.toString(s.split(p)));  // => [Am I ok,  Yeah, I'm fine!!]
            System.out.println(Arrays.toString(s.split(p2))); // => [Am I ok,  Yeah, I'm fine!!]
        }
    }
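Since the question asks about a List of Strings, here is a minimal sketch of how the two patterns above might be applied to a whole list. The class name SplitOnLastPunctuation, the helper splitAll, and the constants ANY and SAME are hypothetical names chosen for illustration, and trimming the pieces is an assumption made so the result matches the output shown in the question (without the leading space). The last two calls use mixed punctuation, which is where the two patterns behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitOnLastPunctuation {
    // A punctuation char NOT followed by any char from the same set.
    // "\\n" lets the regex engine interpret the newline escape itself.
    static final Pattern ANY = Pattern.compile("[!.?\\n](?![!.?\\n])");

    // A punctuation char NOT followed by the very same char.
    static final Pattern SAME = Pattern.compile("([!.?\\n])(?!\\1)");

    // Hypothetical helper: splits every input with the given delimiter,
    // trims each piece and drops the empty ones.
    static List<String> splitAll(List<String> inputs, Pattern delimiter) {
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            for (String part : delimiter.split(input)) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
                "I want it now!!!",
                "Am I ok? Yeah, I'm fine!!!");
        System.out.println(splitAll(inputs, ANY));
        // => [I want it now!!, Am I ok, Yeah, I'm fine!!]

        // Mixed punctuation: ANY keeps "?" because "!" follows it,
        // SAME splits on both "?" and "!" because they are different chars.
        List<String> mixed = Arrays.asList("Really?! Yes");
        System.out.println(splitAll(mixed, ANY));  // => [Really?, Yes]
        System.out.println(splitAll(mixed, SAME)); // => [Really, Yes]
    }
}
```

Compiling each regex once with Pattern.compile avoids recompiling it for every String in the list, which String.split would otherwise do on each call.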