java regex separator punctuation

How to keep the delimiter while using RegEx?


I asked a question about punctuation and regex before, but it was confusing.

Suppose I have this text:

String text = "wor.d1, :word2. wo,rd3? word4!"; 

I'm doing this:

String parts[] = text.split(" ");

And I get this:

wor.d1, | :word2. | wo,rd3? | word4!

What do I need to do to get this instead? (Split off the symbols at the borders of each word, but only the ones I specify: . , ! ? : and not all punctuation. Symbols inside a word stay where they are.)

wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !

UPDATE

I'm getting good results with this regex, but it produces an empty string before every split on punctuation at the start of a word.

Is there a way to avoid this empty string at the start?

Is this regex good, or is there a simpler way?

public static final String PUNCTUATION_SEPARATOR =
        "("
        + "("
        + "(?=^[\"'!?.,;:(){}\\[\\]]+)"
        + "|"
        + "(?<=^[\"'!?.,;:(){}\\[\\]]+)"
        + ")"
        + "|"
        + "("
        + "(?=[\"'!?.,;:(){}\\[\\]]+($|\n))"
        + "|"
        + "(?<=[\"'!?.,;:(){}\\[\\]]+($|\n))"
        + ")"
        + ")";

Solution

public static final String PUNCTUATION_SEPARATOR =
        "("
        + "("
        + "(?=^[\"'!?.,;:(){}\\[\\]-]+)"
        + "|"
        + "(?<=^[\"'!?.,;:(){}\\[\\]-]+)"
        + ")"
        + "|"
        + "("
        + "(?=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
        + "|"
        + "(?<=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
        + ")"
        + ")";
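
If the lookaround regex feels hard to maintain, a simpler way is to split on whitespace first and then peel the border symbols off each word by hand. The following is only a sketch of that alternative, not the regex above: the class name PunctuationTokenizer, the method tokenize, and the constant BORDER_SYMBOLS are hypothetical names, and the symbol set is assumed to be the ".,!?:" list from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationTokenizer {
    // Assumption: only these symbols are detached from word borders (from the question).
    private static final String BORDER_SYMBOLS = ".,!?:";

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is empty or all whitespace
            }
            int start = 0;
            int end = word.length();
            // Emit each leading border symbol as its own token.
            while (start < end && BORDER_SYMBOLS.indexOf(word.charAt(start)) >= 0) {
                tokens.add(String.valueOf(word.charAt(start)));
                start++;
            }
            // Move 'end' back past the trailing border symbols.
            while (end > start && BORDER_SYMBOLS.indexOf(word.charAt(end - 1)) >= 0) {
                end--;
            }
            // The core of the word; symbols inside it are left untouched.
            if (start < end) {
                tokens.add(word.substring(start, end));
            }
            // Emit each trailing border symbol as its own token.
            for (int i = end; i < word.length(); i++) {
                tokens.add(String.valueOf(word.charAt(i)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "wor.d1, :word2. wo,rd3? word4!";
        // prints: wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
        System.out.println(String.join(" | ", tokenize(text)));
    }
}
```

Because tokens are only added when they are non-empty, this version never produces the empty string at the start of a word. To support more symbols (quotes, brackets, the hyphen), extend BORDER_SYMBOLS.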