I asked a question about punctuation and regex before, but it was confusing.
Supposing I have this text:
String text = "wor.d1, :word2. wo,rd3? word4!";
I'm doing this:
String parts[] = text.split(" ");
And I get this:
wor.d1, | :word2. | wo,rd3? | word4!
What do I need to do to get this instead? (Keep the symbols at the borders of each word, but only the ones I specify: .,!?: and not all of them.)
wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
I'm getting some good results with the regexes below, but they give an empty string before every split on punctuation at the start of a word.
Is there a way to avoid this empty string at the start?
Is this regex good, or is there a simpler way?
public static final String PUNCTUATION_SEPARATOR =
"("
+ "("
+ "(?=^[\"'!?.,;:(){}\\[\\]]+)"
+ "|"
+ "(?<=^[\"'!?.,;:(){}\\[\\]]+)"
+ ")"
+ "|"
+ "("
+ "(?=[\"'!?.,;:(){}\\[\\]]+($|\n))"
+ "|"
+ "(?<=[\"'!?.,;:(){}\\[\\]]+($|\n))"
+ ")"
+ ")";
public static final String PUNCTUATION_SEPARATOR =
"("
+ "("
+ "(?=^[\"'!?.,;:(){}\\[\\]-]+)"
+ "|"
+ "(?<=^[\"'!?.,;:(){}\\[\\]-]+)"
+ ")"
+ "|"
+ "("
+ "(?=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
+ "|"
+ "(?<=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
+ ")"
+ ")";
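For comparison, here is a minimal sketch of a possibly simpler approach that avoids the empty strings: split on whitespace first, then peel the specified marks off the start and end of each chunk with a `Matcher` instead of a lookaround-based `split`. The names `PunctuationTokenizer`, `tokenize`, `MARKS` and `TOKEN` are hypothetical, and it assumes that several marks in a row at a border (for example `?!`) should each become a separate token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class PunctuationTokenizer {
    // Only these marks are split off; any other symbol stays attached to the word.
    private static final String MARKS = "[.,!?:]";

    // group 1: leading marks, group 2: the word itself, group 3: trailing marks
    private static final Pattern TOKEN =
            Pattern.compile("(" + MARKS + "*)(.*?)(" + MARKS + "*)");

    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            Matcher m = TOKEN.matcher(chunk);
            // Skip the single empty chunk produced by blank input.
            if (chunk.isEmpty() || !m.matches()) {
                continue;
            }
            addEachChar(m.group(1), out);   // marks at the start of the word
            if (!m.group(2).isEmpty()) {
                out.add(m.group(2));        // the word, inner punctuation untouched
            }
            addEachChar(m.group(3), out);   // marks at the end of the word
        }
        return out;
    }

    // Assumption: every border mark becomes its own token.
    private static void addEachChar(String marks, List<String> out) {
        for (int i = 0; i < marks.length(); i++) {
            out.add(String.valueOf(marks.charAt(i)));
        }
    }

    public static void main(String[] args) {
        String text = "wor.d1, :word2. wo,rd3? word4!";
        // prints: wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
        System.out.println(String.join(" | ", tokenize(text)));
    }
}
```

Because the word group is matched lazily and the trailing group greedily, punctuation inside a word (`wor.d1`, `wo,rd3`) is left alone, and no empty tokens are produced since empty groups are never added to the list.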