I have a very long text and I'm trying to break it after every 3 sentences.
Example
Source:
"Sentence 1. Sentence 2? Sentence 3! Sentence 4. Sentence 5. Sentence 6. Sentence 7. Sentence 8. Sentence 9. Sentence 10."
Should return:
"Sentence 1. Sentence 2? Sentence 3!
Sentence 4. Sentence 5. Sentence 6.
Sentence 7. Sentence 8. Sentence 9.
Sentence 10."
At the moment I have the regex (?<=[\.?!])\s
which matches all the whitespace between the sentences. So I could use it to split the String and then iterate to add the line breaks like this:
String[] splits = src.split(regex);
StringBuilder b = new StringBuilder();
int index = 0;
for (String s : splits) {
    if (index == 3) {
        b.append("\n");
        index = 0;
    } else if (index > 0) {
        b.append(" ");
    }
    b.append(s);
    index++;
}
String res = b.toString();
But I'd like to do it automatically using:
src.replaceAll(regex2, "\n");
Any idea of how I can achieve that?
You may use the following regex substitution:
s = s.replaceAll("(?s)(.*?[.?!](?:\\s.*?[.?!]){0,2})\\s*", "$1\n");
See the regex demo
Details
(?s) - a DOTALL modifier (. matches line break chars now)
(.*?[.?!](?:\s.*?[.?!]){0,2}) - Group 1:
  .*?[.?!] - any 0+ chars, as few as possible, up to the leftmost ., ? or !, followed with
  (?:\s.*?[.?!]){0,2} - 0 to 2 sequences of
    \s - a whitespace
    .*?[.?!] - any 0+ chars, as few as possible, up to the leftmost ., ? or !
\s* - 0 or more whitespaces
The $1\n replacement takes the whole match except the trailing whitespace, and appends a newline at the end.
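To try the substitution on the sample input from the question, here is a minimal, self-contained sketch. The class name SentenceBreaker and the method name breakEveryThree are made up for illustration; only the pattern and replacement come from the answer. Note that because \s* can match zero characters, the last sentence also gets a newline appended, so the result ends with a trailing "\n".

```java
class SentenceBreaker {
    // Hypothetical helper: wraps the replaceAll call from the answer.
    // Group 1 captures up to three sentences; the trailing whitespace
    // is matched outside the group and replaced by a single newline.
    static String breakEveryThree(String s) {
        return s.replaceAll("(?s)(.*?[.?!](?:\\s.*?[.?!]){0,2})\\s*", "$1\n");
    }

    public static void main(String[] args) {
        String src = "Sentence 1. Sentence 2? Sentence 3! Sentence 4. "
                   + "Sentence 5. Sentence 6. Sentence 7. Sentence 8. "
                   + "Sentence 9. Sentence 10.";
        // print (not println): the result already ends with a newline
        System.out.print(breakEveryThree(src));
    }
}
```

If the trailing newline is unwanted, it can be removed afterwards, for example by chaining .replaceAll("\\n$", "") onto the result.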