
Break a String after every x sentences


I have a very long text and I'm trying to insert a line break after every 3 sentences.

Example

Source:

"Sentence 1. Sentence 2? Sentence 3! Sentence 4. Sentence 5. Sentence 6. Sentence 7. Sentence 8. Sentence 9. Sentence 10."

Should return:

"Sentence 1. Sentence 2? Sentence 3!
Sentence 4. Sentence 5. Sentence 6.
Sentence 7. Sentence 8. Sentence 9.
Sentence 10."

At the moment I have the regex (?<=[\.?!])\s, which matches the whitespace between sentences. I could use it to split the String and then iterate over the parts, adding a line break after every third one, like this:

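// Whitespace that directly follows ., ? or !
String regex = "(?<=[.?!])\\s";
String[] splits = src.split(regex);
StringBuilder b = new StringBuilder();
int index = 0; // sentences already written on the current line
for (String s : splits) {
    if (index == 3) {
        // Line is full: start a new one
        b.append("\n");
        index = 0;
    } else if (index > 0) {
        // Restore the space removed by split
        b.append(" ");
    }

    b.append(s);
    index++;
}
String res = b.toString();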

But I'd like to do it automatically using:

src.replaceAll(regex2, "\n");

Any idea of how I can achieve that?


Solution

  • You may use the following regex substitution:

    s = s.replaceAll("(?s)(.*?[.?!](?:\\s.*?[.?!]){0,2})\\s*", "$1\n");
    


    Details

    • (?s) - a DOTALL modifier (. matches line break chars now)
    • (.*?[.?!](?:\s.*?[.?!]){0,2}) - Group 1:
      • .*?[.?!] - any 0+ chars, as few as possible, up to the leftmost ., ? or !, followed by
      • (?:\s.*?[.?!]){0,2} - 0 to 2 sequences of
        • \s - a whitespace
        • .*?[.?!] - any 0+ chars, as few as possible, up to the leftmost ., ? or !
    • \s* - 0 or more whitespaces

    The $1\n replacement keeps Group 1, that is, the whole match without its trailing whitespace, and appends a newline after it.
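
Since the title asks about every x sentences, here is a minimal sketch that wraps the substitution in a method and builds the quantifier from a parameter. The class name `SentenceBreaker`, the method `breakEvery` and the parameter `n` are hypothetical names chosen for illustration, and the final `trim()` is an addition that is not part of the answer above. It is there because the trailing whitespace part of the pattern can match an empty string, so the last group of sentences also gets a newline appended.

```java
class SentenceBreaker {

    // Hypothetical helper: puts a line break after every n sentences.
    // A "sentence" here is simply any text ending in '.', '?' or '!'.
    static String breakEvery(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Same pattern as in the answer, with {0,2} generalized to {0,n-1}:
        // one sentence plus up to n-1 more, then any trailing whitespace.
        String regex = "(?s)(.*?[.?!](?:\\s.*?[.?!]){0," + (n - 1) + "})\\s*";
        // trim() drops the newline appended after the last group
        // (and any leading/trailing whitespace of the input).
        return text.replaceAll(regex, "$1\n").trim();
    }

    public static void main(String[] args) {
        String src = "Sentence 1. Sentence 2? Sentence 3! Sentence 4. "
                + "Sentence 5. Sentence 6. Sentence 7. Sentence 8. "
                + "Sentence 9. Sentence 10.";
        System.out.println(breakEvery(src, 3));
    }
}
```

Note that the pattern treats every ., ? or ! as the end of a sentence, so decimals such as 3.14 or abbreviations will be split in the wrong place. For that kind of input the split-and-loop approach with a stricter sentence detector is probably the safer choice.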