
Java ReplaceAll Regular Expression With Exclusions


I am trying to replace all instances of sentence terminators such as '.', '?', and '!', but I do not want to replace strings like "dr." and "mr.".

I have tried the following:

text = text.replaceAll("(?![mr|mrs|ms|dr])(\\s*[\\.\\?\\!]\\s*)", "\n");

...but that does not seem to work. Any suggestions would be appreciated.


Edit: After the feedback here and a bit of tweaking, this is the working solution to my problem.

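private String convertText(String text) {
  // Collapse every run of whitespace (including line breaks) into a single space.
  text = text.replaceAll("\\s+", " ");
  // Drop parentheses, double quotes, commas, and colons.
  text = text.replaceAll("[\n\r\\(\\)\"\\,\\:]", "");
  // Turn each terminator (. ? ! ;) into a line break, unless a title or a single-character word precedes it.
  text = text.replaceAll("(?i)(?<!dr|mr|mrs|ms|jr|sr|\\s\\w)(\\s*[\\.\\?\\!\\;](?:\\s+|$))","\r\n");
  return text.trim();
}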

The code will extract all* compound and single sentences from an excerpt of text, removing all punctuation and extraneous white-space.
*There are some exceptions...
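
As a rough illustration of what the method above does, here is a minimal sketch that wraps it in a runnable class. The class name ConvertTextDemo, the static modifier, and the sample sentences are assumptions added for the demo; the three replaceAll calls are copied from the post unchanged.

```java
// Hypothetical wrapper class so the method from the post can be run on its own.
class ConvertTextDemo {
    // Same body as convertText above; made static only for the demo.
    static String convertText(String text) {
        text = text.replaceAll("\\s+", " ");
        text = text.replaceAll("[\n\r\\(\\)\"\\,\\:]", "");
        text = text.replaceAll("(?i)(?<!dr|mr|mrs|ms|jr|sr|\\s\\w)(\\s*[\\.\\?\\!\\;](?:\\s+|$))", "\r\n");
        return text.trim();
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String sample = "Dr. Smith went home.  He was tired! Was he?";
        for (String sentence : convertText(sample).split("\r\n")) {
            System.out.println("[" + sentence + "]");
        }
        // Prints:
        // [Dr. Smith went home]
        // [He was tired]
        // [Was he]
    }
}
```

One of the exceptions appears to come from the \\s\\w alternative: it protects initials such as "F." in "John F. Kennedy", but it also means a sentence ending in a one-letter word (for example "Was it I? Yes.") is not split at that point.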


Solution

  • You need to use a negative lookbehind instead of a negative lookahead, like this:

    String x = "dr. house.";
    System.out.println(x.replaceAll("(?<!mr|mrs|ms|dr)(\\s*[\\.\\?\\!]\\s*)","\n"));
    

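    Also, the list mr/dr/ms/mrs should not be inside a character class: [mr|mrs|ms|dr] matches any single one of the characters m, r, s, d, or |, not one of the whole words.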
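
To make the difference concrete, the sketch below runs the pattern from the question and the lookbehind pattern side by side on the same input. The class and method names (TerminatorDemo, withLookahead, withLookbehind) are made up for this illustration, and the unnecessary escapes inside the character class have been dropped.

```java
// Hypothetical demo class; the names are illustrative and not from the original post.
class TerminatorDemo {
    // Pattern from the question: a lookahead over a character class.
    // It only checks that the NEXT character is not one of m, r, s, d, or |.
    static String withLookahead(String text) {
        return text.replaceAll("(?![mr|mrs|ms|dr])(\\s*[.?!]\\s*)", "\n");
    }

    // Suggested fix: a negative lookbehind over whole words.
    // The match is rejected when "mr", "mrs", "ms", or "dr" comes right before it.
    static String withLookbehind(String text) {
        return text.replaceAll("(?<!mr|mrs|ms|dr)(\\s*[.?!]\\s*)", "\n");
    }

    public static void main(String[] args) {
        String x = "dr. house.";
        // Newlines are shown as \n so the result fits on one line.
        System.out.println(withLookahead(x).replace("\n", "\\n"));   // dr\nhouse\n
        System.out.println(withLookbehind(x).replace("\n", "\\n"));  // dr. house\n
    }
}
```

Note that this pattern is case-sensitive, so "Dr." would still be split unless the (?i) flag is added.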