java regex punctuation

How to reduce repeated punctuation marks to a single one in a text file in Java


I have some text files that contain runs of repeated punctuation marks, and I need to reduce each run to a single punctuation mark.

Here is some sample text:

They are working in London..... he is a Java developer !!!!! they are playing------ She is working_______

This is the required output:

They are working in London.he is a Java developer !they are playing- She is working_

I need some help with the Java regex.

Thanks


Solution

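  • Use a backreference (\1+) to match a repeated character. The pattern also consumes any whitespace after each run, so neighbouring runs such as "------ ---- ----" collapse into a single mark.

    Try the following: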

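    // ([-.!_]) captures one punctuation mark; the backreference \1+ requires at least one repeat of it.
    // \s* swallows whitespace after the run, and the outer (?:...)+ merges adjacent runs.
    String text = "They are working in London..... he is a Java developer !!!!! they are playing------ ---- ---- She is working_______";
    String replaced = text.replaceAll("(?:([-.!_])\\1+\\s*)+", "$1"); // keep one mark per run
    System.out.println(replaced);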
    

    This prints:

    They are working in London.he is a Java developer !they are playing-She is working_
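
Since the question is about text files, here is a minimal sketch of how the same pattern might be applied to a whole file. The class name PunctuationCollapser, the collapse helper, and the two command-line arguments (input and output path) are assumptions made for illustration, and the file is assumed to be UTF-8 and small enough to read into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and layout are illustrative only.
class PunctuationCollapser {
    // Same regex as in the answer, compiled once so it can be reused.
    private static final Pattern REPEATED =
            Pattern.compile("(?:([-.!_])\\1+\\s*)+");

    // Replaces every run of a repeated mark (plus trailing whitespace)
    // with a single copy of that mark.
    static String collapse(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PunctuationCollapser <input> <output>");
            return;
        }
        Path input = Paths.get(args[0]);   // assumed: UTF-8 text file
        Path output = Paths.get(args[1]);
        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        Files.write(output, collapse(text).getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that \s* also matches line breaks, so a run at the end of a line joins that line with the next one. If line structure should be preserved, replacing \s* with [ \t]* in the pattern is one option.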