Tags: java, regex, formatting, replaceall

Java regex to remove specific punctuation


I'm formatting a very large number of plain-text files in Java, and I need to remove all punctuation except apostrophes. When I originally set up the regex for the replaceAll call, it removed everything I knew of, but now I've found one particular file/punctuation set where it doesn't work.

    holdMe = holdMe.replaceAll("[,_\"-.!?:;)(}{]", " ");

I know this statement is being executed, because all of the other punctuation is cleared: there are no periods, commas, etc. I've tried escaping the () and {} characters, but those characters still aren't replaced. I've been teaching myself regex from the Oracle documentation, but I can't see why this isn't working.
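One way to see what the question's character class really matches is to probe it one character at a time. Inside [...], the sequence "-. is not three literals but a range from U+0022 (") to U+002E (.), which also covers #$%&'()*+, so ASCII ( and ) are already matched, and so is the apostrophe the question wants to keep. The class name OriginalClassProbe and the full-width test characters are illustrative assumptions; if the brackets in the problem file survive, one plausible (unverified) explanation is that they are non-ASCII look-alikes such as U+FF08 and U+FF09.

```java
import java.util.regex.Pattern;

public class OriginalClassProbe {
    // The character class from the question. The regex engine sees
    // [,_"-.!?:;)(}{] where "-. is a RANGE from U+0022 to U+002E.
    static final Pattern ORIGINAL = Pattern.compile("[,_\"-.!?:;)(}{]");

    // True if the class matches this single character (i.e. replaceAll would replace it).
    static boolean removed(char c) {
        return ORIGINAL.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // ASCII parentheses fall inside the "-. range; braces are listed literally.
        System.out.println(removed('(') + " " + removed(')') + " " + removed('{') + " " + removed('}'));
        // The apostrophe (U+0027) also falls inside the range, so it is removed too.
        System.out.println(removed('\''));
        // Full-width parentheses (hypothetical example input) are not in the class.
        System.out.println(removed('\uFF08') + " " + removed('\uFF09'));
    }
}
```

Running main prints "true true true true", then "true", then "false false".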


Solution

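  • This regex matches every punctuation character except the apostrophe. \p{P} is the Unicode punctuation category, and &&[^\u0027] intersects it with "any character other than U+0027 (')":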

    [\p{P}&&[^\u0027]]
    

    As a Java string literal:

    "[\\p{P}&&[^\u0027]]"
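A minimal sketch of the answer's regex in use, assuming each file's contents are already in a String (file reading is omitted). The Pattern is compiled once and reused, which avoids recompiling the same regex for every file; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PunctuationStripper {
    // Unicode punctuation (\p{P}) intersected with "anything but U+0027".
    // \\u0027 is left for the regex engine; a single \u0027 would be turned
    // into ' by the compiler instead, which also works here.
    private static final Pattern PUNCT_EXCEPT_APOSTROPHE =
            Pattern.compile("[\\p{P}&&[^\\u0027]]");

    // Replace each matched punctuation character with a space, as in the question.
    static String removePunctuation(String text) {
        return PUNCT_EXCEPT_APOSTROPHE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // prints "Don't stop  yet    ok  "
        System.out.println(removePunctuation("Don't stop (yet), {ok}?"));
        // Full-width parentheses are Unicode punctuation too; prints " hi "
        System.out.println(removePunctuation("\uFF08hi\uFF09"));
    }
}
```

Keep in mind that \p{P} covers punctuation only, not symbols: characters such as $, +, <, =, >, ^, `, | and ~ belong to Unicode symbol categories and are left alone. Likewise, a typographic apostrophe (U+2019, ’) is punctuation rather than U+0027, so it would be replaced; if the files use curly apostrophes, "[\\p{P}&&[^\\u0027\\u2019]]" keeps both.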