java, regex, punctuation

Java - Remove all non-word characters from a string, for all languages


I need to remove all punctuation from words in Java. I tried this:

    System.out.println("do.,it".replaceAll("[^\\w]", ""));
    System.out.println("сказочники".replaceAll("[^\\w]", ""));

But it does not work with Cyrillic or other non-Latin scripts. I also tried

\p{Punct}

but that list is not complete. For example,

„ and »

are missing.


Solution

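  • Not sure if Java supports this, but try:

    "сказочники".replaceAll("\\P{wd}+", "")

    where \P{wd} stands for any non-word character in any language. It is the opposite of \p{wd}. Note that the backslash must be doubled inside a Java string literal. The wd property name is not among those documented for java.util.regex.Pattern, so if Java rejects it, the documented embedded flag (?U) (UNICODE_CHARACTER_CLASS) makes \W Unicode-aware, as in replaceAll("(?U)\\W+", "").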
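
    Because the answer is unsure whether Java accepts \P{wd}, here is a minimal sketch that uses only constructs documented for java.util.regex.Pattern: the (?U) embedded flag and the Unicode categories \p{L} and \p{Nd}. The class and method names (WordCleaner, stripNonWord, keepLettersAndDigits) are made up for illustration and do not come from the question or the answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class WordCleaner {
    // (?U) turns on Pattern.UNICODE_CHARACTER_CLASS, so \W means
    // "not a Unicode word character" instead of "not [a-zA-Z_0-9]".
    private static final Pattern NON_WORD = Pattern.compile("(?U)\\W+");

    // Alternative: keep only Unicode letters (\p{L}) and decimal digits (\p{Nd}).
    private static final Pattern NON_LETTER_OR_DIGIT =
            Pattern.compile("[^\\p{L}\\p{Nd}]+");

    static String stripNonWord(String s) {
        return NON_WORD.matcher(s).replaceAll("");
    }

    static String keepLettersAndDigits(String s) {
        return NON_LETTER_OR_DIGIT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // The Cyrillic word from the question, written with Unicode escapes
        // so the source file stays ASCII-only regardless of compiler encoding.
        String word = "\u0441\u043a\u0430\u0437\u043e\u0447\u043d\u0438\u043a\u0438";

        // ASCII punctuation is removed: the result is "doit".
        System.out.println(stripNonWord("do.,it"));

        // Cyrillic letters count as word characters, so the word is unchanged.
        System.out.println(stripNonWord(word));

        // The quotation marks U+201E and U+00BB are removed, leaving the word.
        System.out.println(stripNonWord("\u201e" + word + "\u00bb"));
    }
}
```

    The two patterns differ on the underscore: with (?U), \W treats "_" as a word character and keeps it, while the \p{L}\p{Nd} variant removes it. Which one fits depends on whether underscores count as punctuation for your data.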