java c# regex punctuation

Removing all punctuation using POSIX classes in Java and C# produces different output


Here is my attempt:

Java:

import java.util.regex.Pattern;

public class Main {
 public static void main(String[] args) {
  String text = "This && is **^^ a ~~@@ test.";
  // Remove every character matched by \p{Punct}
  System.out.println(Pattern.compile("\\p{Punct}").matcher(text).replaceAll(""));
  // OUT: This  is  a  test --> as I expected
 }
}

C#:

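using System;
using System.Text.RegularExpressions;

class Program {
 static void Main(string[] args) {
  string text = "This && is **^^ a ~~@@ test.";
  // Remove every character matched by \p{P}
  Console.WriteLine(Regex.Replace(text, "\\p{P}", ""));
  // OUT: This  is ^^ a ~~ test
  // expected: This  is  a  test
  Console.ReadLine();
 }
}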

Any ideas? Thank you!


Solution

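  • "\\p{P}" means the same in both Java and C#, i.e. match Unicode category P (Punctuation). The characters ^ and ~ are not in that category (Unicode classifies them as symbols, Sk and Sm respectively), which is why they survive in the C# output.

    Java's "\\p{Punct}" means something else: it is the POSIX punctuation class, documented as:

    Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

    So, the equivalent C# pattern is "[!\"#$%&'()*+,\\-./:;<=>?@\\[\\\\\\]^_`{|}~]"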
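To see the difference without switching languages, here is a minimal Java sketch that runs all three patterns on the question's input. The class name PunctuationDemo and its helper methods are made up for illustration. It assumes the default Pattern flags; in particular, UNICODE_CHARACTER_CLASS is not set, since that flag changes what \p{Punct} matches.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
public class PunctuationDemo {
    // Explicit class listing the same characters the Java docs give for \p{Punct}.
    static final Pattern EXPLICIT_PUNCT =
        Pattern.compile("[!\"#$%&'()*+,\\-./:;<=>?@\\[\\\\\\]^_`{|}~]");

    // Unicode category P (Punctuation): same meaning as \p{P} in C#.
    static String stripUnicodeP(String s) {
        return s.replaceAll("\\p{P}", "");
    }

    // POSIX punctuation class (ASCII only with default flags).
    static String stripPosixPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // Explicit character class, portable to C# as written in the answer.
    static String stripExplicit(String s) {
        return EXPLICIT_PUNCT.matcher(s).replaceAll("");
    }

    // True if Unicode puts the character in a Symbol category (Sm, Sc, Sk, So).
    static boolean isUnicodeSymbol(char c) {
        int type = Character.getType(c);
        return type == Character.MATH_SYMBOL
            || type == Character.CURRENCY_SYMBOL
            || type == Character.MODIFIER_SYMBOL
            || type == Character.OTHER_SYMBOL;
    }

    public static void main(String[] args) {
        String text = "This && is **^^ a ~~@@ test.";
        System.out.println(stripUnicodeP(text));   // This  is ^^ a ~~ test
        System.out.println(stripPosixPunct(text)); // This  is  a  test
        System.out.println(stripExplicit(text));   // This  is  a  test
        System.out.println(isUnicodeSymbol('^'));  // true
        System.out.println(isUnicodeSymbol('~'));  // true
    }
}
```

The first line of output matches what the C# program printed, which suggests the discrepancy comes from the pattern rather than from the regex engine.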