Tags: java, regex, backslash, character-class

java regex double-backslash in character class


I need to do the following regex in Java:

Split a string at each comma that's not preceded by a backslash (i.e. not escaped) and is followed by zero or more whitespace characters.

I've been trying this:

String str = "Name=Doe\\, Jane, Hobby=Skiing, Height=1.70";
String[] parts = str.split("[^\\],\s*");

which is the correct syntax in Perl and works there. Not so in Java.

The above already fails to compile:

error: illegal escape character
    String[] parts = str.split("[^\\],\s*");

Adding a third and fourth backslash in the character class doesn't help:

str.split("[^\\\\],\s*");

Adding a second backslash to the whitespace escape (\s) allows it to compile,

String[] parts = str.split("[^\\],\\s*");

but then a runtime regex.PatternSyntaxException occurs, stating an unclosed character class:

java.util.regex.PatternSyntaxException: Unclosed character class
near index 7
[^\],\s*
       ^

Clearly there's a backslash missing, and I can't get it in ... Can anybody tell me how this should be done in Java?

thx!


Solution

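  • The character class needs four backslashes in the Java source, not two. The Java compiler reduces \\\\ to \\, which the regex engine then reads as a single literal backslash. With only \\, the regex engine sees \], an escaped bracket, so the class is never closed: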

    String[] parts = str.split("[^\\\\],\\s*");
    

    As explained in this question: (java regex pattern unclosed character class)
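
One caveat with the character-class pattern above: [^\\] matches a real character, so split() removes the character in front of each comma along with the comma itself. A negative lookbehind, (?<!\\), checks the preceding character without consuming it. The following is a minimal sketch of that alternative, not part of the original answer; the class name SplitOnUnescapedComma and the helper splitUnescaped are made up for illustration. It also assumes a backslash before a comma always escapes it, so an escaped backslash followed by a comma (\\,) is not handled.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class SplitOnUnescapedComma {

    // Java source "(?<!\\\\),\\s*" is the regex (?<!\\),\s* :
    // a comma not preceded by a backslash, plus any trailing whitespace.
    private static final Pattern UNESCAPED_COMMA = Pattern.compile("(?<!\\\\),\\s*");

    static String[] splitUnescaped(String input) {
        return UNESCAPED_COMMA.split(input);
    }

    public static void main(String[] args) {
        // "\\," in Java source is the two characters \, in the actual string.
        String str = "Name=Doe\\, Jane, Hobby=Skiing, Height=1.70";

        // Character-class version: the class consumes one character per match.
        System.out.println(Arrays.toString(str.split("[^\\\\],\\s*")));
        // prints [Name=Doe\, Jan, Hobby=Skiin, Height=1.70]

        // Lookbehind version: only the comma and the whitespace are removed.
        System.out.println(Arrays.toString(splitUnescaped(str)));
        // prints [Name=Doe\, Jane, Hobby=Skiing, Height=1.70]
    }
}
```

The lookbehind is zero-width, which is why "Jane" and "Skiing" keep their last letter in the second output.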