java, regex, double-quotes

How to replace all double quotes not preceded by a backslash


Assume I have a string like the one below:

String param = "[\"\\n\",\"\\t\",\"'\",\"\\\"\",\"\\\\\"]";

The output of System.out.println(param) is:

"\n","\t","'","\"","\\"

I would like to remove the double quotes that are not preceded by a backslash. In other words, I would like the System.out.println output to look like this:

\n,\t,',\",\\

So I used this pattern:

System.out.println(param.replaceAll("\\\\{0}\"", ""));

But I got this:

\n,\t,',\,\\

As you can see, the double quote preceded by a backslash is also removed. How can I prevent that?

Edit: Sorry about the square brackets. You can ignore them because they have nothing to do with this question.
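A minimal sketch of why the attempt fails (the class and helper names are illustrative, not from the question): in the regex \\{0}" the quantifier {0} makes the backslash match zero times, so the pattern is effectively just " and every double quote is removed. A plain negative lookbehind, (?<!\\)", keeps the escaped quote, but it also keeps the closing quote of "\\", because that quote follows an escaped backslash rather than an escaping one. That is why the number of preceding backslashes has to be taken into account.

```java
public class WhyAttemptFails {
    // Hypothetical helper: the question's attempt. The regex \\{0}" means
    // "a backslash repeated zero times, then a quote", i.e. just a quote.
    static String attempt(String s) {
        return s.replaceAll("\\\\{0}\"", "");
    }

    // Hypothetical helper: remove only quotes not directly preceded by a backslash.
    static String lookbehindOnly(String s) {
        return s.replaceAll("(?<!\\\\)\"", "");
    }

    public static void main(String[] args) {
        // Prints as ["\n","\t","'","\"","\\"]
        String param = "[\"\\n\",\"\\t\",\"'\",\"\\\"\",\"\\\\\"]";
        System.out.println(attempt(param));        // [\n,\t,',\,\\]
        // The escaped quote survives, but so does the closing quote after \\
        System.out.println(lookbehindOnly(param)); // [\n,\t,',\",\\"]
    }
}
```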


Solution

  • You can use the following regex to match string literals and remove only the " characters that delimit them:

    (?s)(?<!\\)((?:\\{2})*)"([^"\\]*(?:\\.[^"\\]*)*)"
    


    Details

    • (?s) - DOTALL modifier, so that the . in \\. also matches a line break (just in case a string literal spans several lines)
    • (?<!\\) - no \ immediately to the left of the current location
    • ((?:\\{2})*) - Group 1: any 0+ consecutive occurrences of 2 backslashes
    • " - a double quote (string literal start)
    • ([^"\\]*(?:\\.[^"\\]*)*) - Group 2:
      • [^"\\]* - any 0+ chars other than \ and "
      • (?:\\.[^"\\]*)* - 0+ sequences of
        • \\. - a \ followed by any char
        • [^"\\]* - any 0+ chars other than \ and "
    • " - a closing string literal double quote
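To see what the two capturing groups hold, here is a small sketch (the class name and sample text are illustrative) that runs the same pattern through java.util.regex.Matcher on the text x\\"a\"b", where an escaped backslash pair comes right before the opening quote. Group 1 captures that \\ pair so the replacement $1$2 can put it back, and Group 2 captures the literal body a\"b, escaped quote included.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupsDemo {
    // The pattern from the answer, with every backslash doubled for a Java string literal.
    static final Pattern LITERAL = Pattern.compile(
            "(?s)(?<!\\\\)((?:\\\\{2})*)\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\"");

    public static void main(String[] args) {
        // The text x\\"a\"b" : two backslashes, then a literal whose body contains \"
        String text = "x\\\\\"a\\\"b\"";
        Matcher m = LITERAL.matcher(text);
        while (m.find()) {
            // Prints: group 1 = \\, group 2 = a\"b
            System.out.println("group 1 = " + m.group(1) + ", group 2 = " + m.group(2));
        }
        // Putting both groups back drops only the two delimiting quotes.
        System.out.println(LITERAL.matcher(text).replaceAll("$1$2")); // x\\a\"b
    }
}
```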

    See the Java demo:

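    public class Demo {
        public static void main(String[] args) {
            String param = "[\"\\n\",\"\\t\",\"'\",\"\\\"\",\"\\\\\",\"\\\\\\\"\"]";
            System.out.println(param);
            // => ["\n","\t","'","\"","\\","\\\""]
            // Backslashes in the regex are doubled for the Java string literal.
            String regex = "(?s)(?<!\\\\)((?:\\\\{2})*)\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\"";
            // $1 restores the even backslash run, $2 the literal body; the quotes are dropped.
            param = param.replaceAll(regex, "$1$2");
            System.out.println(param);
            // => [\n,\t,',\",\\,\\\"]
        }
    }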
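If the replacement runs many times, one option (a sketch; the class and method names are hypothetical) is to compile the pattern once: String.replaceAll compiles its regex argument on every call, whereas a static final Pattern is compiled once and reused. The second call in main shows that a quote preceded by a single backslash never opens a literal, so such input is left unchanged.

```java
import java.util.regex.Pattern;

public class QuoteStripper {
    // Compiled once and reused; String.replaceAll compiles its regex on every call.
    private static final Pattern STRING_LITERAL = Pattern.compile(
            "(?s)(?<!\\\\)((?:\\\\{2})*)\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\"");

    // Hypothetical helper: drops the delimiting quotes of each string literal,
    // keeping the escaped backslash pairs before it ($1) and its body ($2).
    static String stripLiteralQuotes(String input) {
        return STRING_LITERAL.matcher(input).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String param = "[\"\\n\",\"\\t\",\"'\",\"\\\"\",\"\\\\\"]";
        System.out.println(stripLiteralQuotes(param));            // [\n,\t,',\",\\]
        // Every quote here follows a single backslash, so nothing is removed.
        System.out.println(stripLiteralQuotes("say \\\"hi\\\"")); // say \"hi\"
    }
}
```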