Tags: java, guava

Does Guava provide a method to unescape a String?


I need to escape special characters in a String.

Guava provides the Escaper class, which does exactly this:

import com.google.common.escape.Escaper;
import com.google.common.escape.Escapers;

Escaper escaper = Escapers.builder()
        .addEscape('[', "\\[")
        .addEscape(']', "\\]")
        .build();

String escapedStr = escaper.escape("This is a [test]");

System.out.println(escapedStr);
// -> prints "This is a \[test\]"

Now that I have an escaped String, I need to unescape it, but I can't find anything in Guava to do this.

I was expecting Escaper to have an unescape() method, but that isn't the case.

Edit: I'm aware that unescaping can be tricky, even impossible in some nonsensical cases.

For example, this Escaper usage can lead to ambiguities:

Escaper escaper = Escapers.builder()
        .addEscape('@', " at ")
        .addEscape('.', " dot ")
        .build();

Unless the escaped data contains only email addresses and nothing more, you can't safely get your data back by unescaping it.
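
To make the ambiguity concrete, here is a minimal sketch. It uses plain String.replace instead of the Guava Escaper (an assumption, to keep it dependency-free; the mapping is the same), and the class name AmbiguityDemo and the sample strings are made up for illustration.

```java
public class AmbiguityDemo {
    // Mimics the Escaper above with plain JDK calls (assumed equivalent mapping).
    static String escape(String s) {
        return s.replace("@", " at ").replace(".", " dot ");
    }

    // Naive inverse: turns every " at " and " dot " back into a symbol.
    static String naiveUnescape(String s) {
        return s.replace(" at ", "@").replace(" dot ", ".");
    }

    public static void main(String[] args) {
        // An email address survives the round trip.
        System.out.println(naiveUnescape(escape("john@example.com")));
        // -> prints "john@example.com"

        // Ordinary text that already contains " at " does not.
        System.out.println(naiveUnescape(escape("meet me at noon")));
        // -> prints "meet me@noon"
    }
}
```

The second string is not changed by escaping at all, yet unescaping corrupts it, because the escape sequence " at " is indistinguishable from normal text.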

A good example of a safe usage of the Escaper is HTML entities:

Escaper escaper = Escapers.builder()
        .addEscape('&', "&amp;")
        .addEscape('<', "&lt;")
        .addEscape('>', "&gt;")
        .build();

Here, you can safely escape any text, include it in an HTML page, and unescape it at any time to display it, because every possible ambiguity is covered.
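
As a rough sketch of why this mapping is reversible, the round trip can be written with plain JDK calls. This assumes '&' is escaped as "&amp;" and that only these three entities are in play; the class name HtmlRoundTrip is hypothetical, and a real HTML page would need a proper parser for the full entity set.

```java
public class HtmlRoundTrip {
    static String escape(String s) {
        // '&' is replaced first so the '&' introduced by "&lt;" and "&gt;" is not escaped again.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String unescape(String s) {
        // "&amp;" is replaced last; otherwise "&amp;lt;" would wrongly end up as "<".
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String text = "a < b && c > d";
        System.out.println(escape(text));
        // -> prints "a &lt; b &amp;&amp; c &gt; d"
        System.out.println(unescape(escape(text)).equals(text));
        // -> prints "true"
    }
}
```

Because '&' itself is escaped, every '&' in the escaped text starts an entity, which is what removes the ambiguity.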

In conclusion, I don't see why unescaping is so controversial. I think it is the developer's responsibility to use this class properly, knowing their data and avoiding ambiguities. Escaping, by definition, means you will eventually need to unescape. Otherwise, it's obfuscation or some other concept.


Solution

  • No, it does not. And apparently, this is intentional. Quoting from this discussion where Chris Povirk answered:

    The use case for unescaping is less clear to me. It's generally not possible to even identify the escaped source text without a parser that understands the language. For example, if I have the following input:

    String s = "foo\n\"bar\"\n\\";
    

    Then my parser has to already understand \n, \", and \\ in order to identify that...

    foo\n\"bar\"\n\\
    

    ...is the text to be "unescaped." In other words, it has to do the unescaping already. The situation is similar with HTML and other formats: We don't need an unescaper so much as we need a parser.

    So it looks like you'll have to do it yourself.
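
For the bracket example from the question, doing it yourself could look like the following minimal sketch. It is not part of Guava; the class name BackslashUnescaper is made up, and it assumes a backslash always means "take the next character literally". Note that the question's Escaper does not escape the backslash itself, so the round trip is only safe if the original text contains no backslashes (or if '\\' is added to the escaper as well).

```java
public class BackslashUnescaper {
    // Reverses a scheme in which a backslash prefixes each escaped character.
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                // Drop the backslash and copy the following character verbatim.
                out.append(escaped.charAt(++i));
            } else {
                // Ordinary character, or a trailing lone backslash: keep as-is.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("This is a \\[test\\]"));
        // -> prints "This is a [test]"
    }
}
```

A single left-to-right pass is used instead of chained String.replace calls so that an escaped backslash followed by a bracket is not misread as an escaped bracket.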