Tags: java, html, string, eclipse, decode

How can I unescape HTML character entities in Java?


Basically, I would like to decode a given HTML document and replace all special characters, such as "&nbsp;" with " " and "&gt;" with ">".

In .NET, we can make use of the HttpUtility.HtmlDecode method.

What's the equivalent function in Java?


Solution

  • I have used StringEscapeUtils.unescapeHtml4() from Apache Commons Lang for this. Its Javadoc describes it as follows:

    Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
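With Commons Lang 3 on the classpath, the call itself is a one-liner: `StringEscapeUtils.unescapeHtml4(input)` (the same class also exists in Apache Commons Text, which newer Commons Lang releases point to). To illustrate what such an unescape step does, here is a minimal dependency-free sketch. It is not the Commons implementation: the class name `HtmlUnescapeSketch` and its five-entry entity table are assumptions made for illustration, whereas the library covers the full HTML 4.0 entity set.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; prefer StringEscapeUtils.unescapeHtml4() in real code.
class HtmlUnescapeSketch {
    // Deliberately tiny entity table (assumption); HTML 4.0 defines roughly 250 names.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "nbsp", "\u00A0");

    // Matches named (&gt;), decimal (&#62;) and hexadecimal (&#x3E;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        // Single left-to-right pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
        while (m.find()) {
            out.append(input, last, m.start());
            out.append(decode(m.group(1), m.group()));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    // Returns the decoded text, or the original reference if it is unknown or invalid.
    private static String decode(String body, String whole) {
        if (body.charAt(0) != '#') {
            return NAMED.getOrDefault(body, whole);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = Integer.parseInt(body.substring(hex ? 2 : 1), hex ? 16 : 10);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : whole;
        } catch (NumberFormatException e) {
            return whole; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        // prints: <b>Fish & chips</b> A
        System.out.println(unescape("&lt;b&gt;Fish &amp; chips&lt;/b&gt; &#x41;"));
    }
}
```

Unknown names such as `&foo;` are left untouched in this sketch, and `&nbsp;` decodes to the non-breaking space U+00A0 rather than an ordinary space.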