
Special characters representation issue in JSP


In a JSP file, the source code is:

|&#x0031;&#x0080;&#x0033;|<%="\u0031\u0080\u0033" %>|

The result on the page is:

|1€3|13|

Why is the Euro symbol shown in the first case but not in the second?


Solution

  • HTML numeric character references in the range 0x80–0x9F don't actually refer to the characters U+0080–U+009F. Instead, they refer to the characters that the windows-1252 encoding assigns to the bytes 0x80–0x9F.

    This is a weird historical artefact from the days before browsers did Unicode. HTML5 sort-of standardises it: such a reference is invalid, but parsers are still required to interpret it this way. This does not happen in XML/XHTML.

    So \u0080 gives you the actual character U+0080, which you can't see because it's an invisible control character, but &#x0080; gives you code page 1252 byte 0x80, which is U+20AC Euro Sign.
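
    To see the difference outside the browser, here is a minimal Java sketch of the remapping described above. The class name NcrDemo and the helper resolveHtmlNcr are made up for illustration and are not part of any library. It assumes the JVM provides the windows-1252 charset (common, but not guaranteed by the Java specification), and it only approximates what an HTML parser does for this one range.

```java
import java.nio.charset.Charset;

public class NcrDemo {
    // Hypothetical helper: approximates how an HTML parser resolves a numeric
    // character reference. Values 0x80-0x9F are treated as windows-1252 bytes;
    // everything else is returned unchanged.
    static int resolveHtmlNcr(int codePoint) {
        if (codePoint >= 0x80 && codePoint <= 0x9F) {
            // Assumption: the "windows-1252" charset is available in this JVM.
            Charset cp1252 = Charset.forName("windows-1252");
            String decoded = new String(new byte[] {(byte) codePoint}, cp1252);
            int mapped = decoded.codePointAt(0);
            // Bytes that windows-1252 leaves undefined may decode to U+FFFD;
            // in that case keep the original code point.
            return mapped == 0xFFFD ? codePoint : mapped;
        }
        return codePoint;
    }

    public static void main(String[] args) {
        // What the reference with value 0x80 in HTML ends up as.
        System.out.println(String.format("U+%04X", resolveHtmlNcr(0x80))); // prints U+20AC

        // What the Java string escape with value 0x80 is: a C1 control character.
        char raw = (char) 0x80;
        System.out.println(Character.isISOControl(raw)); // prints true
    }
}
```

    If the goal is a Euro sign from the Java expression, writing the escape for U+20AC in the string literal avoids the remapping altogether, provided the page is served in an encoding that can represent that character, such as UTF-8.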