javascript utf-8 character-encoding non-ascii-characters windows-1252

Characters with ASCII > 128 are not correctly read in Javascript


I have an HTML page that includes a JavaScript file. The script contains a special character, ASCII 152. When I display its charCodeAt value, I get different results depending on the charset, but never the right one. Could you please advise? Thanks

TEST.HTML

<script type="text/javascript" charset=SEE BELOW src="test.js">
</script>

TEST.JS file with ANSI encoding

function d(a)
{
a=(a+"").split("");
alert(a[1].charCodeAt(0));
};
d("i˜g"); // Note that ˜ is 152 in ASCII
  • TEST.HTML with x-user-defined charset: alert shows 63384. Taking the value modulo 63232 works, as every char > 128 is reported as 63232 + char.
  • TEST.HTML with utf-8 charset: alert shows 65533. All chars > 128 are displayed as 65533.
  • TEST.HTML with Windows-1252 charset: alert shows 752. I cannot find a relation between ASCII and what is displayed.

TEST.JS file with UTF-8 encoding

function d(a)
{
a=(a+"").split("");
alert(a[1].charCodeAt(0));
};
d("i[x98]g"); // Note that x98 is 152
  • TEST.HTML with x-user-defined charset: alert shows 65533. All chars > 128 are displayed as 65533.
  • TEST.HTML with utf-8 charset: alert shows 65533. All chars > 128 are displayed as 65533.
  • TEST.HTML with Windows-1252 charset: alert shows 65533. All chars > 128 are displayed as 65533.

Solution

  • UTF-8 has no single-byte characters in the range 128-255, and ASCII ends at 127. Also, the character at position 1 in "i[x98]g" is "[", because "[x98]" is not an escape sequence, just five literal characters.

    Your function can be replaced with str.charCodeAt(1).

    The character ˜ is the Unicode character 'SMALL TILDE' (U+02DC), which Windows-1252 stores as byte 152 (0x98). It can be written as "\u02DC" or String.fromCharCode(732).
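As a rough sketch of the points above, the snippet below checks the code unit at index 1 for each spelling of the string. The helper name secondCharCode is hypothetical (it only stands in for the question's d() function), and the snippet assumes an environment where TextEncoder and TextDecoder are available as globals, as in current browsers and Node.js.

```javascript
// Hypothetical helper, not from the original post: returns the UTF-16 code
// unit at index 1, which is the value the question's d() function alerts.
function secondCharCode(str) {
  return String(str).charCodeAt(1);
}

// Writing the character as an escape keeps test.js pure ASCII, so the
// charset attribute on the <script> tag no longer affects the result.
const viaEscape = secondCharCode("i\u02DCg");
const viaFromCharCode = secondCharCode("i" + String.fromCharCode(732) + "g");

// "[x98]" is not an escape sequence, so index 1 is the literal "[".
const literalBracket = secondCharCode("i[x98]g");

// In UTF-8, U+02DC is encoded as two bytes, not as the single byte 0x98.
const utf8Bytes = Array.from(new TextEncoder().encode("\u02DC"));

// A lone 0x98 byte is invalid UTF-8, so the decoder substitutes U+FFFD.
const decoded = new TextDecoder("utf-8").decode(new Uint8Array([0x69, 0x98, 0x67]));
const replacementCode = decoded.charCodeAt(1);

console.log(viaEscape, viaFromCharCode);  // 732 732
console.log(literalBracket);              // 91
console.log(utf8Bytes);                   // [ 203, 156 ]
console.log(replacementCode);             // 65533
```

The last check likely explains the 65533 results in the question: a file saved as ANSI contains the bare byte 0x98, which is not valid when the script is read as UTF-8.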