As part of transitioning my Thunderbird extension to Thunderbird 60, I need to switch from using nsIScriptableUnicodeConverter (if you don't know Mozilla, never mind what that is) to the more popular, cross-browser TextDecoder and TextEncoder. The thing is, their behavior is not what I would expect.
Specifically, suppose I have the string str containing "ùìåí," (without the quotes, of course). Now, when I run:
undecoded_str = new TextEncoder("windows-1252").encode(str);
I expect to be getting the sequence
F9, EC, E5, ED, 2C
(the 1-octet windows-1252 value for each of the 5 characters). But what I actually get is:
C3, B9, C3, AC, C3, A5, C3, AD, 2C
which seems to be the UTF-8 encoding of the string. Why is this happening?
Annoyingly, many browsers have simply dropped support for character encodings other than UTF-8 in TextEncoder (and TextDecoder):
Note: Firefox, Chrome and Opera used to have support for encoding types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and gbk). As of Firefox 48 (ticket), Chrome 54 (ticket) and Opera 41, no other encoding types are available other than utf-8, in order to match the spec. In all cases, passing in an encoding type to the constructor will be ignored and a utf-8 TextEncoder will be created (the TextDecoder still allows for other decoding types).
Damn it!
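That said, *decoding* windows-1252 still works, and for characters in the Latin-1 range a manual encoder is short, since the windows-1252 byte equals the Unicode code point there. A sketch (encodeLatin1 is a name I made up; it deliberately does not handle the 0x80–0x9F block, where windows-1252 diverges from Unicode):

```javascript
// TextEncoder ignores its constructor argument and always emits UTF-8:
const utf8 = new TextEncoder("windows-1252").encode("ùìåí,");
console.log(utf8); // the UTF-8 bytes: c3 b9 c3 ac c3 a5 c3 ad 2c

// TextDecoder, by contrast, still accepts other encoding labels:
const original = new TextDecoder("windows-1252")
  .decode(new Uint8Array([0xF9, 0xEC, 0xE5, 0xED, 0x2C]));
console.log(original); // "ùìåí,"

// Hypothetical fallback encoder for code points up to U+00FF, where
// the windows-1252 byte equals the Unicode code point:
function encodeLatin1(str) {
  const out = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const cp = str.charCodeAt(i);
    if (cp > 0xFF) {
      throw new RangeError("U+" + cp.toString(16) + " is not representable in one byte");
    }
    out[i] = cp;
  }
  return out;
}

console.log(encodeLatin1("ùìåí,")); // F9 EC E5 ED 2C, as expected
```

Anything beyond Latin-1 (or the 0x80–0x9F specials like €) needs a real mapping table or a library; the platform itself no longer offers one for encoding.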