javascript, ajax, unicode, xmlhttprequest

XMLHttpRequest alters text in UTF-8


While processing a large XML document client-side, I got stuck on the following issue: some Unicode characters are replaced with unreadable sequences, so the server cannot parse the XML. I am testing like this:

var text = new XMLSerializer().serializeToString(xmlNode);
console.log(text);
var req = new XMLHttpRequest();
req.open('POST', config.saveUrl, true);
req.overrideMimeType("application/xml; charset=UTF-8");
req.send(text);

Logging still shows the correct string:

<Language Self="Language/$ID/Czech" Name="$ID/Czech" SingleQuotes="‚‘" DoubleQuotes="„“" PrimaryLanguageName="$ID/Czech" SublanguageName="$ID/" Id="266" HyphenationVendor="Hunspell" SpellingVendor="Hunspell" />

But in the request (Chrome dev tools) and on the server side it appears modified like this:

<Language Self="Language/$ID/Czech" Name="$ID/Czech" SingleQuotes="‚‘" DoubleQuotes="„“" PrimaryLanguageName="$ID/Czech" SublanguageName="$ID/" Id="266" HyphenationVendor="Hunspell" SpellingVendor="Hunspell" />

The original encoding of the XML file is UTF-8 as well. The behavior is exactly the same when using jQuery.


Solution

    1. Check whether overrideMimeType behaves differently with uppercase "UTF-8" versus lowercase "utf-8".
    2. Make sure the string was already in UTF-8 before the JavaScript processing (check the page charset).
    3. Use escape/encodeURIComponent before sending it to the server, and unescape/decodeURIComponent it on the server.
    4. Try application/x-www-form-urlencoded and send the XML as plain text.
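
Options 3 and 4 can be combined. The following is only a minimal sketch, not a confirmed fix: the helper names `buildFormBody` and `parseFormBody` and the field name `xml` are made up for illustration, and the server side is written in JavaScript just to show the round trip. Only `encodeURIComponent` and `decodeURIComponent` are standard functions here.

```javascript
// Hypothetical helper: wrap the XML in a form-urlencoded body.
// encodeURIComponent percent-encodes the UTF-8 bytes of every non-ASCII
// character, so the resulting body contains ASCII characters only.
function buildFormBody(xmlText) {
  return 'xml=' + encodeURIComponent(xmlText);
}

// Hypothetical server-side counterpart, shown in JavaScript for illustration:
// take everything after the first "=" and decode it.
function parseFormBody(body) {
  var value = body.slice(body.indexOf('=') + 1);
  return decodeURIComponent(value);
}

var sample = '<Language SingleQuotes="‚‘" DoubleQuotes="„“" />';
var body = buildFormBody(sample);
var isAscii = /^[\x00-\x7F]*$/.test(body); // no raw non-ASCII bytes left
var roundTrip = parseFormBody(body);       // should equal the original text

// In the browser, the body would then be sent roughly like this
// (req and config.saveUrl as in the question):
//   req.open('POST', config.saveUrl, true);
//   req.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
//   req.send(body);
```

Because the body is pure ASCII, it should survive even if some step in between assumes a single-byte charset; the cost is a larger payload, since each of these quote characters becomes nine ASCII characters.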

    P.S. The modified string is in ISO-8859-15.
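
To illustrate the P.S., here is a rough sketch of how such a garbled string can arise, assuming the UTF-8 bytes are read one byte per character with a single-byte charset. The helper names `misreadUtf8AsLatin1` and `repairLatin1AsUtf8` are hypothetical. The sketch maps bytes to code points the way ISO-8859-1 does; ISO-8859-15 differs from it in only a few positions, none of which occur in the bytes of these quote characters.

```javascript
// Hypothetical helper: read each UTF-8 byte as if it were one Latin-1 character.
function misreadUtf8AsLatin1(text) {
  var bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]); // one byte becomes one character
  }
  return out;
}

// Hypothetical helper: undo the misreading by turning the characters back
// into bytes and decoding those bytes as UTF-8.
function repairLatin1AsUtf8(garbled) {
  var bytes = new Uint8Array(garbled.length);
  for (var i = 0; i < garbled.length; i++) {
    bytes[i] = garbled.charCodeAt(i);
  }
  return new TextDecoder('utf-8').decode(bytes);
}

var quotes = '‚‘„“';
var garbled = misreadUtf8AsLatin1(quotes);
var repaired = repairLatin1AsUtf8(garbled);

console.log(garbled.length);      // 12: each quote takes 3 bytes in UTF-8
console.log(repaired === quotes); // true
```

If the text seen on the server matches what `misreadUtf8AsLatin1` produces, the bytes on the wire are probably still valid UTF-8 and only the charset used to read them is wrong.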