I have a page which includes escaped Unicode characters (for example, the characters 漢字 are escaped as \u6F22\u5B57). This page shows how you can use the unescape() method to convert the escaped \u6F22\u5B57 to 漢字. I have a method that converts all of the Unicode escaped characters, but it is not very fast.
function DecodeAllUnicodeCharacters (strID)
{
    var strstr = $(strID).innerHTML;
    // match() returns null when nothing matches, so fall back to an empty array
    var arrEncodeChars = strstr.match(/\\u[0-9A-Z]{4,6}/g) || [];
    for (var ii = 0; ii < arrEncodeChars.length; ii++) {
        var sUnescaped = eval("unescape('"+arrEncodeChars[ii]+"')");
        strstr = strstr.replace(arrEncodeChars[ii], sUnescaped);
    }
    $(strID).innerHTML = strstr;
}
The part that takes longest is setting the innerHTML here: $(strID).innerHTML = strstr; Is there a good way to replace the characters without redoing the innerHTML of the whole page?
Setting innerHTML is slow because the browser has to parse the string as HTML, and any child elements get recreated, which is slower still. Instead, find just the text nodes and update only those that contain escaped content. The following is based on a previous question and is demonstrated in a fiddle.
Element.addMethods({
    // element is Prototype-extended HTMLElement
    // nodeType is a Node.* constant
    // callback is a function where first argument is a Node
    forEachDescendant: function (element, nodeType, callback)
    {
        element = $(element);
        if (!element) return;
        var node = element.firstChild;
        while (node != null) {
            if (node.nodeType == nodeType) {
                callback(node);
            }
            if (node.hasChildNodes()) {
                node = node.firstChild;
            }
            else {
                // climb until a next sibling exists or we are back under the root element
                while (node.nextSibling == null && node.parentNode != element) {
                    node = node.parentNode;
                }
                node = node.nextSibling;
            }
        }
    },
    decodeUnicode: function (element)
    {
        // match hex digits only, so parseInt never sees the letters G-Z
        var regex = /\\u([0-9A-F]{4,6})/g;
        Element.forEachDescendant(element, Node.TEXT_NODE, function(node) {
            // regex.test fails faster than regex.exec for non-matching nodes
            if (regex.test(node.data)) {
                // only update when necessary
                node.data = node.data.replace(regex, function(_, code) {
                    // code is the hexadecimal value captured by the regex
                    return String.fromCharCode(parseInt(code, 16));
                });
            }
        });
    }
});
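The replacement step inside decodeUnicode can also be pulled out as a plain string function, which makes it easy to try without a DOM. This is only a minimal sketch: the name decodeUnicodeEscapes is hypothetical, and it assumes every escape is exactly four hex digits (the JavaScript \uXXXX form), in upper or lower case.

```javascript
// Hypothetical helper, not part of Prototype: decodes \uXXXX escapes in a string.
// Assumption: each escape is exactly four hex digits, upper or lower case.
function decodeUnicodeEscapes(text) {
    return text.replace(/\\u([0-9A-Fa-f]{4})/g, function (_, code) {
        // code holds the four hex digits captured by the regex
        return String.fromCharCode(parseInt(code, 16));
    });
}

// The doubled backslashes put a literal backslash into the string
var decoded = decodeUnicodeEscapes("\\u6F22\\u5B57"); // "漢字"
```

Characters outside the Basic Multilingual Plane still work when they are written as two consecutive escapes (a surrogate pair), because each escape is turned into one UTF-16 code unit. Text that only looks like an escape, such as \uZZZZ, is left untouched.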
The benefit of Element.addMethods, aside from aesthetics, is the functional pattern. You can use decodeUnicode in several ways:
// single element
$('element_id').decodeUnicode();
// or
Element.decodeUnicode('element_id');
// multiple elements
$$('p').each(Element.decodeUnicode);
// or
$$('p').invoke('decodeUnicode');
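Both call styles work because the function takes the element as its first argument, so it can be called directly, passed as a callback, or wrapped as a method. The sketch below imitates that idea with plain objects. The names methodize, describe and item are hypothetical stand-ins, and Prototype's real implementation is more involved than this.

```javascript
// Hypothetical stand-in for the kind of wrapping Element.addMethods performs:
// turns an "element first" function into a method that passes `this` along.
function methodize(fn) {
    return function () {
        var args = Array.prototype.slice.call(arguments);
        // the receiver becomes the first argument
        return fn.apply(null, [this].concat(args));
    };
}

// Written "element first", in the same shape as decodeUnicode
function describe(item, suffix) {
    return item.name + (suffix || "");
}

var item = { name: "p1" };
item.describe = methodize(describe);

var viaMethod = item.describe("!");    // method style
var viaFunction = describe(item, "!"); // function style, same result

// The plain function can also be handed to an iterator as a callback
var names = [{ name: "a" }, { name: "b" }].map(function (x) {
    return describe(x);
});
```

Writing the function once in the element-first form is what lets one definition serve all of the call styles shown above.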