I have some strings like this (encoded as UTF-8):
&#x62A;&#x648;&#x633;&#x639;&#x647;
I want to convert them to:
توسعه
How can I do that in JavaScript?
The solution needs to be compatible with Nashorn, since I am running the code in a script engine embedded in Java.
NOTE: Neither "HTML Entity Decode" nor "Unescape HTML entities in Javascript?" answers my question, since the solutions given there do not work in Nashorn.
P.S: I have searched for possible solutions, and many people suggested using decodeURIComponent(escape(window.atob(yourString)))
(with slight variations). This does not work for me; I tried it in VS Code (JavaScript).
The string mentioned in the question can be broken down into smaller parts separated by ;. Each part is a combination of &# and a hex number (e.g. x62A) corresponding to a character (ت).
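To make that mapping concrete, here is a minimal sketch that decodes a single part by hand. The variable names are only illustrative, and it assumes the part has exactly the form &#x...; described above.

```javascript
// Decode one entity by hand: strip the "&#x" prefix and the ";" suffix,
// parse the remaining digits as base 16, then map the code to a character.
var entity = '&#x62A;';
var hexDigits = entity.slice(3, -1);      // '62A'
var code = parseInt(hexDigits, 16);       // 1578
var decoded = String.fromCharCode(code);  // 'ت' (U+062A)
```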
The following code does the job by parsing the input str and finding the corresponding characters. The result is the concatenation of those characters.
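    var human_readable = function (str) {
        // Pick out the "x..." hex part of every entity, plus any whitespace character.
        // match() returns null when nothing matches, so fall back to an empty array.
        var hex_code = str.match(/([^&#]+[\w][^;])|(\s)/g) || [];
        var s = '';
        for (var j = 0; j < hex_code.length; j++) {
            var ch;
            if (hex_code[j] != ' ') {
                // "0" + "x62A" gives "0x62A", which parseInt reads as hexadecimal.
                var int_code = parseInt('0' + hex_code[j]);
                ch = String.fromCharCode(int_code);
            } else {
                ch = ' ';
            }
            s = s + ch;
        }
        return s;
    };
    // Nashorn has no console object; use print(...) there instead.
    console.log(human_readable('&#x62A;&#x648;&#x633;&#x639;&#x647;'));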
P.S: I have assumed that if str contains white space, it appears as a literal ' ' and not as the corresponding Unicode entity.
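As an alternative, here is a sketch that uses String.prototype.replace with a callback, so any text that is not an entity (spaces included) passes through unchanged. It sticks to ES5 features, which should keep it Nashorn-friendly, though I have not run it in Nashorn itself. The function name decodeNumericEntities is made up for this example, and the support for decimal entities (such as &#1578;) and for characters above U+FFFF goes beyond what the question asked for.

```javascript
// Hypothetical helper (the name is illustrative): decodes numeric character
// references such as &#x62A; (hex) and &#1578; (decimal). ES5 only.
var decodeNumericEntities = function (str) {
    return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
        var code = hex ? parseInt(hex, 16) : parseInt(dec, 10);
        if (code > 0x10FFFF) {
            return match; // not a valid code point, leave the text as it is
        }
        if (code > 0xFFFF) {
            // String.fromCharCode only handles 16-bit units, so build
            // the surrogate pair by hand for characters above U+FFFF.
            code -= 0x10000;
            return String.fromCharCode(0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF));
        }
        return String.fromCharCode(code);
    });
};

var result = decodeNumericEntities('&#x62A;&#x648;&#x633;&#x639;&#x647;');
```

Named entities such as &amp;amp; are deliberately left alone here; handling them would need a lookup table.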