How do I replace accented Unicode letters such as Ė È É Ê Ë Ą Č with their closest plain Latin letters using JavaScript or jQuery? For example, the string ĖÈÉÊËĄČ should become EEEEEAC.
The technique described in the question "How do I convert special UTF-8 chars to their iso-8859-1 equivalent using javascript?" returns the result as a UTF-8 byte sequence, so encodeURIComponent("å") yields %C3%A5, whereas in my case the result should be a.
This question may be a duplicate of "Remove accents/diacritics in a string in JavaScript", but all the solutions given there hard-code every possible character and map it to its replacement, which is not a very clean solution.
If you are allowed to use String.prototype.normalize() (part of the ES6 standard, so it only works in modern browsers), you can use this function:
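function removeDiacritics(input)
{
    // Decompose each accented character into a base letter plus combining marks.
    var normalized = input.normalize("NFD");
    var output = "";
    for (var i = 0; i < normalized.length; i++)
    {
        var code = normalized.charCodeAt(i);
        // Skip anything in the Combining Diacritical Marks block (U+0300 to U+036F).
        // This also handles letters that carry more than one mark.
        if (code >= 0x0300 && code <= 0x036F)
        {
            continue;
        }
        output += normalized[i];
    }
    return output;
}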
What does this function do? Well, first it normalizes the input string to NFD:
Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
This means that composites (characters with a diacritic) are decomposed into a base character followed by one or more combining characters. For example, the character é is decomposed into e and the combining acute accent ´ (U+0301).
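To see the decomposition concretely, here is a small sketch that prints the code points before and after normalization. The helper name codePoints is made up for this illustration and is not part of any library.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
function codePoints(str) {
    return Array.from(str).map(function (ch) {
        return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    });
}

var composed = "\u00E9";                     // é as one precomposed character
var decomposed = composed.normalize("NFD");  // e followed by a combining acute accent
var stacked = "\u1EC7".normalize("NFD");     // ệ carries two combining marks

console.log(codePoints(composed).join(" "));    // U+00E9
console.log(codePoints(decomposed).join(" "));  // U+0065 U+0301
console.log(codePoints(stacked).join(" "));     // U+0065 U+0323 U+0302
```

The last line shows why a letter cannot be assumed to decompose into exactly two characters: ệ becomes three.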
The next step is the loop that recognizes the decomposed characters and skips the combining accent characters.
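The same "decompose, then skip the combining marks" idea can also be written as a regular-expression replace. This is only a sketch: the function name stripCombiningMarks is made up here, and it assumes that removing the Combining Diacritical Marks block (U+0300 to U+036F) is enough for your input.

```javascript
// Hypothetical helper name; a compact variant of the same approach.
function stripCombiningMarks(input) {
    return input
        .normalize("NFD")                  // split letters from their accents
        .replace(/[\u0300-\u036f]/g, "");  // drop the combining marks
}

console.log(stripCombiningMarks("ĖÈÉÊËĄČ")); // EEEEEAC
console.log(stripCombiningMarks("Łódź"));    // Łodz
```

Note that letters such as Ł or Ø have no canonical decomposition, so they pass through unchanged. If you need those converted as well, a small explicit mapping for just those letters is still required.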