
How to count unaccented English letters in Javascript?


I'm trying to count the number of characters that are unaccented English letters in a string. For example, I would want the count to be 1 for the string "né!".

I thought I would be able to check if each character is in the range 'a'-'z' or 'A'-'Z', but that would include 'é':

    'é' >= 'a' && 'é' <= 'z';
    true
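A range check like this returns true when the é is stored in decomposed form, as "e" followed by U+0301 COMBINING ACUTE ACCENT. JavaScript compares strings by UTF-16 code units from left to right, so the decomposed é sorts just after "e". The short sketch below contrasts the two forms. It assumes the é in the question is decomposed and uses explicit escapes so that each form is unambiguous:

```javascript
const decomposedE = "e\u0301"; // "e" + U+0301 COMBINING ACUTE ACCENT
const precomposedE = "\u00e9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE

// Relational operators on strings compare UTF-16 code units left to right.
console.log(decomposedE >= "a" && decomposedE <= "z");   // true: first unit "e" is in range
console.log(precomposedE >= "a" && precomposedE <= "z"); // false: 0xE9 > 0x7A ("z")
```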

Both accented and unaccented letters seem to have the same code point:

    "eé".codePointAt(0);
    101
    "eé".codePointAt(1);
    101
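These results fit a decomposed é. In that case the string "eé" holds three code points, and index 1 is the "e" that belongs to the é, not the accented character itself. The accent, U+0301, sits at index 2. The sketch below lists every code point and then applies NFC normalization, assuming the input was decomposed (the escape `\u0301` makes that explicit). The helper `toHex` is a hypothetical name used only for illustration:

```javascript
// "eé" with the é in decomposed form, written with an explicit escape.
const str = "ee\u0301";

// Hypothetical helper: hex code points of a string.
// Array.from iterates by code point, not by UTF-16 code unit.
const toHex = (s) => Array.from(s, (ch) => ch.codePointAt(0).toString(16)).join(" ");

console.log(str.length, toHex(str)); // 3 65 65 301
console.log(str.codePointAt(1));     // 101: index 1 is the "e" of the decomposed é

// NFC normalization composes "e" + U+0301 into the single code point U+00E9.
const nfc = str.normalize("NFC");
console.log(nfc.length, toHex(nfc)); // 2 65 e9
```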

I tried using regular expressions, but the string "né!" was treated like the 4-character string "ne'!":

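    // Assumed setup (not shown in the original): "né!" with the é stored
    // decomposed as "e" + U+0301, and a single-character ASCII letter regex.
    const str = "ne\u0301!";
    const re = /[a-zA-Z]/;
    const len = str.length; // 4 UTF-16 code units
    let numLetters = 0;

    // Walk the string one UTF-16 code unit at a time.
    for (let i = 0; i < len; i++) {
        const c = str.charAt(i);
        if (re.test(c)) {
            console.log("Is a letter: " + c);
            numLetters++;
        } else {
            console.log("Is not a letter: " + c);
        }
    }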

Output:

    Is a letter: n
    Is a letter: e
    Is not a letter: ́
    Is not a letter: !

How can I find the number of characters that are unaccented English letters?


Solution

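  • You can use String#normalize to get the composed form of the string. With no argument, normalize() uses "NFC", which combines "e" + U+0301 into the single code point "é" (U+00E9). That code point lies outside a-z, so the regex no longer matches it.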

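    // "né!" with the é stored decomposed ("e" + U+0301), as in the question.
    const str = "ne\u0301!";
    // normalize() defaults to NFC, turning "e" + U+0301 into "é" (U+00E9).
    const letters = str.normalize().match(/[a-z]/ig);
    console.log(letters?.length ?? 0); // 1
    console.log(letters);              // an array containing only "n"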
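NFC only helps when a precomposed code point exists. There is none for "q" + U+0301, so NFC leaves the "q" in place and `/[a-z]/ig` still counts it. The sketch below shows two alternatives that do not depend on composition. Neither is part of the answer above, and both function names are hypothetical:

- A regex that skips ASCII letters followed by a combining mark. This needs Unicode property escapes (ES2018+).
- Grapheme segmentation with `Intl.Segmenter`. This assumes Node.js 16+ or a current browser.

```javascript
// Hypothetical helper 1: count ASCII letters not immediately followed by a
// combining mark (\p{M}). [A-Za-z] is spelled out instead of using the "i" flag,
// because /[a-z]/iu also matches U+017F (long s) and U+212A (Kelvin sign).
function countByLookahead(str) {
  const matches = str.match(/[A-Za-z](?!\p{M})/gu);
  return matches ? matches.length : 0;
}

// Hypothetical helper 2: count grapheme clusters (user-perceived characters)
// that consist of exactly one ASCII letter. Requires Intl.Segmenter.
function countByGrapheme(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  let count = 0;
  for (const { segment } of segmenter.segment(str)) {
    // "e" + U+0301 is a single cluster of length 2, so it fails this exact match.
    if (/^[A-Za-z]$/.test(segment)) count++;
  }
  return count;
}

const sample = "q\u0301a"; // "q" + combining acute (no precomposed form), then "a"
console.log(sample.normalize().match(/[a-z]/ig).length); // 2: NFC leaves "q" + U+0301
console.log(countByLookahead(sample));                   // 1
console.log(countByGrapheme(sample));                    // 1
console.log(countByGrapheme("ne\u0301!"));               // 1
```

The lookahead version is shorter and needs no `Intl` support. The segmenter version follows the Unicode definition of a user-perceived character, which may matter for inputs beyond simple letter-plus-accent pairs.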