
"İ".toLowerCase() != "i"


In Turkish, there's a letter İ (dotted capital I) which is the uppercase form of i. When I convert it to lowercase, I get an unexpected result. For example:

var string_tr = "İ".toLowerCase();
var string_en = "i";

console.log( string_tr == string_en );  // false
console.log( string_tr.split("") );     // ["i", "̇"]
console.log( string_tr.charCodeAt(1) ); // 775
console.log( string_en.charCodeAt(0) ); // 105

"İ".toLowerCase() returns an extra character, and if I'm not mistaken, it's COMBINING DOT ABOVE (U+0307).
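To confirm which code points the lowercased string contains, here is a small sketch that prints each one in U+XXXX notation. `codePointsOf` is a hypothetical helper name, not a built-in. `Array.from` iterates by code point, so characters outside the BMP would not be split into surrogate halves.

```javascript
// Hypothetical helper: list each code point of a string as "U+XXXX".
function codePointsOf(str) {
  // Array.from walks the string by code point, not by UTF-16 unit
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePointsOf("İ"));               // ["U+0130"]
console.log(codePointsOf("İ".toLowerCase())); // ["U+0069", "U+0307"]
```

The second line shows a plain "i" (U+0069) followed by U+0307, which matches the 775 seen with `charCodeAt(1)`.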

How do I get rid of this character?

I could just filter the string:

var string_tr = "İ".toLowerCase();

string_tr = string_tr.split("").filter(function (item) {
    // Keep every character except 775 (U+0307 COMBINING DOT ABOVE)
    return item.charCodeAt(0) !== 775;
}).join("");

console.log(string_tr.split(""));

But am I handling this correctly? Is there a better way? And why does this extra character appear in the first place?
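The same filtering can be written more compactly with a regular expression that removes only U+0307. This is a sketch with a hypothetical helper name, `lowerWithoutDotAbove`. It assumes the input never contains a U+0307 that should be kept (for example a decomposed "ż" would lose its dot), and it still does not give Turkish results for plain I, which `toLowerCase()` maps to "i" rather than "ı".

```javascript
// Hypothetical helper: lowercase, then strip every U+0307 COMBINING DOT ABOVE.
// Assumption: no U+0307 in the input is meant to survive.
function lowerWithoutDotAbove(str) {
  return str.toLowerCase().replace(/\u0307/g, "");
}

console.log(lowerWithoutDotAbove("İ") === "i");               // true
console.log(lowerWithoutDotAbove("İSTANBUL") === "istanbul"); // true
```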

There also seems to be an inconsistency. For example, Turkish has a lowercase form of I: the dotless ı. Why does the following comparison return true

console.log( "ı".toUpperCase() == "i".toUpperCase() ) // true

while

console.log( "İ".toLowerCase() == "i" ) // false

returns false?
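The two results can be checked side by side. Under the default, locale-independent Unicode mappings used by `toUpperCase()` and `toLowerCase()`, ı (U+0131) uppercases to ASCII "I", the same result as "i", so the difference between the two letters is lost. İ (U+0130), on the other hand, has no single-character lowercase in the default mapping, so it becomes "i" plus U+0307 to preserve the dot:

```javascript
// Default (locale-independent) Unicode case mappings
const upperDotless = "ı".toUpperCase(); // "I" (U+0049)
const upperDotted = "i".toUpperCase();  // "I" (U+0049)
const lowerCapDot = "İ".toLowerCase();  // "i" + U+0307, length 2

console.log(upperDotless === upperDotted); // true: both map to plain "I"
console.log(lowerCapDot === "i");          // false: the dot survives as U+0307
console.log(lowerCapDot === "i\u0307");    // true
```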


Solution

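  • toLowerCase() uses the locale-independent Unicode mapping, in which İ becomes i followed by U+0307 COMBINING DOT ABOVE. For the Turkish mapping (İ → i), you need a locale-aware conversion, available with String#toLocaleLowerCase: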

    let s = "İ";

    console.log(s.toLowerCase().length);              // 2: "i" + U+0307
    console.log(s.toLocaleLowerCase('tr-TR').length); // 1: plain "i"
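For case-insensitive comparisons of Turkish text, the same idea can be wrapped in a helper that lowercases both sides with the Turkish locale. `equalsIgnoreCaseTr` is a hypothetical name, and the sketch assumes the runtime ships Turkish locale data (all major browsers do, as does Node.js built with full ICU, the default since Node 13). Note that it also applies the other half of the Turkish rule, I → ı, so "ISTANBUL" no longer matches "istanbul".

```javascript
// Hypothetical helper: compare two strings ignoring case, using Turkish rules.
// Assumption: the engine has "tr-TR" locale data available.
function equalsIgnoreCaseTr(a, b) {
  return a.toLocaleLowerCase("tr-TR") === b.toLocaleLowerCase("tr-TR");
}

console.log("I".toLocaleLowerCase("tr-TR"));              // "ı"
console.log("i".toLocaleUpperCase("tr-TR"));              // "İ"
console.log(equalsIgnoreCaseTr("İSTANBUL", "istanbul")); // true
console.log(equalsIgnoreCaseTr("ISTANBUL", "istanbul")); // false: I lowercases to ı
```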