I have the test case below, and only the first assertion passes. Why?
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class LowerCaseTest {
    @Test
    public void test() {
        String i1 = "i";
        String i2 = "İ".toLowerCase();
        System.out.println((int) i1.charAt(0)); // 105
        System.out.println((int) i2.charAt(0)); // 105
        assertTrue(i2.startsWith(i1));
        assertTrue(i2.endsWith(i1));
        assertTrue(i1.endsWith(i2));
        assertTrue(i1.startsWith(i2));
    }
}
What I am trying to do is use startsWith and endsWith in a case-insensitive way, so that the expression below returns true:
"ALİ".toLowerCase().endsWith("i");
This happens because, in English locales, lowercasing İ ("latin capital letter i with dot above", U+0130) turns it into two characters: "latin small letter i" (U+0069) and "combining dot above" (U+0307).
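That is why the result starts with i but does not end with i: it ends with a combining diacritical mark instead. It also explains the last two failures: i1 is one character long while i2 is two, so i1 can neither start nor end with i2.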
In a Turkish locale, lowercasing İ simply yields "latin small letter i", in accordance with Turkish casing rules, so your code would work there.
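To make the difference visible without changing the system locale, here is a minimal sketch that passes the locale explicitly to String.toLowerCase(Locale). The class name LowerCaseLocaleDemo and the describe helper are hypothetical, and the string literals use the escape \u0130 for İ so the example does not depend on the source file's encoding.

```java
import java.util.Locale;

public class LowerCaseLocaleDemo {
    // Returns the UTF-16 code units of s.toLowerCase(locale), formatted as "U+XXXX" and space-separated.
    static String describe(String s, Locale locale) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toLowerCase(locale).toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Locale english = Locale.ENGLISH;
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // \u0130 is İ (latin capital letter i with dot above)
        System.out.println(describe("\u0130", english)); // U+0069 U+0307
        System.out.println(describe("\u0130", turkish)); // U+0069

        // The expression from the question, with the locale made explicit
        System.out.println("AL\u0130".toLowerCase(english).endsWith("i")); // false
        System.out.println("AL\u0130".toLowerCase(turkish).endsWith("i")); // true
    }
}
```

Passing the locale explicitly also keeps a unit test like the one in the question from behaving differently depending on the machine it runs on.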
Here's a test program to help figure out what's going on:
class Test {
    public static void main(String[] args) {
        char[] foo = args[0].toLowerCase().toCharArray();
        System.out.print("Lowercase " + args[0] + " has " + foo.length + " chars: ");
        for (int i = 0; i < foo.length; i++) {
            System.out.print("0x" + Integer.toString((int) foo[i], 16) + " ");
        }
        System.out.println();
    }
}
Here's what we get when we run it on a system configured for English:
$ LC_ALL=en_US.utf8 java Test "İ"
Lowercase İ has 2 chars: 0x69 0x307
Here's what we get when we run it on a system configured for Turkish:
$ LC_ALL=tr_TR.utf8 java Test "İ"
Lowercase İ has 1 chars: 0x69
This is in fact the very example used in the API docs for String.toLowerCase(Locale), which is the method you can use to lowercase a string in a specific locale rather than in the system default locale.
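If the goal is only a case-insensitive startsWith/endsWith, one alternative (not part of the answer above) is String.regionMatches with ignoreCase set to true, which compares the strings char by char using Character's case mappings instead of lowercasing the whole string first. A minimal sketch, where CaseInsensitiveAffix, startsWithIgnoreCase and endsWithIgnoreCase are hypothetical names:

```java
public class CaseInsensitiveAffix {
    // Hypothetical helper: true if s starts with prefix, ignoring case char by char.
    static boolean startsWithIgnoreCase(String s, String prefix) {
        return s.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    // Hypothetical helper: true if s ends with suffix, ignoring case char by char.
    static boolean endsWithIgnoreCase(String s, String suffix) {
        int offset = s.length() - suffix.length();
        // A suffix longer than s can never match
        return offset >= 0 && s.regionMatches(true, offset, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        // \u0130 is İ; no lowercased copy is created, so no combining dot appears
        System.out.println(endsWithIgnoreCase("AL\u0130", "i")); // true
        System.out.println(startsWithIgnoreCase("\u0130", "i")); // true
        System.out.println(endsWithIgnoreCase("i", "AL\u0130")); // false
    }
}
```

Because the comparison is per char and does not depend on any locale, it is looser than locale-aware lowercasing: for example, dotless ı also matches I. Whether that is acceptable depends on the use case.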