Look at the following code snippet in Java.
import java.util.Locale;

public final class Main
{
    public static void main(String[] args)
    {
        Locale.setDefault(new Locale("lt")); // set Lithuanian as the default locale
        String str = "\u00cc"; // Ì (LATIN CAPITAL LETTER I WITH GRAVE)
        System.out.println("Before case conversion is " + str + " and length is " + str.length());
        String lowerCaseStr = str.toLowerCase(); // uses the default locale
        System.out.println("Lower case is " + lowerCaseStr + " and length is " + lowerCaseStr.length());
    }
}
It displays the following output.
Before case conversion is Ì and length is 1
Lower case is i̇̀ and length is 3
In the first System.out.println() statement, the result is as expected. In the second statement, however, the reported length is 3, whereas I expected 1. Why is that?
Different languages have different rules to transform to upper- or lower-case.
For example, in German, the lowercase ß becomes two uppercase S characters, so the word "straße" (street), which is 6 characters long, becomes "STRASSE", which is 7 characters long.
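The same thing happens in your case. Under the Lithuanian locale, the lowercase form of Ì (U+00CC) is a sequence of three characters: i (U+0069), combining dot above (U+0307) and combining grave accent (U+0300), because Lithuanian keeps the dot on a lowercase i when an accent is placed above it. This is why your original and lower-cased strings have different lengths.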
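Here is a minimal sketch that puts the two cases side by side. The class name CaseLengthDemo and the helper methods lower and upper are made up for illustration; the only library calls used are String.toLowerCase(Locale), String.toUpperCase(Locale) and Locale.forLanguageTag. Note that length() counts UTF-16 code units, not what a reader would perceive as letters.

```java
import java.util.Locale;

public final class CaseLengthDemo {

    // Hypothetical helper: lower-cases s using the rules of the given locale.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Hypothetical helper: upper-cases s using the rules of the given locale.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale lithuanian = Locale.forLanguageTag("lt");
        String iGrave = "\u00cc"; // Ì

        // Lithuanian rules: i + combining dot above + combining grave accent
        System.out.println(lower(iGrave, lithuanian).length());  // 3

        // Locale-neutral rules: the single character ì (U+00EC)
        System.out.println(lower(iGrave, Locale.ROOT).length()); // 1

        // German example from above: ß is upper-cased to SS
        System.out.println(upper("stra\u00dfe", Locale.GERMAN).length()); // 7
    }
}
```

If you need a result that does not depend on the default locale of the machine, one option is to pass a locale such as Locale.ROOT explicitly instead of calling the no-argument toLowerCase().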
I wrote about this in one of my Java quizzes: http://thecodersbreakfast.net/index.php?post/2010/09/24/Java-Quiz-42-%3A-A-string-too-far