java, locale, setlocale

toLowerCase() method in Java when used with Locale does not produce the expected result


Look at the following code snippet in Java.

import java.util.Locale;

public final class Main
{
    public static void main(String[] args)
    {
        Locale.setDefault(new Locale("lt"));   // set Lithuanian as the default locale
        String str = "\u00cc";                 // Ì (LATIN CAPITAL LETTER I WITH GRAVE)

        System.out.println("Before case conversion is "+str+" and length is "+str.length()); // Ì, length 1
        String lowerCaseStr = str.toLowerCase();   // no argument, so the default (Lithuanian) locale applies
        System.out.println("Lower case is "+lowerCaseStr+" and length is "+lowerCaseStr.length()); // i̇̀, length 3
    }
}

It displays the following output.

Before case conversion is Ì and length is 1

Lower case is i̇̀ and length is 3


The first System.out.println() statement prints the expected result. The second, however, reports a length of 3, where I expected 1. Why?


Solution

  • Different languages have different rules for converting text to upper or lower case.

    For example, in German, the lowercase ß becomes two uppercase S, so the word "straße" (a street), which is 6 characters long, becomes "STRASSE", which is 7 characters long.

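    This is why your upper-cased and lower-cased strings have different lengths. Under the Lithuanian rules, the lowercase i keeps its dot when an accent is placed above it, so Ì (U+00CC) lower-cases to three characters: i (U+0069), combining dot above (U+0307) and combining grave accent (U+0300).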

    I wrote about this in one of my Java Quizzes: http://thecodersbreakfast.net/index.php?post/2010/09/24/Java-Quiz-42-%3A-A-string-too-far
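
To make the effect visible, here is a minimal sketch that prints the UTF-16 code units produced by each conversion. The class name CaseMappingDemo and the helpers lower() and codeUnits() are hypothetical names made up for this illustration. It passes the locale explicitly instead of relying on Locale.setDefault(), and uses Locale.ROOT for the locale-neutral case.

```java
import java.util.Locale;

public final class CaseMappingDemo {
    // Hypothetical helper: lower-cases text using the rules of the given language tag.
    public static String lower(String text, String languageTag) {
        return text.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    // Hypothetical helper: lists each UTF-16 code unit as U+XXXX so that
    // combining marks, which are hard to see when printed, become visible.
    public static String codeUnits(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String capitalIGrave = "\u00cc"; // Ì

        // Lithuanian rules: i + combining dot above + combining grave accent.
        System.out.println(codeUnits(lower(capitalIGrave, "lt")));              // U+0069 U+0307 U+0300

        // Locale-neutral rules: the single precomposed character ì.
        System.out.println(codeUnits(capitalIGrave.toLowerCase(Locale.ROOT)));  // U+00EC

        // German sharp s: one lowercase character becomes two uppercase ones.
        System.out.println("stra\u00dfe".toUpperCase(Locale.ROOT));             // STRASSE
    }
}
```

If the goal is a conversion whose result does not depend on the machine's default locale, passing Locale.ROOT (or another explicit locale) to toLowerCase() and toUpperCase() is one way to get it. Even then the length can change, as the last line shows.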