
Converting String which contains Turkish characters to lowercase


I want to convert a string that contains Turkish characters to lowercase, with the Turkish characters mapped to their English equivalents, e.g. "İĞŞÇ" -> "igsc".

When I use toLowerCase(new Locale("en", "US")), it converts İ, for example, to an i that still carries a dot above it (i followed by a combining dot) rather than a plain i.

How can I solve this problem? (I'm using Java 7)

Thank you.


Solution

  • You can do this in two steps.

    1) First, remove the accents.

    The following is quoted from the topic "Is there a way to get rid of accents and convert a whole string to regular letters?":

    Use java.text.Normalizer to handle this for you.

    string = Normalizer.normalize(string, Normalizer.Form.NFD);
    

    This separates all of the accent marks from their base characters (for example, İ becomes I followed by a combining dot). Then you just need to check each character and throw out the ones that aren't plain letters.

    string = string.replaceAll("[^\\p{ASCII}]", "");
    

    That removes every non-ASCII character, not just the accents. To remove only the accent marks and keep everything else, use this instead:

    string = string.replaceAll("\\p{M}", "");
    

    In a regex, \P{M} (uppercase P) matches anything that is not a mark, such as the base glyph, and \p{M} (lowercase p) matches each accent.

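    2) Then, just convert the remaining String to lower case. Pass an explicit locale such as Locale.ROOT: with a Turkish default locale, a plain toLowerCase() turns I into the dotless ı.

    string = string.toLowerCase(Locale.ROOT);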
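
Putting the two steps together, a minimal sketch that should also work on Java 7 might look like the following. The class name TurkishAscii and the method toAsciiLower are hypothetical names chosen for illustration, and the last step (mapping the dotless ı by hand) is an addition that the steps above do not cover: ı has no accent to strip, so normalization leaves it unchanged.

```java
import java.text.Normalizer;
import java.util.Locale;

class TurkishAscii {

    // Hypothetical helper, not part of any library.
    static String toAsciiLower(String input) {
        // Step 1a: decompose, so that e.g. İ becomes I followed by U+0307.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Step 1b: drop the combining marks that NFD split off.
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // Step 2: lowercase with a fixed locale, so a Turkish default
        // locale cannot turn I into the dotless ı.
        String lower = stripped.toLowerCase(Locale.ROOT);
        // Assumption: dotless ı (U+0131) should also become i. It has no
        // decomposition, so it is replaced explicitly.
        return lower.replace('\u0131', 'i');
    }

    public static void main(String[] args) {
        // "\u0130\u011E\u015E\u00C7" is "İĞŞÇ", written with escapes so the
        // result does not depend on the source file encoding.
        System.out.println(toAsciiLower("\u0130\u011E\u015E\u00C7")); // igsc
    }
}
```

Note that the "[^\\p{ASCII}]" variant from step 1 would delete ı outright instead of turning it into i, which is why this sketch uses "\\p{M}" and handles ı separately.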