Tags: java, unicode, utf-8, thai

Number of characters in Java String


Possible Duplicate:
Java: length of string when using unicode overline to display square roots?

How do I get number of Unicode characters in a String?

Given a char[] of Thai characters:

[อ, ภ, ิ, ช, า, ต, ิ]

As a String, this is: อภิชาติ

String.length() returns 7. I understand there are technically 7 chars (each Thai character here is one UTF-16 code unit), but I need a method that returns 5: the number of character positions actually displayed on screen.
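To see where the 7 comes from, it can help to print each char together with its Unicode general category. This is a minimal diagnostic sketch; the class ThaiCharTypes and its describe helper are hypothetical names, not part of the question:

```java
public class ThaiCharTypes {
    // Hypothetical helper: maps a char's Unicode general category to a short label
    static String describe(char ch) {
        switch (Character.getType(ch)) {
            case Character.NON_SPACING_MARK:       return "non-spacing mark";
            case Character.COMBINING_SPACING_MARK: return "spacing mark";
            case Character.ENCLOSING_MARK:         return "enclosing mark";
            default:                               return "base character";
        }
    }

    public static void main(String[] args) {
        String s = "อภิชาติ";
        // length() counts UTF-16 code units, not on-screen positions
        System.out.println("length() = " + s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Print index, code point in U+XXXX form, and category label
            System.out.printf("%d: U+%04X %s%n", i, (int) ch, describe(ch));
        }
    }
}
```

On this string, indexes 2 and 6 (both U+0E34, THAI CHARACTER SARA I) are non-spacing marks that sit on top of the preceding consonant, while U+0E32 (SARA AA) is an ordinary letter and takes its own position on screen. That leaves 7 - 2 = 5 visible positions.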


Solution

  • It seems you just don't want to count Unicode mark characters (general categories Mn, Me and Mc) as separate characters:

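    // Returns true if ch is in one of Unicode's three "mark" categories,
    // i.e. it combines with the preceding base character.
    static boolean isMark(char ch)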
    {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK ||
               type == Character.ENCLOSING_MARK ||
               type == Character.COMBINING_SPACING_MARK;
    }
    

    which can be used as follows:

    String olle = "อภิชาติ";
    int count = 0;
    
    // Count every char except marks, which render attached to the previous character
    for (int i = 0; i < olle.length(); i++)
    {
        if (!isMark(olle.charAt(i)))
            count++;
    }
    
    System.out.println(count);
    

    and prints 5.
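The loop above indexes by char, which works for Thai because every Thai character is a single UTF-16 code unit. For text that may contain supplementary characters (surrogate pairs), a code-point-based variant is a little safer. This is a minimal sketch assuming Java 8+ (for String.codePoints()); the class VisibleCharCount and the method countNonMarks are hypothetical names, and isMark here simply mirrors the answer's helper with an int parameter:

```java
public class VisibleCharCount {
    // Same three mark categories as the answer's isMark, but for a full code point
    static boolean isMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.ENCLOSING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    // Hypothetical helper: counts code points that are not marks.
    // codePoints() yields a surrogate pair as one value, unlike charAt().
    static long countNonMarks(String s) {
        return s.codePoints().filter(cp -> !isMark(cp)).count();
    }

    public static void main(String[] args) {
        System.out.println(countNonMarks("อภิชาติ")); // prints 5
    }
}
```

This still only skips mark characters; it is not full grapheme-cluster segmentation. If that is needed, java.text.BreakIterator.getCharacterInstance() is the standard-library class to look at, though its exact behavior on Thai text is not verified here.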