Get the Unicode block or script name for a particular character (code point) in Java


I see the Character class has two nested types: the class Character.UnicodeBlock and the enum Character.UnicodeScript.

For a particular character, how do I get the appropriate block or script object? I expected a pair of methods such as Character.getUnicodeBlock( codePoint ) and Character.getUnicodeScript( codePoint ), but I do not see such methods.


Solution

  • Use the of method on each nested type

    Look at the nested types themselves rather than at the Character class. Both Character.UnicodeBlock and Character.UnicodeScript offer a static of method to which you can pass a code point.

    Example:

    int codePoint = 2_309;  // U+0905 DEVANAGARI LETTER A
    String text = Character.toString( codePoint );  // The int overload of toString requires Java 11 or later.
    Character.UnicodeBlock block = Character.UnicodeBlock.of( codePoint );
    Character.UnicodeScript script = Character.UnicodeScript.of( codePoint );
    
    System.out.println( "text = " + text );
    System.out.println( "block = " + block );
    System.out.println( "script = " + script );
    

    When run:

    text = अ
    block = DEVANAGARI
    script = DEVANAGARI
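
The of methods have two edge cases worth guarding against: both throw IllegalArgumentException for a value that is not a valid code point, and Character.UnicodeBlock.of returns null for a code point that belongs to no defined block. Here is a minimal sketch of defensive wrappers. The class name SafeLookup and the helpers blockOf and scriptOf are hypothetical names chosen for illustration, not part of the JDK.

```java
import java.util.Optional;

public class SafeLookup {

    // Hypothetical helper: wraps UnicodeBlock.of so callers never see null or an exception.
    public static Optional<Character.UnicodeBlock> blockOf(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return Optional.empty();  // of() would throw IllegalArgumentException here
        }
        // of() returns null when the code point belongs to no defined block
        return Optional.ofNullable(Character.UnicodeBlock.of(codePoint));
    }

    // Hypothetical helper: falls back to UNKNOWN for values that are not code points.
    public static Character.UnicodeScript scriptOf(int codePoint) {
        return Character.isValidCodePoint(codePoint)
                ? Character.UnicodeScript.of(codePoint)
                : Character.UnicodeScript.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(blockOf(0x0905));   // DEVANAGARI LETTER A
        System.out.println(blockOf(-1));       // not a valid code point
        System.out.println(scriptOf(0x0905));
        System.out.println(scriptOf(-1));
    }
}
```

Returning Optional for the block and UNKNOWN for the script is only one possible design; letting the exception propagate may be preferable when an invalid code point indicates a bug in the caller.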

    Example usage

    We can use the of method to find all the characters of a particular script.

    // Requires: import java.util.stream.IntStream;
    // Scan every Unicode code point and keep those assigned to the Devanagari script.
    IntStream
            .rangeClosed( Character.MIN_CODE_POINT , Character.MAX_CODE_POINT )
            .filter( ( int codePoint ) -> Character.UnicodeScript.of( codePoint ) == Character.UnicodeScript.DEVANAGARI )
            .mapToObj( ( int codePoint ) -> codePoint + ":" + Character.toString( codePoint ) )
            .forEach( System.out :: println );
    

    When run:

    2304:ऀ
    2305:ँ
    2306:ं
    2307:ः
    2308:ऄ
    2309:अ
    2310:आ
    …
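
The same of method also works in the other direction: given a piece of text, we can ask which scripts it contains. The following is a sketch under stated assumptions. The class name ScriptsInText and the helper scriptsOf are hypothetical names used only for this example, and the sample string is made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ScriptsInText {

    // Hypothetical helper: the distinct scripts found in the text, in order of first appearance.
    public static Set<Character.UnicodeScript> scriptsOf(String text) {
        Set<Character.UnicodeScript> scripts = new LinkedHashSet<>();
        text.codePoints()                               // iterate by code point, not by char
            .mapToObj(Character.UnicodeScript::of)      // look up the script of each code point
            .forEach(scripts::add);
        return scripts;
    }

    public static void main(String[] args) {
        // 'A', then U+0905 DEVANAGARI LETTER A, then '!'
        System.out.println(scriptsOf("A\u0905!"));      // prints [LATIN, DEVANAGARI, COMMON]
    }
}
```

Punctuation, digits, and other characters shared across writing systems report the COMMON script, so a caller interested only in letters may want to filter that value out.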