Get Unicode of a character in Groovy


Is there any way in Groovy to get the Unicode escape sequence of a character? For example:

Suppose there is a method getUnicode(char c). The call getUnicode('÷') should return \u00f7.


Solution

  • So I'm gonna hit you with this:

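    // "UTF-16" output starts with a 2-byte BOM (FE FF), so keep only bytes 2..3
    println( "\\u${'÷'.getBytes("UTF-16")[2..3].collect { String.format('%02x',it) }.join('')}" )

    // "UTF-16BE" writes no BOM, so every byte belongs to the character
    println("\\u${'÷'.getBytes("UTF-16BE").collect { String.format('%02x',it) }.join('')}")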
    

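    The trick is to get the UTF-16 encoding of the character; otherwise it won't work as you expect. However, getBytes("UTF-16") returns 4 bytes instead of 2 because it puts a byte order mark (BOM, FE FF) on the front. So slicing out indexes 2 to 3 isolates just the bytes you need. The second line sidesteps the BOM entirely by asking for "UTF-16BE" instead. Then format each byte as hex with String.format('%02x', it), join 'em back into a string, tack \u on the front with a Groovy GString, and bam, you're cooking. (This assumes a character in the Basic Multilingual Plane, like ÷; anything above U+FFFF is encoded as a surrogate pair and takes 4 bytes by itself.)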
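If you want the actual getUnicode(char c) method from the question, here's a minimal sketch. The helper names are hypothetical, not from any library. The first one just wraps the UTF-16BE line above. The second is a different technique: it formats the char's numeric value directly and skips byte encoding altogether. The third, for whole strings, emits one escape per UTF-16 code unit, so a character above U+FFFF (an emoji, say) comes out as a surrogate pair.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper with the question's getUnicode(char c) signature.
// It reuses the answer's trick: UTF-16BE writes no BOM, so for a char in the
// Basic Multilingual Plane the two bytes are exactly its UTF-16 code unit.
String getUnicode(char c) {
    byte[] bytes = c.toString().getBytes(StandardCharsets.UTF_16BE)
    // %02x renders each (signed) byte as two unsigned hex digits, e.g. -9 -> f7
    '\\u' + bytes.collect { String.format('%02x', it) }.join('')
}

// Alternative (a different technique from the answer): skip encoding entirely
// and format the char's numeric value, zero-padded to four hex digits.
String getUnicodeFromValue(char c) {
    String.format('\\u%04x', (int) c)
}

// Hypothetical whole-string version: escape each UTF-16 code unit in turn.
// Characters outside the BMP come out as two escapes, one per surrogate.
String toUnicodeEscapes(String s) {
    (0..<s.length()).collect { i -> getUnicodeFromValue(s.charAt(i)) }.join('')
}

char divide = (char) 0xF7   // the '÷' character
println getUnicode(divide)
println getUnicodeFromValue(divide)
println toUnicodeEscapes(new String(Character.toChars(0x1F600)))  // an emoji
```

Formatting the char's value directly means there's no BOM to think about and no charset involved, so I'd lean toward that one; the byte version is mainly handy if you're already holding encoded bytes.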