java decode encode

How can I decode Chinese?


I am about to start decoding something that I am sure is Chinese. It looks like this in the database: &#34913;

The Sybase encoding is windows-1252 by default, but what is the above? How can I decode it to get the Chinese characters written out? It is stored as nchar, unfortunately.

In case anyone wonders, this is how it's done:

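// Parse the decimal digits between "&#" and ";" into a code point
int i = Integer.parseInt("34913");

// Convert the code point into its UTF-16 char(s) and build a String
String s = new String(Character.toChars(i));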

As @Joachim said, thank you.


Solution

  • This is a decimal numeric character reference as defined by XML (as well as HTML4 and HTML5). The number between "&#" and ";" is the decimal representation of the Unicode code point.

    Simply parse the number as an int to get the actual Unicode code point. Then use Character.toChars() to get the corresponding char values (usually just one, but for characters outside the BMP there will be two surrogate values).
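
If the column holds whole strings with several such references mixed into ordinary text, the same idea can be applied with a regular expression. The following is only a minimal sketch: the class name NcrDecoder and the method decode are made up for illustration, and it assumes that only decimal references of the form &#NNNN; occur (no hexadecimal &#xNNNN; and no named entities). Anything that does not parse to a valid code point is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: decodes decimal numeric
// character references such as "&#34913;" into the characters they name.
class NcrDecoder {

    // Assumption: only decimal references occur, e.g. "&#34913;"
    private static final Pattern DECIMAL_NCR = Pattern.compile("&#(\\d+);");

    static String decode(String input) {
        Matcher m = DECIMAL_NCR.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous match and this one
            out.append(input, last, m.start());
            try {
                int codePoint = Integer.parseInt(m.group(1));
                if (Character.isValidCodePoint(codePoint)) {
                    // Appends one char, or a surrogate pair outside the BMP
                    out.appendCodePoint(codePoint);
                } else {
                    out.append(m.group()); // not a code point, keep as-is
                }
            } catch (NumberFormatException e) {
                out.append(m.group()); // number too large for int, keep as-is
            }
            last = m.end();
        }
        // Copy whatever follows the last match
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#34913;");
        // 34913 is U+8861 in hexadecimal
        System.out.println(decoded + " U+" + Integer.toHexString(decoded.codePointAt(0)).toUpperCase());
    }
}
```

Using appendCodePoint() here does the same job as Character.toChars(), so a reference to a character outside the BMP ends up as two chars in the resulting String.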