Encoding pinyin


I'm developing a program in Java that displays Chinese pinyin, which I fetch from a remote website.

But I have the following problem: the pinyin is displayed like this: ji&#462;
whereas it should be displayed like this: jiǎ
(I just typed the same sequence, except I stripped the slashes).

I think the answer to this question is really simple but I'm struggling to find it.


Solution

  • The problem is that you have an HTML-encoded Unicode character (a numeric character reference), and what you want is its decoded form. A library such as commons-lang3 (part of Apache Commons) can decode the HTML-encoded string so that Java displays it correctly:

    import org.apache.commons.lang3.StringEscapeUtils;
    String decoded = StringEscapeUtils.unescapeHtml4("ji&#462;"); // decoded is "jiǎ"
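
If adding a dependency is not an option, numeric references such as &#462; can also be decoded with the standard library alone. The following is only a minimal sketch, not how commons-lang3 does it: the class name NumericEntityDecoder and the method decode are made up for illustration, and it assumes the input contains only numeric references (decimal or hexadecimal). Named entities such as &amp; are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class NumericEntityDecoder {

    // Matches decimal (&#462;) and hexadecimal (&#x1CE;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Assumption: references outside the Unicode range are kept as-is.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("ji&#462;"));
    }
}
```

For real HTML, which may also contain named entities, a maintained library is the safer choice. This sketch only covers the numeric case from the question.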
    

    You can also write the character directly in Java source as a Unicode escape:

    String jia = "ji\u01ce";
    

    This one-liner takes a character and prints its escaped form (it works for any single char, that is, any character in the Basic Multilingual Plane):

    System.out.println( "\\u" + Integer.toHexString('ǎ' | 0x10000).substring(1) );
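
The same idea can be applied to a whole string. This is a rough sketch under a few assumptions: the class name UnicodeEscaper and the method escapeNonAscii are hypothetical, everything below 0x80 is copied unchanged, and each remaining UTF-16 code unit is escaped on its own, so characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair).

```java
// Hypothetical helper, not part of any library.
final class UnicodeEscaper {

    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                // ASCII characters are copied through unchanged.
                out.append(c);
            } else {
                // Backslash, the letter u, then four zero-padded hex digits.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal uses an escape so the example does not depend on the source file encoding.
        System.out.println(escapeNonAscii("ji\u01ce"));
    }
}
```

Using String.format with %04x does the zero-padding that the one-liner gets from the 0x10000 trick.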