Tags: java, encoding, encode, unicode-literals

Java: how to convert UTF-8 (as an escaped literal) to Unicode


I have a UTF-8 byte sequence written as an escaped literal, like this: "\xE2\x80\x93".

I'm trying to convert it into the Unicode character it represents using Java.

So far I have not been able to find a way to do this.

Can anyone help me with this?

Regards, Sat


Solution

  • System.out.println(new String(new byte[] {
        (byte)0xE2, (byte)0x80, (byte)0x93 }, "UTF-8"));
    

    prints an en dash (U+2013), which is what those three bytes encode. It is not clear from your question whether you have those three bytes, or literally the string you have posted. If you have the string, then simply parse it into bytes first, for example with the following:

    // Splitting on the literal "\x" yields ["", "E2", "80", "93"]; element 0 is empty.
    final String[] bstrs = "\\xE2\\x80\\x93".split("\\\\x");
    final byte[] bytes = new byte[bstrs.length-1];
    for (int i = 1; i < bstrs.length; i++)
      bytes[i-1] = (byte) Integer.parseInt(bstrs[i], 16); // the cast keeps the low 8 bits
    System.out.println(new String(bytes, "UTF-8"));
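
If you want something you can compile and run as is, here is a minimal sketch of the same idea. The class name `EscapedUtf8` and the method `decode` are made up for illustration, and it assumes the input consists only of `\xHH` escapes (anything else in the string is skipped). It uses `StandardCharsets.UTF_8` (Java 7+) rather than the `"UTF-8"` name, so there is no checked `UnsupportedEncodingException` to handle.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedUtf8 {
    // One \xHH escape: a backslash, the letter x, and exactly two hex digits.
    private static final Pattern HEX_ESCAPE =
            Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    /**
     * Collects the byte value of every \xHH escape in the input and decodes
     * the resulting byte sequence as UTF-8. Characters outside the escapes
     * are ignored (an assumption of this sketch).
     */
    public static String decode(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher m = HEX_ESCAPE.matcher(escaped);
        while (m.find()) {
            // write(int) stores only the low 8 bits, so 0xE2 becomes one byte.
            bytes.write(Integer.parseInt(m.group(1), 16));
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decode("\\xE2\\x80\\x93");
        System.out.println(decoded);
        // Print the code point to confirm which character was decoded.
        System.out.printf("U+%04X%n", decoded.codePointAt(0)); // U+2013
    }
}
```

Note that a Java `String` is already Unicode internally (UTF-16), so once the bytes are decoded there is nothing further to convert. Printing the code point is just a way to check the result independently of your console's encoding.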