
Displaying UTF-8 Emoji in Java

Say I have the 😈 (devil) emoji.

In 4-byte UTF-8, it's represented like so: \u00f0\u009f\u0098\u0088

However, in Java, it will only print correctly like so: \ud83d\ude08

How would I convert from the first to the second?


MNEMO's answer is much simpler, and answers my question, so it's probably better to go with his solution.


Thanks Basil Bourque for the write-up. It was very interesting.

I found a good reference here: (particularly the convertUTF82Char() function).

For anyone wandering by here in the future, here's what that looks like in Java:

public static String fromCharCode(int n) {
    // stand-in for JavaScript's String.fromCharCode(): one UTF-16 code unit to a String
    char c = (char) n;
    return Character.toString(c);
}

public static String decToChar(int n) {
    // converts a Unicode code point to a string of one or two UTF-16 code units
    // note that no checking is performed to ensure that n is non-negative
    // n: int, the code point to be converted
    String result = "";
    if (n <= 0xFFFF) {
        result += fromCharCode(n);
    } else if (n <= 0x10FFFF) {
        // supplementary code point: split into a high and a low surrogate
        n -= 0x10000;
        result += fromCharCode(0xD800 | (n >> 10)) + fromCharCode(0xDC00 | (n & 0x3FF));
    } else {
        result += "dec2char error: Code point out of range: " + decToHex(n);
    }

    return result;
}

public static String decToHex(int n) {
    return Integer.toHexString(n).toUpperCase();
}

public static String convertUTF8_toChar(String str) {
    // converts to characters a sequence of space-separated hex numbers representing bytes in utf8
    // str: string, the sequence to be converted
    String outputString = "";
    int counter = 0; // number of continuation bytes still expected
    int n = 0;       // code point accumulated so far

    // remove leading and trailing spaces
    str = str.replaceAll("^\\s+", "");
    str = str.replaceAll("\\s+$", "");
    if (str.length() == 0) {
        return "";
    }

    // collapse runs of whitespace so that split(" ") yields one token per byte
    str = str.replaceAll("\\s+", " ");

    String[] listArray = str.split(" ");
    for (int i = 0; i < listArray.length; i++) {
        int b = Integer.parseInt(listArray[i], 16);
        switch (counter) {
            case 0: // expecting a lead byte
                if (0 <= b && b <= 0x7F) { // 0xxxxxxx
                    outputString += decToChar(b);
                } else if (0xC0 <= b && b <= 0xDF) { // 110xxxxx
                    counter = 1;
                    n = b & 0x1F;
                } else if (0xE0 <= b && b <= 0xEF) { // 1110xxxx
                    counter = 2;
                    n = b & 0xF;
                } else if (0xF0 <= b && b <= 0xF7) { // 11110xxx
                    counter = 3;
                    n = b & 0x7;
                } else {
                    outputString += "convertUTF82Char: error1 " + decToHex(b) + "! ";
                }
                break;
            case 1: // last continuation byte: emit the finished code point
                if (b < 0x80 || b > 0xBF) {
                    outputString += "convertUTF82Char: error2 " + decToHex(b) + "! ";
                }
                counter--;
                outputString += decToChar((n << 6) | (b - 0x80));
                n = 0;
                break;
            case 2:
            case 3: // intermediate continuation byte: shift in 6 more bits
                if (b < 0x80 || b > 0xBF) {
                    outputString += "convertUTF82Char: error3 " + decToHex(b) + "! ";
                }
                n = (n << 6) | (b - 0x80);
                counter--;
                break;
        }
    }

    return outputString.replaceAll(" $", "");
}

Pretty much a 1-for-1 copy, but it accomplishes my goal.
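
As a cross-check on the surrogate-pair arithmetic in decToChar(), the sketch below compares a hand-rolled split against the standard library's Character.toChars() for U+1F608. The class name SurrogateCheck and the helper manualPair() are made up for this illustration; only Character.toChars() comes from the JDK.

```java
class SurrogateCheck {
    // Hypothetical helper: splits a supplementary code point (above 0xFFFF)
    // into a UTF-16 surrogate pair, mirroring the arithmetic in decToChar().
    static String manualPair(int codePoint) {
        int offset = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 | (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 | (offset & 0x3FF));  // bottom 10 bits
        return "" + high + low;
    }

    public static void main(String[] args) {
        int devil = 0x1F608;
        String viaLibrary = new String(Character.toChars(devil));
        String viaManual = manualPair(devil);
        System.out.println(viaLibrary.equals(viaManual));        // prints true
        System.out.println(viaLibrary.equals("\ud83d\ude08"));   // prints true
    }
}
```

For this code point the offset is 0xF608, so the high surrogate is 0xD800 | 0x3D = 0xD83D and the low surrogate is 0xDC00 | 0x208 = 0xDE08, which matches the escape sequence from the question.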


  • Well, this is probably unnecessary to add, but once you understand character encodings and the Unicode concepts involved, the following code might work for you.

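    // requires: import java.nio.charset.StandardCharsets;
    byte[] a = { (byte)0xf0, (byte)0x9f, (byte)0x98, (byte)0x88 };
    String s = new String(a, StandardCharsets.UTF_8);    // decode the UTF-8 bytes; s now holds the emoji
    byte[] b = s.getBytes(StandardCharsets.UTF_16BE);    // re-encode as big-endian UTF-16
    for ( byte c : b ) { System.out.printf("%02x ",c); } // prints: d8 3d de 08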
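
If the input arrives as in the question, i.e. a Java String whose chars each hold one UTF-8 byte, the same idea can be applied without building the byte array by hand. The following is a minimal sketch under that assumption: every char must be in the range 0x00 to 0xFF, since ISO-8859-1 maps those chars to bytes one-to-one and replaces anything larger with '?'. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesToString {
    // Reinterprets a string whose chars each carry one UTF-8 byte (0x00-0xFF)
    // as the text those bytes encode. Assumes no char exceeds 0xFF.
    static String decode(String bytesAsChars) {
        byte[] raw = bytesAsChars.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Formats each UTF-16 code unit of s as a backslash-u escape, for inspection.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("\u00f0\u009f\u0098\u0088");
        // Prints the two escaped code units D83D and DE08 of the surrogate pair.
        System.out.println(escape(decoded));
    }
}
```

Running main prints \ud83d\ude08, the form the question asks for. The escape() helper is only there to make the result visible on terminals that cannot render the emoji.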