Convert escaped Unicode sequences to text in Android

While working on an Android application, I noticed that the API response contains escaped Unicode sequences, for example:

{
"msg_txt":"Laurent Ruquier et l'\\u00e9quipe"
}

On the Android side, I have a simple TextView in which I need to show the actual text, i.e. the escaped Unicode sequences have to be converted into the characters they represent.
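For context: inside a JSON string, \\ is an escaped backslash, so once the response shown above is parsed, msg_txt holds the six characters \u00e9 rather than é, and a TextView displays them verbatim. Kotlin string literals show the same distinction. This is a minimal illustration; the names escaped and decoded exist only for this example:

```kotlin
// A literal backslash followed by "u00e9": what msg_txt holds after JSON parsing.
val escaped = "l'\\u00e9quipe"

// Here the Kotlin compiler itself resolves the escape into the single character é.
val decoded = "l'\u00e9quipe"
```

escaped is 13 characters long, while decoded is only 8, so the escape has to be decoded at runtime.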

Solution

Since the response already contains escaped Unicode sequences, you will need to unescape them yourself. I am not a fan of reinventing the wheel, so I would usually use something like Apache Commons unescapeJava. If you decide to add it to your Gradle build as a dependency, make sure R8 shrinking is configured correctly (see here); otherwise you will add a huge number of methods and classes to your release builds. The dependency can also slow down debug builds considerably, so beware.
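A minimal sketch of how this could look in a module-level build file using the Gradle Kotlin DSL. It assumes that the unescapeJava you want is the one from Apache Commons Text (org.apache.commons.text.StringEscapeUtils) and that the standard Android Gradle Plugin release build type is in place. The version number is only an example, so check for the current release:

```kotlin
// app/build.gradle.kts (module-level build file)
android {
    buildTypes {
        getByName("release") {
            // enables R8 shrinking so unused library classes and methods are removed
            isMinifyEnabled = true
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro"
            )
        }
    }
}

dependencies {
    // example version; use the latest Commons Text release
    implementation("org.apache.commons:commons-text:1.10.0")
}
```

In your code, StringEscapeUtils.unescapeJava(msgTxt), where msgTxt is a placeholder for the parsed msg_txt value, should then return "Laurent Ruquier et l'équipe" for the example response.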

The method above actually does more than just replace escaped Unicode characters. If that is the only feature you need and you don't want to add the Apache dependency just for it, you can take a look at their implementation here and here. I have extracted the relevant part and converted it to Kotlin, just for fun:

import java.io.StringWriter
import java.io.Writer

val inputString = "Laurent Ruquier et l'\\u00e9quipe"
val unescaped = translate(inputString) // Laurent Ruquier et l'équipe

// Checks the single position `index` of `input`. If a \uXXXX escape starts
// there, the decoded char is written to `out` and the number of chars
// consumed is returned; otherwise nothing is written and 0 is returned.
fun translate(input: CharSequence, index: Int, out: Writer): Int {
    if (input[index] == '\\' && index + 1 < input.length && input[index + 1] == 'u') {
        // consume optional additional 'u' chars (e.g. \uu00e9)
        var i = 2
        while (index + i < input.length && input[index + i] == 'u') {
            i++
        }
        // skip an optional '+' (e.g. \u+00e9)
        if (index + i < input.length && input[index + i] == '+') {
            i++
        }
        if (index + i + 4 <= input.length) {
            // get the 4 hex digits
            val unicode = input.subSequence(index + i, index + i + 4)
            try {
                val value = unicode.toString().toInt(16)
                out.write(value)
            } catch (nfe: NumberFormatException) {
                throw IllegalArgumentException("Unable to parse unicode value: $unicode", nfe)
            }
            return i + 4
        }
        throw IllegalArgumentException(
            "Less than 4 hex digits in unicode value: '" +
                input.subSequence(index, input.length) +
                "' due to end of CharSequence"
        )
    }
    return 0
}

// helper for working directly with strings
fun translate(input: CharSequence): String {
    val writer = StringWriter(input.length * 2)
    translate(input, writer)
    return writer.toString()
}

// walks through the char sequence, passes every position to the unicode
// translator above and skips the chars it consumed
fun translate(input: CharSequence, out: Writer) {
    var pos = 0
    val len = input.length
    while (pos < len) {
        val consumed = translate(input, pos, out)
        if (consumed == 0) {
            // inlined implementation of Character.toChars(Character.codePointAt(input, pos))
            // avoids allocating temp char arrays and duplicate checks
            val c1 = input[pos]
            out.write(c1.code) // Char.code needs Kotlin 1.5+; use c1.toInt() on older versions
            pos++
            if (Character.isHighSurrogate(c1) && pos < len) {
                val c2 = input[pos]
                if (Character.isLowSurrogate(c2)) {
                    out.write(c2.code)
                    pos++
                }
            }
            continue
        }
        // contract with translators is that they have to understand code points
        // and they just took care of a surrogate pair
        repeat(consumed) {
            pos += Character.charCount(Character.codePointAt(input, pos))
        }
    }
}
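If you only need to decode the escapes and are fine with malformed sequences being left untouched instead of throwing, a shorter alternative is a regular expression. Note that this is a different technique from the Apache-derived code above, and unescapeUnicode is a hypothetical name; treat it as a sketch:

```kotlin
// Matches a backslash, one or more 'u' chars, an optional '+' and exactly 4 hex digits,
// i.e. the same escape forms the translator above accepts.
private val UNICODE_ESCAPE = Regex("""\\u+\+?([0-9a-fA-F]{4})""")

fun unescapeUnicode(input: String): String =
    UNICODE_ESCAPE.replace(input) { match ->
        // parse the 4 hex digits and turn the UTF-16 code unit into a Char
        match.groupValues[1].toInt(16).toChar().toString()
    }
```

Sequences that don't match, such as \u00zz, are copied through unchanged rather than causing an IllegalArgumentException.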

Note that this is very bare-bones and may need some adjustment for production use. For example, if your input still contains both backslashes (like val inputString = "Laurent Ruquier et l'\\\\u00e9quipe"), the code above copies the first backslash through unchanged and produces l'\équipe, so you will need to modify the method a bit.
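One low-effort way to handle that case is to collapse the doubled backslash before calling translate. This sketch assumes that the escapes arrive with exactly two backslashes and that your text never needs a literal backslash directly in front of uXXXX. The function name is hypothetical:

```kotlin
// Hypothetical helper: turns a doubled backslash before 'u' (\\u00e9 at runtime)
// into a single one (\u00e9), so translate() from above can decode it.
// Assumption: escapes use exactly two backslashes and the text never needs a
// literal backslash directly in front of "uXXXX".
fun collapseDoubledUnicodeEscapes(input: String): String =
    input.replace("\\\\u", "\\u")
```

Usage would then be translate(collapseDoubledUnicodeEscapes(inputString)). Strings that already use a single backslash pass through unchanged.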