
Use Unicode in Android


While working on an Android application, I noticed that the API response contains Unicode escape sequences, like this:

{
"msg_txt":"Laurent Ruquier et l'\\u00e9quipe"
}

On the Android side, I have a simple TextView where I need to show the actual text, with the Unicode escapes converted into real characters.


Solution

  • Since the escaped Unicode characters are already in the response, you will need to parse them. I am not a fan of reinventing the wheel, so the usual choice is something like Apache Commons' unescapeJava (StringEscapeUtils.unescapeJava). If you add it to your Gradle build as a dependency, make sure to configure R8 shrinking correctly (see here); otherwise you will add a huge number of methods and classes to your release builds. The extra dependency can also slow down debug builds considerably, so beware.

    That method does more than just replace escaped Unicode characters. If you only need this one feature and don't want to add the Apache dependency for it, we can look at their implementation here and here. I have extracted the relevant part and converted it to Kotlin, just for fun:

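    import java.io.StringWriter
    import java.io.Writer

    val inputString = "Laurent Ruquier et l'\\u00e9quipe"
    val unescaped = translate(inputString) // "Laurent Ruquier et l'équipe"

    // Main method: converts the Unicode escape that starts at `index`, if any.
    // Returns the number of chars consumed, or 0 if there is no escape at `index`.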
    fun translate(input: CharSequence, index: Int, out: Writer): Int {
        if (input[index] == '\\' && index + 1 < input.length && input[index + 1] == 'u') {
            // consume optional additional 'u' chars
            var i = 2
            while (index + i < input.length && input[index + i] == 'u') {
                i++
            }
            if (index + i < input.length && input[index + i] == '+') {
                i++
            }
            if (index + i + 4 <= input.length) {
                // Get 4 hex digits
                val unicode = input.subSequence(index + i, index + i + 4)
                try {
                    val value = unicode.toString().toInt(16)
                    out.write(value)
                } catch (nfe: NumberFormatException) {
                    throw IllegalArgumentException("Unable to parse unicode value: $unicode", nfe)
                }
                return i + 4
            }
            throw IllegalArgumentException(
                "Less than 4 hex digits in unicode value: '"
                        + input.subSequence(index, input.length)
                        + "' due to end of CharSequence"
            )
        }
        return 0
    }
    
    //helper method for working directly with strings
    fun translate(input: CharSequence): String {
        val writer = StringWriter(input.length * 2)
        translate(input, writer)
        return writer.toString()
    }
    
    // this goes through the actual char sequence and passes every
    // single char to the unicode transformer and swallows consumed chars 
    fun translate(input: CharSequence, out: Writer) {
        var pos = 0
        val len = input.length
        while (pos < len) {
            val consumed = translate(input, pos, out)
            if (consumed == 0) {
                // inlined implementation of Character.toChars(Character.codePointAt(input, pos))
                // avoids allocating temp char arrays and duplicate checks
                val c1 = input[pos]
                // Char.code replaces the deprecated Char.toInt() (Kotlin 1.5+)
                out.write(c1.code)
                pos++
                if (Character.isHighSurrogate(c1) && pos < len) {
                    val c2 = input[pos]
                    if (Character.isLowSurrogate(c2)) {
                        out.write(c2.code)
                        pos++
                    }
                }
                continue
            }
            // contract with translators is that they have to understand codepoints
            // and they just took care of a surrogate pair
            for (pt in 0 until consumed) {
                pos += Character.charCount(Character.codePointAt(input, pos))
            }
        }
    }
    
    

    Note that this is really bare-bones and might need some adjustment for actual production use. For example, if your input literally contains both backslashes (like val inputString = "Laurent Ruquier et l'\\\\u00e9quipe"), the first backslash is copied through unchanged and only the second one is consumed as part of the escape, so you will need to modify the method a bit.
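
For the doubled-backslash case mentioned in the note, one possible variation is a small regex-based helper. This is only a sketch and not part of the Apache implementation: the name unescapeUnicode is made up for illustration, and it assumes that any run of one or more backslashes followed by u and exactly four hex digits should collapse into a single character. That differs from unescapeJava, where an escaped backslash stays a backslash, so check that this is what your API actually needs.

```kotlin
// Hypothetical helper, not part of Apache Commons: matches a run of one or
// more backslashes followed by 'u' and exactly four hex digits.
private val UNICODE_ESCAPE = Regex("""\\+u([0-9a-fA-F]{4})""")

fun unescapeUnicode(input: String): String =
    UNICODE_ESCAPE.replace(input) { match ->
        // groupValues[1] holds the four hex digits, e.g. "00e9"
        match.groupValues[1].toInt(16).toChar().toString()
    }
```

With this, both "Laurent Ruquier et l'\\u00e9quipe" and "Laurent Ruquier et l'\\\\u00e9quipe" come out as "Laurent Ruquier et l'équipe". On Android you would then assign the result to the view, e.g. textView.text = unescapeUnicode(msgTxt), where textView and msgTxt are placeholders for your own TextView and the parsed msg_txt value.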