android json kotlin emoji unicode-escapes

Facebook JSON data emojis not showing up properly?


After many hours going through questions and answers on Stack Overflow, I still couldn't get this to work. Here's the problem. Consider the following JSON object from Facebook's downloadable JSON data:

{
    "sender_name": "megalo\u00e5\u00bd\u00a9",
    "timestamp_ms": 1679173611981,
    "content": "Reacted \u00f0\u009f\u00a4\u008d to your message "
}

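The problem: in the JSON above, the sender name contains a Japanese character (彩, U+5F69) and the message content contains a white heart emoji (🤍, U+1F90D). Each of them is stored as its UTF-8 bytes, with every byte written as a separate \u00XX escape: \u00e5\u00bd\u00a9 are the three bytes E5 BD A9, and \u00f0\u009f\u00a4\u008d are the four bytes F0 9F A4 8D. However, when displayed in an Android TextView or in Jetpack Compose, the heart shows up as ð¤, which is clearly not one character. It is actually four separate characters (U+00F0, U+009F, U+00A4, U+008D), two of which are invisible control characters. Android fails to interpret the 4-part sequence as a single emoji.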
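To check that the escapes really are raw UTF-8 bytes, here is a small Kotlin sketch (the names garbled, whiteHeart and hex are just for illustration) that compares what the parser hands back with the actual UTF-8 encoding of U+1F90D:

```kotlin
// What a JSON parser produces from "\u00f0\u009f\u00a4\u008d": four separate chars
val garbled = "\u00f0\u009f\u00a4\u008d"

// The real emoji: WHITE HEART, U+1F90D (a surrogate pair inside a JVM String)
val whiteHeart = String(Character.toChars(0x1F90D))

// Formats bytes as unsigned hex, e.g. "F0 9F A4 8D"
fun hex(bytes: ByteArray): String =
    bytes.joinToString(" ") { "%02X".format(it.toInt() and 0xFF) }

fun main() {
    println(garbled.length)                                   // 4: not one emoji
    println(garbled.map { "U+%04X".format(it.code) })         // [U+00F0, U+009F, U+00A4, U+008D]
    println(hex(whiteHeart.toByteArray(Charsets.UTF_8)))      // F0 9F A4 8D
}
```

The four code points the parser returns have exactly the same values as the four UTF-8 bytes of the emoji, which is why the text looks like Latin-1 garbage instead of a heart.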

What didn't work: reading the JSON file as UTF-8 did not fix it. Android still treats the emoji as several separate characters rather than one. Here's the parsing logic, which reads the JSON directly from a file:

val actualJson = String(jsonInputStream.readBytes(), Charsets.UTF_8)
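Reading the file as UTF-8 can't help here, because the file itself is plain ASCII: the escapes are literal backslash sequences, and only the JSON parser later turns each \u00XX into one character. A minimal Kotlin sketch, assuming a one-field snippet of the export (rawFileText is made up for illustration):

```kotlin
// The text as it sits on disk. A Kotlin raw string ("""...""") does not process
// escapes, so \u00f0 here is six literal ASCII characters, just like in the file.
val rawFileText = """{"content": "Reacted \u00f0\u009f\u00a4\u008d to your message "}"""

val fileBytes = rawFileText.toByteArray(Charsets.UTF_8)

fun main() {
    // Every byte is below 0x80, i.e. pure ASCII (Kotlin bytes are signed)
    println(fileBytes.all { it >= 0 })  // true
    // So the charset chosen when reading the file makes no difference
    println(String(fileBytes, Charsets.UTF_8) == String(fileBytes, Charsets.ISO_8859_1))  // true
}
```

The damage is already baked into the escapes themselves, so it has to be repaired after parsing rather than at read time.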

Why is Android not reading the UTF-8 content correctly?


Solution

  • The workaround was kind of hacky. Instead of treating the string as UTF-8, I converted it to a byte array as if it were a Latin-1 (ISO-8859-1) string, then decoded those bytes back as UTF-8. This works because Facebook's export escapes every byte of a UTF-8 sequence separately, so after JSON parsing each byte ends up as its own character in the range U+0000 to U+00FF. Latin-1 maps each of those characters back to exactly the original byte, and decoding the recovered bytes as UTF-8 reassembles the real characters. It's the only thing that worked, and I'm glad it did, since I was about to drop the whole thing after wasting hours looking for answers.

    val finalString = String(initialString.toByteArray(Charsets.ISO_8859_1), Charsets.UTF_8)
    

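    This actually did the trick. No other solution worked, not even Apache Commons Text's StringEscapeUtils.escapeJava/unescapeJava methods, since unescaping only turns each \u00XX into the same single Latin-1 character and never reassembles the bytes.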