json go unicode emoji utf8mb4

Go is generating unescaped control characters in JSON output due to emoji


I'm having trouble with something in Go and I'm not sure where to look. I'm fetching a UTF-8 string from a MySQL database, and attempting to return it in a JSON response to a client.

Different clients react differently, but iOS NSJSONSerialization returns an "Unescaped control character" error. This breaks the whole application. I can decode the JSON without issue in Chrome using JSON.parse(), though.

On the server-side, this same generator function written in another language besides Go works fine. Help?


EDIT: Here is the JSON that is causing the issue:

{ "test":"☮️" }

... If I omit this emoji, it works. If it's there, it doesn't work. The issue seems to be something related to there being two different encodings for certain emoji. One seems to trip up Go, but they are both valid.

To demonstrate the difference in encoding, some of the emoji show up in the database explorer and some do not:

[screenshot: database explorer in which some emoji are displayed and others are not]

... These ones that appear in the database explorer are causing this issue with 100% reproducibility. However, all of them usually appear in the actual client software (not the database explorer) without issue. I don't know if there's a way to reconfigure the database connection to avoid this (or something), but it seems to work with different instances depending on what is doing the decoding and how forgiving it is. Considering that users could type or copy/paste either encoding... this needs to work consistently.

Any help would be appreciated. Thanks in advance.


Solution

  • Go is doing fine.

    fmt.Println([]byte("☮️"))
    // [226 152 174 239 184 143]
    // Yup, one visible character, 6 bytes.
    

    NSJSONSerialization can't handle this. Maybe this link will be helpful: "NSJSONSerialization and Emoji". It has something to do with NSData * utf32Data = [uniText dataUsingEncoding:NSUTF32LittleEndianStringEncoding];.

    Can you give us the byte representation of the "☮️" symbol "iOS style", like I did with Go?

    UPD

    I did some research, and it looks like something is wrong with your database encoding. Is it UTF-16?

    Check this out:

    // They look the same, but they are completely different "characters":
    // the first one is yours, and the second one is plain U+262E.
    const nihongo = "☮️☮"
    for index, runeValue := range nihongo {
            fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
    }
    bad := []byte("☮️")
    good := []byte("☮")
    fmt.Printf("%v %s \n", bad, bad)
    fmt.Printf("%v %s \n", good, good)
    

    Output:

    U+262E '☮' starts at byte position 0
    U+FE0F '️' starts at byte position 3
    U+262E '☮' starts at byte position 6
    [226 152 174 239 184 143] ☮️ 
    [226 152 174] ☮ 
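
The output above shows that the two strings differ only by the trailing U+FE0F. If the selector is not wanted in stored or returned text, one possible approach is to drop it before encoding. The following is a minimal sketch, not something from the original answer: `stripVariationSelectors` is a hypothetical helper name, and removing the selector can change how a client renders the symbol (text style instead of emoji style), so treat it as a workaround rather than a fix.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripVariationSelectors is a hypothetical helper (not part of any library).
// It drops every code point that has the Unicode Variation_Selector property,
// which includes U+FE0F.
func stripVariationSelectors(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Variation_Selector, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	withSelector := "\u262E\uFE0F" // peace symbol followed by U+FE0F
	plain := stripVariationSelectors(withSelector)

	fmt.Println([]byte(withSelector)) // [226 152 174 239 184 143]
	fmt.Println([]byte(plain))        // [226 152 174]
}
```

The string is written with `\u` escapes so that the invisible selector cannot get lost when the code is copied and pasted.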
    

    UPD2

    It just hit me! I was copy-pasting your symbol (ctrl+c/ctrl+v) all along, but it is not a single symbol. It's two code points, and the second one, U+FE0F (a variation selector), has no visible glyph of its own.

    // Requires the imports "fmt" and "unicode/utf8".
    unprintable := []byte{239, 184, 143} // UTF-8 encoding of U+FE0F
    fmt.Printf("valid? %v\n", utf8.Valid(unprintable))
    fmt.Println("full rune?", utf8.FullRune(unprintable))
    r, size := utf8.DecodeRune(unprintable)
    fmt.Println(r, size, string(r))
    fmt.Printf("valid rune? %v\n", utf8.ValidRune(r))
    

    Output:

    valid? true
    full rune? true
    65039 3 ️
    valid rune? true
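
To check the Go side end to end, the same string can be run through `encoding/json`. This is a small sketch under assumptions: `marshalTest` is a hypothetical stand-in for the question's generator function, which is not shown. It prints the raw bytes that Go emits, asks Go's own validator whether they are valid JSON, and checks whether U+FE0F is a control character in the Unicode sense.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode"
)

// marshalTest is a hypothetical stand-in for the handler in the question:
// it encodes {"test": s} with the standard library.
func marshalTest(s string) ([]byte, error) {
	return json.Marshal(map[string]string{"test": s})
}

func main() {
	out, err := marshalTest("\u262E\uFE0F")
	if err != nil {
		panic(err)
	}

	// Hex dump of the encoded document; the six bytes e2 98 ae ef b8 8f
	// are the two code points, passed through without escaping.
	fmt.Printf("% x\n", out)

	fmt.Println("valid JSON?", json.Valid(out))                    // valid JSON? true
	fmt.Println("control character?", unicode.IsControl('\uFE0F')) // control character? false
}
```

So the selector is emitted as ordinary UTF-8 inside the string, and it is not a control character as far as Unicode is concerned.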
    

    So your DB is fine and the "unprintable" character is fine, but NSJSONSerialization cannot handle it. Better to ask the iOS community =)
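
If the client cannot be changed, one server-side option to experiment with is sending the JSON as pure ASCII, with every non-ASCII code point written as a `\uXXXX` escape. This is only a sketch under assumptions: `escapeNonASCII` is a hypothetical helper, and whether it avoids the NSJSONSerialization error has not been verified here. It relies on the fact that in valid JSON, non-ASCII characters can only appear inside string literals, so escaping them does not change the decoded value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf16"
)

// escapeNonASCII is a hypothetical post-processing step for already-valid
// JSON text: ASCII is copied as is, everything else becomes a \uXXXX escape.
func escapeNonASCII(jsonText []byte) string {
	var b strings.Builder
	for _, r := range string(jsonText) {
		switch {
		case r < 0x80:
			b.WriteRune(r)
		case r > 0xFFFF:
			// Code points outside the BMP need a UTF-16 surrogate pair in JSON.
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&b, `\u%04x\u%04x`, hi, lo)
		default:
			fmt.Fprintf(&b, `\u%04x`, r)
		}
	}
	return b.String()
}

func main() {
	out, err := json.Marshal(map[string]string{"test": "\u262E\uFE0F"})
	if err != nil {
		panic(err)
	}
	fmt.Println(escapeNonASCII(out)) // {"test":"\u262e\ufe0f"}
}
```

A conforming JSON parser decodes the escaped form to exactly the same string, so other clients (such as Chrome's JSON.parse()) should be unaffected.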