mysql json utf-8 character-encoding

Why is 'João' coming out as 'Jo\u00e3o'?


I have this column in my database table:

`data` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci DEFAULT NULL

Names like 'João' are inserted. But they're showing up as Jo\u00e3o. E.g.:

{"4":"jo\u00e3o da silva"}

I tried changing the character set and the collation, but it didn't seem to help. What can I do in order to fix it?

My database "character set" settings:

[Screenshot: MySQL character set settings]


Solution

  • First of all, \u00e3 is not generated by MySQL. It is produced by PHP's json_encode(), which escapes non-ASCII characters as \uXXXX by default. To keep the characters as literal UTF-8, pass JSON_UNESCAPED_UNICODE as the second argument to that function.
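As a minimal sketch of the difference, assuming the JSON is built in PHP and the source file is saved as UTF-8 (the variable names here are hypothetical; the sample value is the one from the question):

```php
<?php
// Sample data from the question; $row is a hypothetical variable name.
$row = ['4' => 'joão da silva'];

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($row);

// With the flag: characters are written as literal UTF-8.
$unescaped = json_encode($row, JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL;    // {"4":"jo\u00e3o da silva"}
echo $unescaped, PHP_EOL;  // {"4":"joão da silva"}

// Both strings are valid JSON and decode to the same value.
var_dump(json_decode($escaped, true) === json_decode($unescaped, true)); // bool(true)
```

Either form can be stored and decoded later; the flag only changes how the text looks when you inspect the column directly.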

    Meanwhile, those escapes are valid JSON and are decoded correctly by any JSON parser, including the one in web browsers, so you won't notice issues there. Reading from and writing to a database won't change them either. But note that any backslash needs to be escaped when INSERTing into a database table, unless the value is passed as a bound parameter.

    For use in MySQL tables, I prefer to have the connection and server settings consistently set to utf8mb4 so that Unicode text simply comes and goes without conversion.

    I agree with "never trust your screen". About the only way to see what is actually stored in the database is to use SELECT HEX(col)... For ã:

    UTF-8 (utf8mb4): Hex: C3A3 
    latin1:          Hex:  E3
    

    But for the six-character escape \u00e3, the hex would be 5C7530306533 (the ASCII bytes for \, u, 0, 0, e, 3).

    In PHP, bin2hex() does the same job as HEX(), except that it prints the hex digits in lowercase.
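To check which of the three forms a PHP string actually holds, a short sketch along these lines may help. The variable names are hypothetical, and the strings are built from explicit bytes so the result does not depend on the encoding of the source file:

```php
<?php
// Three ways "ã" can end up represented; all names are illustrative.
$utf8    = "\u{00E3}"; // PHP 7+ code point escape: two UTF-8 bytes
$latin1  = "\xE3";     // the single latin1 byte
$escaped = '\u00e3';   // single quotes: six literal ASCII characters

echo bin2hex($utf8), PHP_EOL;     // c3a3
echo bin2hex($latin1), PHP_EOL;   // e3
echo bin2hex($escaped), PHP_EOL;  // 5c7530306533
```

If a value read back from the table shows the third pattern, the escaping happened before the INSERT (for example in json_encode()), not in MySQL.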