I have this column in my database table:
`data` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
Names like 'João' are inserted, but they show up as Jo\u00e3o. For example:
{"4":"jo\u00e3o da silva"}
I tried changing the character set and the collation, but it didn't seem to help. How can I fix this?
My database "character set" settings:
First of all, \u00e3 is not generated by MySQL. It is, however, generated by PHP's json_encode(), which escapes non-ASCII characters by default. Be sure to pass JSON_UNESCAPED_UNICODE as the second (flags) argument to that function.
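Here is a minimal sketch of the difference, written in Python rather than PHP. Python's json.dumps() escapes non-ASCII characters by default, much like json_encode(), and its ensure_ascii=False option plays the role of JSON_UNESCAPED_UNICODE. The last step assumes you also want to repair rows that were already stored escaped, by decoding and re-encoding them.

```python
import json

# A value like the one in the question: key "4" maps to a name containing "ã".
row = {"4": "joão da silva"}

# Compact separators so the output matches PHP's json_encode() layout.
compact = (",", ":")

# Default behaviour: non-ASCII characters become \uXXXX escapes,
# similar to json_encode() without JSON_UNESCAPED_UNICODE.
escaped = json.dumps(row, separators=compact)
print(escaped)  # {"4":"jo\u00e3o da silva"}

# ensure_ascii=False keeps the characters as-is,
# analogous to json_encode($row, JSON_UNESCAPED_UNICODE).
unescaped = json.dumps(row, ensure_ascii=False, separators=compact)
print(unescaped)  # {"4":"joão da silva"}

# Already-stored escaped values can be repaired by decoding and re-encoding.
repaired = json.dumps(json.loads(escaped), ensure_ascii=False, separators=compact)
print(repaired == unescaped)  # True
```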
Meanwhile, web browsers interpret those escapes correctly when the JSON is parsed (e.g. by JavaScript), so you won't notice any issues there. Writing them to the database and reading them back won't change them either. But note that any backslash needs to be escaped when it appears in a string literal in an INSERT statement.
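To see why the backslash matters, here is a simplified Python model of how MySQL reads a single-quoted string literal in the default SQL mode (NO_BACKSLASH_ESCAPES off). The helper names read_literal_body and quote_literal_body are hypothetical, and only some escape sequences are modelled. In real code, let the database driver handle escaping (for example via a prepared statement) rather than using a hand-written helper.

```python
# Simplified model of MySQL's single-quoted literal parsing. Omitted escapes:
# \b, \r, \Z, \%, \_. For any other character the backslash is dropped,
# matching MySQL's documented treatment of unrecognised escape sequences.
KNOWN_ESCAPES = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t", "0": "\0"}


def read_literal_body(body: str) -> str:
    """Return the value MySQL would store for the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(KNOWN_ESCAPES.get(nxt, nxt))  # unknown escape: keep char only
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def quote_literal_body(value: str) -> str:
    """Hypothetical helper: double every backslash and escape single quotes."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


stored_json = '{"4":"jo\\u00e3o da silva"}'

# Pasting the JSON into the SQL text unescaped loses the backslash.
print(read_literal_body(stored_json))  # {"4":"jou00e3o da silva"}

# Escaping first makes the value round-trip intact.
print(read_literal_body(quote_literal_body(stored_json)) == stored_json)  # True
```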
For use in MySQL tables, I prefer to have the connection and server settings consistently set to utf8mb4, so that Unicode text simply passes in and out without conversion.
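As an illustration of why consistent settings matter, this Python sketch shows what happens when UTF-8 bytes are read back under a latin1 assumption. The result is mojibake (JoÃ£o), which is a different symptom from \u00e3 escapes, but it is the typical outcome of a character-set mismatch somewhere between client, connection, and table.

```python
name = "João"

# Consistent: bytes written as UTF-8 and read back as UTF-8.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.decode("utf-8"))  # João

# Mismatched: the same UTF-8 bytes (C3 A3 for "ã") decoded as latin1.
print(utf8_bytes.decode("latin-1"))  # JoÃ£o
```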
I agree with "never trust your screen". About the only way to see what is actually stored in the database is to use SELECT HEX(col)...
For ã:
UTF-8 (utf8mb4): Hex: C3A3
latin1: Hex: E3
But for \u00e3, the hex would be 5C7530306533, because it is stored as six ASCII characters.
In PHP, you can use bin2hex() for the same purpose.
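The hex values above can be checked outside the database as well. In this Python sketch, bytes.hex() plays the role of PHP's bin2hex(). The hex_upper helper is hypothetical and only uppercases the result to match the style of MySQL's HEX() output.

```python
def hex_upper(data: bytes) -> str:
    # bytes.hex() returns lowercase digits; MySQL's HEX() prints uppercase.
    return data.hex().upper()


print(hex_upper("ã".encode("utf-8")))        # C3A3
print(hex_upper("ã".encode("latin-1")))      # E3
print(hex_upper("\\u00e3".encode("ascii")))  # 5C7530306533
```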