Search code examples
javamysqljdbcpayarautf8mb4

Black diamonds and question marks persisting after setting database uft8mb4


Regarding MySQL database and Java JDBC connection encoding. Database has been converted to utf8mb4 and utf8mb4_unicode_ci as shown this is the results of SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; from the JDBC connection.

+--------------------------+--------------------+
|      Variable_name       |       Value        |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

From the MySQL Workbench and the terminal connecting directly to the DB I can see the Unicode character í with its correct hex value of c3 ad

+------------------------------+
| HEX(location.name)           |
+------------------------------+
| C3AD                         |
+------------------------------+

JDBC Connection settings: useUnicode=true&characterEncoding=UTF-8

Using HikariCP with the config of

config.addDataSourceProperty("useUnicode", "true"); config.addDataSourceProperty("characterEncoding", "utf-8"); config.setConnectionInitSql("SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci");

Using mysql-connector-java:8.0.11

From the results of querying the table in question using the JDBC connection the í character is returned as within Postman. And the 💩 is returned as ? within Postman.

Which leads me to believe that my connection during reading is not UTF-8 according to stackoverflow.com/questions/38363566 how would I detect this?

Database and application have been reset to apply settings, if it was needed.


Solution

  • characterEncoding=utf-8 is incompatible with utf8mb4. Use character_set_server=utf8mb4 in your JDBC URL instead, or config.addDataSourceProperty("character_set_server", "utf8mb4");. Do not use characterEncoding at all.

    From the MySQL Connection/J developer guide → Using Character Sets → Setting the Character Encoding:

    … to use the 4-byte UTF-8 character set with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string.

    And right below that:

    Warning

    In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.