Search code examples
mysqlmysql-5.6mysql-cli

Why does --default-character-set=utf8mb4 have no impact on the client connection?


According to https://dev.mysql.com/doc/refman/5.6/en/charset-connection.html, When I connect to a mysql 5.6 server with the mysql 8.0 client using the command:

/usr/bin/mysql -h ${DB_HOST} -u ${DB_USER} -p --default-character-set=utf8mb4

I expect that the client will set up a utf8mb4 connection to the server. However, the connection is set to latin1:

mysql> SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE VARIABLE_NAME IN (
 'character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection' ) 
ORDER BY VARIABLE_NAME;
+--------------------------+-------------------+
| VARIABLE_NAME            | VARIABLE_VALUE    |
+--------------------------+-------------------+
| CHARACTER_SET_CLIENT     | latin1            |
| CHARACTER_SET_CONNECTION | latin1            |
| CHARACTER_SET_RESULTS    | latin1            |
| COLLATION_CONNECTION     | latin1_swedish_ci |
+--------------------------+-------------------+

Using another character set, like: /usr/bin/mysql -h ${DB_HOST} -u ${DB_USER} -p --default-character-set=koi8r will result in the client connecting with the supplied character set:

mysql> SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE VARIABLE_NAME IN ( 'character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection' ) ORDER BY VARIABLE_NAME;
+--------------------------+------------------+
| VARIABLE_NAME            | VARIABLE_VALUE   |
+--------------------------+------------------+
| CHARACTER_SET_CLIENT     | koi8r            |
| CHARACTER_SET_CONNECTION | koi8r            |
| CHARACTER_SET_RESULTS    | koi8r            |
| COLLATION_CONNECTION     | koi8r_general_ci |
+--------------------------+------------------+

The only way I can get the client connection to change is by executing charset utf8mb4 or SET NAMES utf8mb4 AFTER connecting to the server.

mysql> SET NAMES utf8mb4;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE VARIABLE_NAME IN ( 'character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection' ) ORDER BY VARIABLE_NAME;
+--------------------------+--------------------+
| VARIABLE_NAME            | VARIABLE_VALUE     |
+--------------------------+--------------------+
| CHARACTER_SET_CLIENT     | utf8mb4            |
| CHARACTER_SET_CONNECTION | utf8mb4            |
| CHARACTER_SET_RESULTS    | utf8mb4            |
| COLLATION_CONNECTION     | utf8mb4_general_ci |
+--------------------------+--------------------+

Why will --default-character-set=utf8mb4 not work? I want to use other client tools like mysqldump and mysqlimport but without this flag I will get latin1 encoding instead of utf8mb4 encoding. Changing default server settings is not an option in this situation, it must be done from the client.

Some more information: I'm trying this from a 20.04 ubuntu WSL2 install, so there is no 5.6 or 5.7 client available. However, using a 5.6 or 5.7 windows mysql client will respect --default-character-set=utf8mb4, but the 8.0 windows client has the same behavior as the WSL2 client.


Solution

  • This behavior is explained in the 8.0 documentation: https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html

    The same problem occurs in a more subtle context: When the client tells the server to use a character set that the server recognizes, but the default collation for that character set on the client side is not known on the server side. This occurs, for example, when a MySQL 8.0 client wants to connect to a MySQL 5.7 server using utf8mb4 as the client character set. A client that specifies --default-character-set=utf8mb4 is able to connect to the server. However, as in the previous example, the server falls back to its default character set and collation, not what the client requested:

       mysql> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
       +--------------------------+--------+ 
       | Variable_name            | Value  |
       +--------------------------+--------+
       | character_set_client     | latin1 | 
       | character_set_connection | latin1 | 
       ...
       |character_set_results    | latin1 | 
       ...
       +--------------------------+--------+ 
       mysql> SHOW SESSION VARIABLES LIKE 'collation_connection';
       +----------------------+-------------------+ 
       | Variable_name        | Value             |
       +----------------------+-------------------+ 
       | collation_connection | latin1_swedish_ci |
       +----------------------+-------------------+ 
    

    Why does this occur? After all, utf8mb4 is known to the 8.0 client and the 5.7 server, so both of them recognize it. To understand this behavior, it is necessary to understand that when the client tells the server which character set it wants to use, it really tells the server the default collation for that character set. Therefore, the aforementioned behavior occurs due to a combination of factors:

    • The default collation for utf8mb4 differs between MySQL 5.7 and 8.0 (utf8mb4_general_ci for 5.7, utf8mb4_0900_ai_ci for 8.0).

    • When the 8.0 client requests a character set of utf8mb4, what it sends to the server is the default 8.0 utf8mb4 collation; that is, the utf8mb4_0900_ai_ci.

    • utf8mb4_0900_ai_ci is implemented only as of MySQL 8.0, so the 5.7 server does not recognize it.

    • Because the 5.7 server does not recognize utf8mb4_0900_ai_ci, it cannot satisfy the client character set request, and falls back to its default character set and collation (latin1 and latin1_swedish_ci).

    In this case, the client can still use utf8mb4 by issuing a SET NAMES 'utf8mb4' statement after connecting. The resulting collation is the 5.7 default utf8mb4 collation; that is, utf8mb4_general_ci. If the client additionally wants a collation of utf8mb4_0900_ai_ci, it cannot achieve that because the server does not recognize that collation. The client must either be willing to use a different utf8mb4 collation, or connect to a server from MySQL 8.0 or higher.