Search code examples
mysqlunicodeutf-8utf8mb4

Inserting 4-byte unicode characters into MySQL/MariaDB


When attempting to insert 💩 (for example, which is a 4-byte unicode char), both MySQL (5.7) and MariaDB (10.2/10.3/10.4) give the same error:

Incorrect string value: '\xF0\x9F\x92\xA9'

The statement:

mysql> insert into bob (test) values ('💩');

Here's my database's charset/collation:

mysql> select @@collation_database;                                                                     +----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_unicode_ci   |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT @@character_set_database;                                                                 +--------------------------+
| @@character_set_database |
+--------------------------+
| utf8mb4                  |
+--------------------------+
1 row in set (0.00 sec)

The server's character set:

mysql> show global variables like '%character_set_server%'\G;                                           *************************** 1. row ***************************
Variable_name: character_set_server
        Value: utf8mb4

The table:

create table bob ( `test` TEXT NOT NULL );
mysql> SHOW FULL COLUMNS FROM bob;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test  | text | utf8mb4_unicode_ci | NO   |     | NULL    |       | select,insert,update,references |         |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)

Can anyone point me in the right direction?


Solution

  • Yes, as you commented, you need to use SET NAMES utf8mb4.

    Your 4-byte character must pass from your client through the database connection and into a table. All of those must support utf8mb4. If any one of them does not support utf8mb4, then 4-byte characters will not be able to get through.

    SET NAMES utf8mb4 makes the database session expect clients to send string using that encoding. The default for character_set_client on MySQL 5.7 is utf8, so you need to set it to utf8mb4.

    In MySQL 8.0.1 and later, the default character_set_client is utf8mb4 already, so you won't need to change it.