Search code examples
mysqlutf-8triggersutf8mb4

Trouble inserting 4-Byte UTF-8 characters / emoji into MySQL Database when insert trigger is active


I'm having trouble with a database trigger that logs all queries performed on a table, when that query contains a 4-Byte UTF-8 character.

My mysql.cnf

mysql version is 5.7.19 on Ubuntu 16.04

Example script:

show variables where Variable_name like 'character\_set\_%' or Variable_name like 'collation%';

drop database if exists my_test_db;
create database my_test_db;
use my_test_db;

create table my_test_table(id int not null primary key auto_increment, jdoc json not null);
create table my_test_table_log(id int not null primary key auto_increment, query varchar(1024) NOT NULL);

SELECT "insert works when trigger is not active" as "";
insert into my_test_table(jdoc) VALUES(JSON_OBJECT("Dubai was", "🔥"));

DELIMITER |
CREATE TRIGGER log_my_test_table_queries_insert
BEFORE INSERT ON `my_test_table`
FOR EACH ROW
  BEGIN
    DECLARE original_query VARCHAR(1024);
    SET original_query = (SELECT info
                          FROM INFORMATION_SCHEMA.PROCESSLIST
                          WHERE id = CONNECTION_ID());
    INSERT INTO `my_test_table_log` (`query`) VALUES (original_query);
  END;
|
DELIMITER ;

SELECT "insert doesn't work when trigger is active" as "";
insert into my_test_table(jdoc) VALUES(JSON_OBJECT("Dubai was", "🔥"));

My output:

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

Query OK, 2 rows affected (0.03 sec)

Query OK, 1 row affected (0.00 sec)

Database changed
Query OK, 0 rows affected (0.03 sec)

Query OK, 0 rows affected (0.02 sec)

+-----------------------------------------+
|                                         |
+-----------------------------------------+
| insert works when trigger is not active |
+-----------------------------------------+
1 row in set (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

+--------------------------------------------+
|                                            |
+--------------------------------------------+
| insert doesn't work when trigger is active |
+--------------------------------------------+
1 row in set (0.01 sec)

ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x94\xA5")...' for column 'INFO' at row 1

The Databases seem to be created with the correct character set

SHOW CREATE TRIGGER log_my_test_table_queries_insert;
SHOW CREATE TABLE my_test_table;
SHOW CREATE TABLE my_test_table_log;

outputs

detailsdetailsdetails


Solution

  • This happens because information_schema.processlist is using UTF8mb3 character set. (UTF is an alias to UTFmb3). This simple query illustrate this:

    mysql> select info, "🔥" from information_schema.processlist;
    +------------------------------------------------------+------+
    | info                                                 | ?    |
    +------------------------------------------------------+------+
    | select info, "?" from information_schema.processlist | 🔥     |
    +------------------------------------------------------+------+
    1 row in set, 1 warning (0,00 sec)
    
    Warning (Code 1366): Incorrect string value: '\xF0\x9F\x94\xA5" ...' for column 'INFO' at row 1
    

    The above warning probably result in an error when the trigger tries to insert the content of the info column into another table.

    I think the problem is that the query string is stored as is in the processlist table without converting it to the expected character set. I have filed a bug report about this.