Search code examples
djangomariadb

MariaDB CHARSET is set to utf8mb4, but insert emoji always raise Error 1366


First, check database CHARSET:

MariaDB [outdoors]> show create database outdoors;

| outdoors | CREATE DATABASE `outdoors` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci */ |

Then check the table CHARSET:

MariaDB [outdoors]> show create table backend_comment;

| backend_comment | CREATE TABLE `backend_comment` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `geo` varchar(10) DEFAULT NULL,
  `content` varchar(250) NOT NULL,
  `img` varchar(100) DEFAULT NULL,
  `comment_id` int(11) DEFAULT NULL,
  `create_time` datetime(6) NOT NULL,
  `notify_id` bigint(20) NOT NULL,
  `to_id` int(11) DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `backend_comment_notify_id_61aef760_fk_backend_notify_id` (`notify_id`),
  KEY `backend_comment_to_id_4dd23479_fk_auth_user_id` (`to_id`),
  KEY `backend_comment_user_id_1ab394ea_fk_auth_user_id` (`user_id`),
  CONSTRAINT `backend_comment_notify_id_61aef760_fk_backend_notify_id` FOREIGN KEY (`notify_id`) REFERENCES `backend_notify` (`id`),
  CONSTRAINT `backend_comment_to_id_4dd23479_fk_auth_user_id` FOREIGN KEY (`to_id`) REFERENCES `auth_user` (`id`),
  CONSTRAINT `backend_comment_user_id_1ab394ea_fk_auth_user_id` FOREIGN KEY (`user_id`) REFERENCES `auth_user` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |

Test insert emoji content:

MariaDB [outdoors]> insert into backend_comment 
        (id, content, notify_id, user_id, create_time) 
values (1, 'insert emoji test 😊', 1, 1, NOW());

ERROR 1366 (22007): Incorrect string value: '\xF0\x9F\x98\x8A' for column outdoors.backend_comment.content at row 1

**So what's the problem? help me please 😂**


----------
More information:

Server version: 10.11.8-MariaDB-0ubuntu0.24.04.1 Ubuntu 24.04

Solution

  • MariaDB and Mysql do not transform between different encodings automatically. When you try to insert data it will attempt to pass bytes from the client to the server and store as is. When there are no difference it works well, when there is a mismatch you can end up either with corrupted data, or an error.

    It looks like by default mariadb runs with utf8mb3, and it does not support all the emojis, and therefore when server interprets utf8mb3 codes as utf8mb4 it results in invalid ones.

    You could:

    Set charset per connection, for example in CLI:

    mariadb -u{user} --default-character-set=utf8mb4 -p
    

    or

    Set charset in the server configuration (see docs). Tho be ware what charset other connections can expect.