I'm using mariadb Server version: 10.3.37-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04 on a local machine My database 'nihongo' contains a 'radicalAlt' table which is:
mariadb -e "show full columns from nihongo.radicalAlt;"
+-------------+---------------------+--------------------+------+-----+---------+----------------+---------------------------------+----------------------------------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------------+---------------------+--------------------+------+-----+---------+----------------+---------------------------------+----------------------------------+
| id | tinyint(3) unsigned | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| radicalAlt | char(1) | utf8mb4_unicode_ci | YES | | NULL | | select,insert,update,references | |
| radical_id | int(10) unsigned | NULL | YES | | NULL | | select,insert,update,references | Clé externe pour `radical`.`pk` |
| strokecount | tinyint(3) unsigned | NULL | YES | | NULL | | select,insert,update,references | |
| writing | varchar(255) | utf8mb4_unicode_ci | YES | | NULL | | select,insert,update,references | |
| comment | varchar(255) | utf8mb4_unicode_ci | YES | | NULL | | select,insert,update,references | |
| unicode | char(8) | utf8mb4_unicode_ci | YES | UNI | NULL | | select,insert,update,references | |
+-------------+---------------------+--------------------+------+-----+---------+----------------+---------------------------------+----------------------------------+`
The table appears to contain duplicates:
`mariadb -e "use nihongo; select id, radicalAlt, count(*) as count from radicalAlt group by radicalAlt having count(*)>1;"
+----+------------+-------+
| id | radicalAlt | count |
+----+------------+-------+
| 16 | ⺌ | 2 |
| 24 | ⺝ | 2 |
... and so on...`
But when I select the first "duplicate", I notice there is no duplicate, but two distinct characters (⺌ and ⺍) with distinct unicode values, and that are well displayed in my linux gnome terminal. However, there a considered the same character in the query:
`mariadb -e "use nihongo; select * from radicalAlt where radicalAlt='⺌';"
+----+------------+------------+-------------+---------+---------+---------+
| id | radicalAlt | radical_id | strokecount | writing | comment | unicode |
+----+------------+------------+-------------+---------+---------+---------+
| 16 | ⺌ | 42 | 3 | NULL | NULL | 2E8C |
| 83 | ⺍ | 42 | 3 | NULL | NULL | 2E8D |
+----+------------+------------+-------------+---------+---------+---------+`
I ensured that character set and collations are consitent, database and table have the same character set and collation:
`mariadb -e "show variables like 'character%';"
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+`
and:
`mariadb -e "show variables like 'collation%';"
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+----------------------+--------------------+`
here are my conf files:
/etc/mysql/mariadb.conf.d/50-server.cnf contains:
[...]
`character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
skip-character-set-client-handshake
[...]`
/etc/mysql/mariadb.conf.d/50-client.cnf contains:
`[...]
default-character-set = utf8mb4
init_command = "SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci"
[...]`
I understand that data are ok in the base, but I can't select them correctly. What am I doing wrong? What is the way to select correctly my chinese characters? Thank you in advance
In unicode_ci collation these characters might have the same weight (or are defined as ignorable), so you should use (less restrictive) general_ci collation instead:
# set unicode variables
set @a=cast(0xE2BA8C as char character set utf8mb4);
set @b=cast(0xE2BA8D as char character set utf8mb4);
select @a,@b;
+------+------+
| @a | @b |
+------+------+
| ⺌ | ⺍ |
+------+------+
# compare @a with @b, using different collations
select @a=@b collate utf8mb4_unicode_ci;
+----------------------------------+
| @a=@b collate utf8mb4_unicode_ci |
+----------------------------------+
| 1 |
+----------------------------------+
MariaDB []> select @a=@b collate utf8mb4_general_ci;
+----------------------------------+
| @a=@b collate utf8mb4_general_ci |
+----------------------------------+
| 0 |
+----------------------------------+